
A SHORT COURSE OF

TIME-SERIES ANALYSIS AND FORECASTING


at the Institute for Advanced Studies, Vienna,
from March 22nd to April 2nd, 1993
Lecturer : D.S.G. Pollock
Queen Mary and Westfield College,
The University of London
This course is concerned with the methods of time-series modelling which are
applicable in econometrics and throughout a wide range of disciplines in the
physical and social sciences. The course is for nonspecialists who may be inter-
ested in pursuing this topic as an adjunct to their other studies and who might
envisage employing the techniques of time-series analysis in empirical enquiries
within the context of their own disciplines.
The course is mathematically self-contained in the sense that the requisite
results are presented either in the lectures themselves or in the accompanying
text. The techniques of the frequency domain and the time domain are given
an equal emphasis in this course.
Week 1
1 Trends in Time Series
2 Cycles in Time Series
3 Models and Methods of Time-Series Analysis
4 Time-Series Analysis in the Frequency Domain
5 Linear Stochastic Models
Week 2
6 State-Space Analysis and Structural Time-Series Models
7 Forecasting with ARIMA Models
8 Identification and Estimation of ARIMA Models
9 Identification and Estimation in the Frequency Domain
10 Seasonality and Linear Filtering
In addition, there will be a public Lecture on the topic of The Methods of
Time-Series Analysis which is to take place on ***** in ***** at *****. This
lecture will give a broad overview of the mathematical themes of time-series
analysis and of the historical development of the subject; and it is intended for
an audience with no significant knowledge of the subject.

LECTURES IN TIME-SERIES ANALYSIS
AND FORECASTING
by
D.S.G. Pollock
Queen Mary and Westfield College,
The University of London
These two booklets contain some of the material of the courses
titled Methods of Time-Series Analysis and Economic Forecasting
which have been taught in the Department of Economics of Queen
Mary College in recent years. The material is presented in the form of
a series of ten lectures for a course given at the Institute for Advanced
Studies in Vienna titled A Short Course in Time-Series Analysis.
Book 1
1 Trends in Economic Time Series
2 Seasons and Cycles in Time Series
3 Models and Methods of Time-Series Analysis
4 Time-Series Analysis in the Frequency Domain
5 Linear Stochastic Models
Book 2
6 State-Space Analysis and Structural Time-Series Models
7 Forecasting with ARIMA Models
8 Identification and Estimation of ARIMA Models
9 Identification and Estimation in the Frequency Domain
10 Seasonality and Linear Filtering

THE METHODS OF TIME-SERIES ANALYSIS
by
D.S.G. Pollock
Queen Mary and Westfield College,
The University of London
This paper describes some of the principal themes of time-series analysis
and it gives an historical account of their development.
There are two distinct yet broadly equivalent modes of time-series anal-
ysis which may be pursued. On the one hand there are the time-domain
methods which have their origin in the classical theory of correlation; and
they lead inevitably towards the construction of structural or parametric
models of the autoregressive moving-average type. On the other hand are
the frequency-domain methods of spectral analysis which are based on an
extension of the methods of Fourier analysis.
The paper describes the developments which led to the synthesis of
the two branches of time-series analysis and it indicates how this synthesis
was achieved.
It remains true that the majority of time-series analysts operate prin-
cipally in one or other of the two domains. Such specialisation is often
influenced by the academic discipline to which the analyst adheres. How-
ever, it is clear that there are many advantages to be derived from pursuing
the two modes of analysis concurrently.
Address for correspondence:
D.S.G. Pollock
Department of Economics
Queen Mary College
University of London
Mile End Road
London E1 4NS
Tel : +44-71-975-5096
Fax : +44-71-975-5500

LECTURE 1
Trends in Economic
Time Series
In many time series, broad movements can be discerned which evolve more
gradually than the other motions which are evident. These gradual changes are
described as trends and cycles. The changes which are of a transitory nature
are described as fluctuations.
In some cases, the trend should be regarded as nothing more than the
accumulated effect of the fluctuations. In other cases, we feel that the trends
and the fluctuations represent different sorts of influences, and we are inclined
to decompose the time series into the corresponding components.
In economics, it is traditional to decompose time series into a variety of
components, some or all of which may be present in a particular instance. If
$\{Y_t\}$ is the sequence of values of an economic index, then its generic element is
liable to be expressed as

(1.1)  $Y_t = T_t + C_t + S_t + \varepsilon_t,$
where
$T_t$ is the global trend,
$C_t$ is a secular cycle,
$S_t$ is the seasonal variation and
$\varepsilon_t$ is an irregular component.
Many of the more prominent macroeconomic indicators are amenable to
a decomposition of the sort depicted above. One can imagine, for example, a
quarterly index of Gross National Product which appears to be following an
exponential growth trend $\{T_t\}$.
The growth trend might be obscured, to some extent, by a superimposed
cycle $\{C_t\}$ with a period of roughly four and a half years, which happens to
correspond, more or less, to the average lifetime of the legislative assembly.
The reasons for this curious coincidence need not concern us here.
The ghost of an annual cycle $\{S_t\}$ might also be apparent in the index;
and this could well be a reflection of the fact that some economic activities,
such as building construction, are significantly affected by the weather and by
the duration of sunlight.
When the foregoing components, which is to say the trend, the secular cycle
and the seasonal cycle, have been extracted from the index, the residue should
correspond to an irregular component $\{\varepsilon_t\}$ for which no unique explanation can be offered.
This component ought to resemble a time series generated by a so-called sta-
tionary stochastic process. Such a series has the characteristic that any segment
of consecutive elements looks much like any other segment of the same duration,
regardless of the date at which it begins or ends.
If the residue follows a trend, or if it manifests a more or less regular
pattern, then it contains features which ought to have been attributed to the
other components; and we should set about the task of redefining them.
There are two distinct purposes for which we might wish to effect such
a decomposition. The first purpose is to give a summary description of the
salient features of the time series. Thus, if we eliminate the irregular and
seasonal components from the series, we are left with an index which may give
a clearer picture of the more important features. This might help us to gain
an insight into the fundamental workings of the economic or social structure
which has generated the series.
The other purpose in decomposing the series is to predict its future values.
For each component of the time series, a particular method of prediction is ap-
propriate. By combining the separate predictions of the components, a forecast
can be derived which may be superior to one derived by a method which pays
no attention to the underlying structure of the time series.
Extracting the Trend
There are essentially two ways of extracting trends from a time series. The
first way is to apply to the series a variety of so-called filters which annihilate
or nullify all of the components which are not regarded as trends.
A filter is a carefully crafted moving average which spans a number of data
points and which attributes a weight to each of them. The weights should sum
to unity to ensure that the filter does not systematically inflate or deflate the
values of the series. Thus, for example, the following moving average might
serve to eliminate the annual cycle from an economic series which is recorded
at quarterly intervals:
(1.2)  $\hat{Y}_t = \frac{1}{16}\left(Y_{t+3} + 2Y_{t+2} + 3Y_{t+1} + 4Y_t + 3Y_{t-1} + 2Y_{t-2} + Y_{t-3}\right).$
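As an aside, the action of such a filter is easily verified numerically. The following minimal sketch, which assumes only the numpy library and an invented quarterly series, applies the moving average of (1.2); its weights sum to unity, and its gain is in fact zero at the seasonal frequencies $\pi/2$ and $\pi$, so the annual cycle is annihilated.

    import numpy as np

    # The weights of the filter in equation (1.2); they sum to 16/16 = 1,
    # so the filter neither inflates nor deflates the series.
    weights = np.array([1, 2, 3, 4, 3, 2, 1]) / 16.0

    # An invented quarterly index: a linear trend plus an annual cycle.
    rng = np.random.default_rng(0)
    t = np.arange(80)
    y = 0.5 * t + 5.0 * np.sin(np.pi * t / 2) + rng.normal(0, 1, t.size)

    # A centred moving average; 'valid' drops the 3 points at each end
    # where the 7-point average cannot be formed.  The annual cycle,
    # which has frequency pi/2, is eliminated by the filter.
    y_filtered = np.convolve(y, weights, mode="valid")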
Another filter with a wider span and a different profile of weights might serve
to eliminate the four-and-a-half-year cycle which is present in our imaginary
series of Gross National Product.
Finally, a filter could be designed which smooths away the irregularities
of the index which defy systematic explanation. The order in which the three
filters are applied is immaterial; and what is left after they have been applied
should give a picture of the underlying trend $\{T_t\}$ of the index.
Other collections of filters, applied in series, might serve to isolate the
other components $\{C_t\}$ and $\{S_t\}$ which are to be found in equation (1).
The process of filtering is often a good way of deriving an index which rep-
resents the more important historical characteristics of the time series. How-
ever, it generates no model for the underlying trends; and it suggests no way
of predicting their future values.
The alternative way of extracting the trend from the index is to fit some
function which is capable of adapting itself to whatever form the trend happens
to display. Different functions are appropriate to different forms of trend; and
some functions which analysts tend to favour seem almost always to be inappro-
priate. Once an analytic function has been fitted to the series, it may be used
to provide extrapolative forecasts of the trend.
Polynomial Trends
Amongst the mathematical functions which suggest themselves as means
of modelling a trend is a pth-degree polynomial whose argument is the time
index t:
(1.3)  $\phi(t) = \phi_0 + \phi_1 t + \cdots + \phi_p t^p.$
When there is no theory to specify a mathematical form for the trend, it
may be possible to approximate it by a polynomial of low degree. This notion
is suggested by the formal result that every analytic mathematical function can
be expanded as a power series, which is an indefinite sum whose terms contain
rising powers of the argument. Thus the polynomial in t may be construed as
an approximation to an analytic function which is obtained by discarding all
but the leading terms of a power-series expansion.
There are also arguments from physics which suggest that first-degree and
second-degree polynomials in t, which are linear and quadratic time trends in
other words, are common in the natural world. The thought occurs to us that
such trends might also arise in the social world.
According to a well-known dictum,
Every body continues in its state of rest or of uniform motion in a straight
line unless it is compelled to change that state by forces impressed upon it.
This is Newton's first law of motion. The kinematic equation for the distance
covered by a body moving with constant velocity in a straight line is
(1.4)  $x = x_0 + ut,$
where $u$ is the uniform velocity, and $x_0$ represents the initial position of the
body at time $t = 0$. This is nothing but a first-degree polynomial in $t$.
Newton's second law of motion asserts that
The change of motion is proportional to the motive force impressed; and is
made in the direction of the straight line in which the force is impressed.
In modern language, this is expressed by saying that the acceleration of a
body along a straight line is proportional to the force which is applied in that
direction. The kinematic equation for the distance travelled under uniformly
accelerated rectilinear motion is
(1.5)  $x = x_0 + u_0 t + \tfrac{1}{2}at^2,$
where $u_0$ is the velocity at time $t = 0$ and $a$ is the constant acceleration due to
the motive force. This is just a quadratic in t.
A linear or a quadratic function may be appropriate if the trend in question
is monotonically increasing or decreasing. In other cases, polynomials of higher
degrees might be fitted. Figure 1 is the result of fitting a cubic function to an
economic time series by least-squares regression.
Figure 1. A cubic function fitted to data on meat
consumption in the United States, 1919–1941.
It might be felt that there are salient features in the data which are not
captured by the cubic polynomial. In that case, the recourse might be to
increase the degree of the polynomial by one. The result will be a curve which
fits the data more closely. Also, it will be found that one of the branches
of the polynomial (the left branch in this case) has changed direction. The
values found by extrapolating the quartic function backwards in time will differ
radically from those found by extrapolating the cubic function.
In general, the effect of altering the degree of the polynomial by one will
be to alter the direction of one or other of the branches of the fitted function;
and, from the point of view of forecasting, this is a highly unsatisfactory cir-
cumstance. Another feature of a polynomial function is that its branches tend
to plus or minus infinity with increasing rapidity as the argument increases or
decreases beyond a range of central values where the function has its stationary
points and its points of inflection. This might also be regarded as an undesirable
property for a function which is to be used in extrapolative forecasting.
Some care has to be taken in fitting a polynomial time trend by the method
of least-squares regression. A straightforward procedure, which comes imme-
diately to mind, is to form a matrix $X$ of regressors in which the generic row
$[t^0, t, t^2, \ldots, t^p]$ contains rising powers of the argument $t$. The annual data on
meat consumption, for example, which are plotted in Figure 1, run from 1919
to 1941; and these dates might be taken as the initial and terminal values of
$t$. In that case, there would be a vast difference in the values of the elements
of the matrix $X$. For, whereas $t^0 = 1$ for all values of $t = 1919, \ldots, 1941$, we
should find that, when $t = 1941$, the value of $t^3$ is in excess of 7,300 million.
Clearly, such a disparity of numbers taxes the precision of the computer.
An obvious recourse is to recode the values of $t$. Thus, we might take
$t = -11, \ldots, 11$ for the range of the argument. The change would affect only the
value of the intercept term $\phi_0$, which could be adjusted ex post. Unfortunately,
such a recourse is not always adequate to ensure the numerical accuracy of
the computation. The reason lies in the peculiarly ill-conditioned nature of the
matrix $(X'X)^{-1}$ of cross products.
In fact, a specialised procedure of polynomial regression is often called for,
in which the functions $t^0, t, \ldots, t^p$ are replaced by a set of so-called orthogonal
polynomials which give rise to vectors of regressors whose cross products
are zero-valued. The estimated coefficients associated with these orthogonal
polynomials can be converted into the coefficients $\phi_0, \phi_1, \ldots, \phi_p$ of equation (3).
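The effect of recoding the argument can be checked in a few lines. The sketch below, which assumes numpy and uses fabricated stand-in data in place of the meat-consumption series, compares the conditioning of the raw and the recoded design matrices; numpy's Polynomial.fit, which rescales the argument to the interval $[-1, 1]$ before fitting, serves here as a simple surrogate for a full orthogonal-polynomial regression.

    import numpy as np

    years = np.arange(1919.0, 1942.0)     # t = 1919, ..., 1941
    rng = np.random.default_rng(0)
    # Stand-in data in place of the meat-consumption series.
    y = 160.0 + 0.01 * (years - 1930.0) ** 3 + rng.normal(0, 2, years.size)

    # Raw powers of the calendar year: the columns range from 1 to about
    # 7,300 million, and the matrix of cross products is ill-conditioned.
    X_raw = np.vander(years, 4, increasing=True)
    print("condition number, raw t:     %.3e" % np.linalg.cond(X_raw))

    # Recoding the argument as t = -11, ..., 11 improves matters greatly.
    t = years - years.mean()
    X_rec = np.vander(t, 4, increasing=True)
    print("condition number, recoded t: %.3e" % np.linalg.cond(X_rec))

    # Polynomial.fit works in a scaled domain and converts back afterwards.
    trend = np.polynomial.Polynomial.fit(years, y, deg=3)
    print(trend.convert())   # coefficients in terms of the original argument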
Exponential and Logistic Trends
The notion of exponential or geometric growth is common in economics
where it is closely related to the idea of compound interest. Consider a financial
asset with an annual rate of return of $r$. The annual growth factor for an
investment of unit value is $(1 + r)$. If $q$ units were invested at time $t = 0$, and
if the returns were compounded with the principal on an annual basis, then the
value of the investment at time $t$ would be given by

(1.6)  $y_t = q(1 + r)^t.$
An investment which is compounded twice a year has an annual growth
factor of $(1 + \tfrac{1}{2}r)^2$, and one which is compounded quarterly has a growth factor
of $(1 + \tfrac{1}{4}r)^4$. If an investment were compounded continuously, then its growth
factor would be $\lim_{n \to \infty}(1 + \tfrac{r}{n})^n = e^r$. The value of the asset at time $t$
would be given by

(1.7)  $y = qe^{rt};$

and this is the equation for exponential growth.
The equation of exponential growth is a solution of the differential equation

(1.8)  $\frac{dy}{dt} = ry.$

The implication of the differential equation is that the absolute rate of growth
in $y$ is proportional to the value already attained by $y$. It is equivalent to say
that the proportional rate of growth $(1/y)(dy/dt)$ is constant.
An exponential growth trend can be fitted to observations $y_1, \ldots, y_n$, sampled
at regular intervals, by applying ordinary least-squares regression to the
equation

(1.9)  $\ln y_t = \ln q + rt + \varepsilon_t.$

This is obtained by taking the logarithm of equation (7) and adding a distur-
bance term $\varepsilon_t$. An alternative parametrisation is obtained by setting $\gamma = e^r$.
Then the transformed growth equation becomes

(1.10)  $\ln y_t = \ln q + (\ln\gamma)t + \varepsilon_t,$

and the geometric growth rate is $\gamma - 1$.
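As an illustration, the regression of (1.9) amounts to a single call to a least-squares routine once logarithms have been taken. A minimal sketch, assuming numpy and an invented series growing at about five per cent per period:

    import numpy as np

    # An invented series: q = 100 units growing at about 5 per cent.
    rng = np.random.default_rng(0)
    t = np.arange(40.0)
    y = 100.0 * 1.05 ** t * np.exp(rng.normal(0, 0.02, t.size))

    # Regress ln(y) on t, as in equation (1.9).
    r_hat, ln_q_hat = np.polyfit(t, np.log(y), 1)
    q_hat = np.exp(ln_q_hat)          # estimate of q
    gamma_hat = np.exp(r_hat)         # gamma = e^r
    print(q_hat, gamma_hat - 1.0)     # roughly 100 and 0.05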
Whereas unhindered exponential growth might well be a possibility for
certain monetary or financial quantities, it is implausible to suggest that such
a process can be sustained for long when real resources are involved. Since real
resources are finite, we expect there to be upper limits to the levels which can
be attained by economic variables.
For an example of a trend with an upper bound, we might imagine a pro-
cess whereby the ownership of a consumer durable grows until the majority
of households or individuals are in possession of it. Good examples are pro-
vided by the sales of domestic electrical appliances such as fridges and colour
television sets.
Typically, when the new durable is introduced, the rate of sales is slow.
Then, as information about the durable, or experience of it, is spread amongst
consumers, the sales begin to accelerate. For a time, their cumulated total
might appear to follow an exponential growth path. Then come the first signs
that the market is being saturated; and there is a point of inflection in the
cumulative curve where its second derivative, which is the rate of increase in
sales per period, passes from positive to negative. Eventually, as the level of
ownership approaches the saturation point, the rate of sales will decline to a
constant level, which may be at zero, if the good is wholly durable, or at a
small positive replacement rate if it is not.
It is very difficult to specify the dynamics of a process such as the one we
have described whenever there are replacement sales to be taken into account.
The reason is that the replacement sales depend not only on the size of the
ownership of the durable goods but also upon the age of the stock of goods.
The latter is a function, at least in an early period, of the way in which sales
have grown at the outset. Often we have to be content with modelling only the
growth of ownership.
One of the simplest ways of modelling the growth of ownership is to employ
the so-called logistic curve. This classical device has its origins in the mathe-
matics of biology where it has been used to model the growth of a population
of animals in an environment with limited food resources.
Figure 2. The logistic function $e^x/(1 + e^x)$ and its derivative. For large
negative values of $x$, the function and its derivative are close. In the case
of the exponential function $e^x$, they coincide for all values of $x$.
The simplest version of the function is given by

(1.11)  $\pi(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{1 + e^x}.$

The second expression comes from multiplying top and bottom of the first
expression by $e^x$. The logistic curve varies between a value of zero, which is
approached as $x \to -\infty$, and a value of unity, which is approached as $x \to +\infty$.
At the mid point, where $x = 0$, the value of the function is $\pi(0) = \tfrac{1}{2}$. These
characteristics can be understood easily in reference to the first expression.
The alternative expression for the logistic curve also lends itself to an
interpretation. We may begin by noting that, for large negative values of $x$,
the term $1 + e^x$, which is found in the denominator, is not significantly different
from unity. Therefore, as $x$ increases from such values towards zero, the logistic
function closely resembles an exponential function. By the time $x$ reaches zero,
the denominator, with a value of 2, is already significantly affected by the term
$e^x$. At that point, there is an inflection in the curve as the rate of increase in $\pi$
begins to decline. Thereafter, the rate of increase declines rapidly toward zero,
with the effect that the value of $\pi$ never exceeds unity.
The inverse mapping $x = x(\pi)$ is easily derived. Consider

(1.12)  $1 - \pi = \frac{1 + e^x}{1 + e^x} - \frac{e^x}{1 + e^x} = \frac{1}{1 + e^x} = \frac{\pi}{e^x}.$

This is rearranged to give

(1.13)  $e^x = \frac{\pi}{1 - \pi},$

whence the inverse function is found by taking natural logarithms:

(1.14)  $x(\pi) = \ln\left(\frac{\pi}{1 - \pi}\right).$
The logistic curve needs to be elaborated before it can be fitted flexibly
to a set of observations $y_1, \ldots, y_n$ tending to an upper asymptote. The general
form of the function is

(1.15)  $y(t) = \frac{\gamma}{1 + e^{-h(t)}} = \frac{\gamma e^{h(t)}}{1 + e^{h(t)}}; \qquad h(t) = \alpha + \beta t.$

Here $\gamma$ is the upper asymptote of the function, which is the saturation level of
ownership in the example of the consumer durable. The parameters $\beta$ and $\alpha$
determine respectively the rate of ascent of the function and the mid point of
its ascent, measured on the time-axis.
It can be seen that

(1.16)  $\ln\left\{\frac{y(t)}{\gamma - y(t)}\right\} = h(t).$

Therefore, with the inclusion of a residual term, the equation for the generic
element of the sample is

(1.17)  $\ln\left\{\frac{y_t}{\gamma - y_t}\right\} = \alpha + \beta t + e_t.$
For a given value of $\gamma$, one may calculate the value of the dependent variable on
the LHS. Then the values of $\alpha$ and $\beta$ may be found by least-squares regression.
The value of $\gamma$ may also be determined according to the criterion of min-
imising the sum of squares of the residuals. A crude procedure would entail
running numerous regressions, each with a different value for $\gamma$. The defini-
tive value would be the one from the regression with the least residual sum of
squares. There are other procedures for finding the minimising value of $\gamma$ of
a more systematic and efficient nature which might be used instead. Amongst
these are the methods of Golden Section Search and Fibonacci Search, which
are presented in many texts of numerical analysis.
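The crude grid-search procedure is easily sketched. The following fragment, which assumes numpy and fabricates an ownership series for the purpose, regresses the transformed variable of (1.17) on $t$ for each trial value of the asymptote and retains the value with the least residual sum of squares:

    import numpy as np

    def fit_logistic(t, y, gamma_grid):
        # For each trial asymptote gamma, regress ln(y/(gamma - y)) on t,
        # as in equation (1.17), and record the residual sum of squares.
        best = None
        for gamma in gamma_grid:
            if gamma <= y.max():          # the transform needs gamma > y
                continue
            z = np.log(y / (gamma - y))
            beta, alpha = np.polyfit(t, z, 1)
            rss = np.sum((z - alpha - beta * t) ** 2)
            if best is None or rss < best[0]:
                best = (rss, gamma, alpha, beta)
        return best[1:]                   # the definitive gamma, alpha, beta

    # Fabricated ownership data approaching a saturation level of 0.9.
    rng = np.random.default_rng(0)
    t = np.arange(30.0)
    y = 0.9 / (1.0 + np.exp(-(t - 15.0) / 3.0)) + rng.normal(0, 0.005, t.size)
    print(fit_logistic(t, y, np.linspace(y.max() + 0.01, 1.5, 100)))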
The objection may be raised that the domain of the logistic function is
the entire real line, which spans all of time from creation to eternity, whereas
the sales history of a consumer durable dates only from the time when it is
introduced to the market. The problem might be overcome by replacing the
time variable $t$ in equation (15) by its logarithm and by allowing $t$ to take only
nonnegative values. Then, whilst $t \in [0, \infty)$, we still have $\ln(t) \in (-\infty, \infty)$,
which is the entire domain of the logistic function.
Figure 3. The function $y(t) = \gamma/(1 + \exp\{-\alpha - \beta\ln(t)\})$ with $\gamma = 1$,
$\alpha = -4$ and $\beta = 7$. The positive values of $t$ are the domain of the function.
There are many curves which will serve the purpose of modelling a sig-
moidal growth process. Their number is equal, at least, to the number of
theoretical probability density functions; for the corresponding (cumulative)
distribution functions rise monotonically from zero to unity in ways which are
suggestive of processes of bounded growth.
In fact, we do not need to have an analytic form for a cumulative function
before it can be fitted to a growth process. It is enough to have a table of
values of a standardised form of the function. An example is provided by the
normal density function, whose distribution function is regularly fitted to data
points in the course of probit analysis. In this case, the fitting involves finding
values for the location parameter $\mu$ and the dispersion parameter $\sigma^2$ by which
the standard normal function is converted into an arbitrary normal function.
Nowadays, there are efficient procedures for numerical optimisation which can
accomplish such tasks with ease.
Flexible Trends
If the purpose of decomposing a time series is to form predictions of its
components, then it is important to obtain adequate representations of these
components at every point within the sample period. The device which is most
appropriate to the extrapolative forecasting of a trend is rarely the best means
of representing it within the sample. An extrapolation is usually based upon
a simple analytic function; and any attempt to make the function reflect the
local variations of the sample will endow it with global characteristics which
may affect the forecasts adversely.
One way of modelling the local characteristics of a trend without prejudic-
ing its global characteristics is to use a segmented curve. In many applications,
it has been found that a curve with cubic polynomial segments is appropriate.
The segments must be joined in a way which avoids evident discontinuities. In
practice, the requirement is usually for continuous first-order and second-order
derivatives. A curve whose segments are joined in this way is described as a
cubic spline.
A spline is a draughtsman's tool which was once used in drawing smooth
curves. It is a thin flexible piece of wood which was clamped to a series of
pins which were placed along the path of the curve which had to be described.
Some of the essential properties of a mathematical spline can be understood
by bearing the real spline in mind. The pins to which a draughtsman clamped
his spline correspond to the data points through which we might interpolate a
mathematical spline. The segments of the mathematical spline would be joined
at the data points.
The cubic spline becomes a device for modelling a trend when, instead of
passing through the data points, it is allowed, in the interests of smoothness,
to deviate from them. The Reinsch smoothing spline is fitted by minimising
Figure 4. Cubic smoothing splines fitted to data on meat consumption in the
United States, 1919–1941. The upper panel shows the fit with $\lambda = 0.75$ and
the lower panel the fit with $\lambda = 0.125$.
a criterion function which imposes both a penalty for deviating from the data
points and a penalty for excessive curvature in the segments. The measure
of curvature is based upon second derivatives, whilst the measure of deviation
is the sum of the squared distances of the points from the curve. A single
parameter $\lambda$ governs the trade-off between the objectives of smoothness and
goodness of fit.
As an analogy for the smoothing spline, one might think of attaching the
draughtsman's spline to the pins by springs instead of by clamps. The precise
form of the curve would depend upon the stiffness of the spline and the forces
exerted by the springs. The degree of flexibility of the spline corresponds to
the value of $\lambda$. The forces exerted by ordinary springs are proportional to their
extension; and, in this respect, the analogy, which requires the forces to be
proportional to the squares of their extensions, is imperfect.
Figure 4 shows the consequences of fitting the smoothing spline to the data
on meat consumption which is also used in Figure 1, where a cubic polynomial
has been fitted. It is a matter of judgment how the value of $\lambda$ should be chosen
so as to reflect the trend.
There are various ways in which the curve of a cubic spline may be ex-
trapolated to form forecasts of the trend. In normal circumstances, when the
ends of the spline are left free, the second derivatives are zero-valued and the
extrapolation is linear. However, it is possible to clamp the ends of the spline
in a way which imposes a value on their first derivatives. In that case, the
extrapolation is quadratic.
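In practice, a cubic smoothing spline of this kind is available in standard numerical libraries. The sketch below assumes scipy and stand-in data; scipy's UnivariateSpline expresses the smoothness trade-off through a bound s on the residual sum of squares rather than through the penalty parameter $\lambda$ itself, but the effect is comparable, and the free-ended linear extrapolation described above can be imitated by extending the tangent at the last data point.

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # Stand-in annual observations: a smooth movement plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(1919.0, 1942.0)
    y = 160.0 + 10.0 * np.sin((t - 1919.0) / 4.0) + rng.normal(0, 2, t.size)

    # A cubic smoothing spline in the spirit of Reinsch (1967): s = 0
    # interpolates the data, and larger values give smoother curves.
    spline = UnivariateSpline(t, y, k=3, s=t.size * 4.0)
    trend = spline(t)

    # A linear extrapolation from the value and slope at the last point,
    # as for a spline whose ends are left free.
    h = np.arange(1.0, 6.0)
    forecast = spline(t[-1]) + spline.derivative()(t[-1]) * h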
Stochastic Trends
It is possible that what is perceived as a trend is the result of the accumu-
lation of small stochastic fluctuations which have no systematic basis. In that
case, there are some clearly defined ways of removing the trend from the data
as well as for extrapolating it into the future.
The simplest model embodying a stochastic trend is the so-called first-
order random walk. Let $\{y_t\}$ be the random-walk sequence. Then its value at
time $t$ is obtained from the previous value via the equation

(1.18)  $y_t = y_{t-1} + \varepsilon_t.$
Here $\varepsilon_t$ is an element of a white-noise sequence of independently and identically
distributed random variables with

(1.19)  $E(\varepsilon_t) = 0 \quad\text{and}\quad V(\varepsilon_t) = \sigma^2 \quad\text{for all } t.$
By a process of back-substitution, the following expression can be derived:

(1.20)  $y_t = y_0 + \left\{\varepsilon_t + \varepsilon_{t-1} + \cdots + \varepsilon_1\right\}.$
Figure 5. A sequence generated by a white-noise process.
This depicts $y_t$ as the sum of an initial value $y_0$ and of an accumulation of
stochastic increments. If $y_0$ has a fixed finite value, then the mean and the
variance of $y_t$ are given by

(1.21)  $E(y_t) = y_0 \quad\text{and}\quad V(y_t) = t\sigma^2.$
There is no central tendency in the random-walk process; and, if its starting
point is in the indefinite past rather than at time $t = 0$, then the mean and
variance are undefined.
To reduce the random walk to a stationary stochastic process, it is neces-
sary only to take its first differences. Thus

(1.22)  $y_t - y_{t-1} = \varepsilon_t.$
The values of a random walk, as the name implies, have a tendency to
wander haphazardly. However, if the variance of the white-noise process is
small, then the values of the stochastic increments will also be small and the
random walk will wander slowly. It is debatable whether the outcome of such
a process deserves to be called a trend.
A first-order random walk over a surface is what is known as Brownian
motion. For a physical example of Brownian motion, one can imagine small
particles, such as pollen grains, floating on the surface of a viscous liquid. The
viscosity might be expected to bring the particles to a halt quickly if they
Figure 6. A first-order random walk.

Figure 7. A second-order random walk.
were in motion. However, if the particles are very light, then they will dart
hither and thither on the surface of the liquid under the impact of its molecules
which are themselves in constant motion.
There is no better way of predicting the outcome of a random walk than
to take the most recently observed value and to extrapolate it indefinitely into
the future. This is demonstrated by taking the expected values of the elements
of the equation

(1.23)  $y_{t+h} = y_{t+h-1} + \varepsilon_{t+h},$

which represents the value which lies $h$ periods ahead at time $t$. The expecta-
tions, which are conditional upon the information of the set $I_t = \{y_t, y_{t-1}, \ldots\}$
containing observations on the series up to time $t$, may be denoted as follows:

(1.24)  $E(y_{t+h} \mid I_t) = \begin{cases} \hat{y}_{t+h|t}, & \text{if } h > 0; \\ y_{t+h}, & \text{if } h \leq 0. \end{cases}$
In these terms, the predictions of the values of the random walk for $h > 1$
periods ahead and for one period ahead are given, respectively, by

(1.25)  $E(y_{t+h} \mid I_t) = \hat{y}_{t+h|t} = \hat{y}_{t+h-1|t},$
        $E(y_{t+1} \mid I_t) = \hat{y}_{t+1|t} = y_t.$
The first of these, which comes from (23), depends upon the fact that
$E(\varepsilon_{t+h} \mid I_t) = 0$. The second, which comes from taking expectations in the
equation $y_{t+1} = y_t + \varepsilon_{t+1}$, uses the fact that the value of $y_t$ is already known.
The implication of the two equations is that $y_t$ serves as the optimal predictor
for all future values of the random walk.
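The point can be made concrete in a few lines, assuming numpy:

    import numpy as np

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(0.0, 1.0, 100))  # a first-order random walk, y_0 = 0

    # The optimal predictor of every future value is the last observation:
    # E(y_{t+h} | I_t) = y_t for all h > 0.
    h = np.arange(1, 21)
    forecast = np.full(h.size, y[-1])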
A second-order random walk is formed by accumulating the values of a
first-order process. Thus, if $\{\varepsilon_t\}$ and $\{y_t\}$ are respectively a white-noise se-
quence and the sequence from a first-order random walk, then

(1.26)  $z_t = z_{t-1} + y_t = z_{t-1} + y_{t-1} + \varepsilon_t = 2z_{t-1} - z_{t-2} + \varepsilon_t$
defines the second-order random walk. Here the final expression is obtained by
setting $y_{t-1} = z_{t-1} - z_{t-2}$ in the second expression. It is clear that, to reduce
the sequence $\{z_t\}$ to the stationary white-noise sequence, we must take first
differences twice in succession.
The nature of a second-order process can be understood by recognising
that it represents a trend in which the slope, which is its first difference,
follows a random walk. If the random walk wanders slowly, then the slope of
this trend is liable to change only gradually. Therefore, for extended periods,
the second-order random walk may appear to follow a linear time trend.
For a physical analogy of a second-order random walk, we can imagine a
body in motion which suffers a series of small impacts. If the kinetic energy of
the body is large relative to the energy of the impacts, then its linear motion will
be disturbed only slightly. In order to predict where the body might be in some
future period, we simply extrapolate its linear motion free from disturbances.
To demonstrate that the forecast function for a second-order random walk
is a straight line, we may take the expectations, which are conditional upon $I_t$,
of the elements of the equation

(1.27)  $z_{t+h} = 2z_{t+h-1} - z_{t+h-2} + \varepsilon_{t+h}.$
For $h$ periods ahead and for one period ahead, this gives

(1.28)  $E(z_{t+h} \mid I_t) = \hat{z}_{t+h|t} = 2\hat{z}_{t+h-1|t} - \hat{z}_{t+h-2|t},$
        $E(z_{t+1} \mid I_t) = \hat{z}_{t+1|t} = 2z_t - z_{t-1},$
which together serve to define a simple iterative scheme. It is straightforward
to confirm that these difference equations have an analytic solution of the form

(1.29)  $\hat{z}_{t+h|t} = \alpha + \beta h \quad\text{with}\quad \alpha = z_t \quad\text{and}\quad \beta = z_t - z_{t-1},$

which generates a linear time trend.
It is possible to define random walks of higher orders. Thus a third-order
random walk is formed by accumulating the values of a second-order process.
A third-order process can be expected to give rise to local quadratic trends;
and the appropriate way of predicting its values is by quadratic extrapolation.
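A short simulation, again assuming numpy, shows the linear forecast function of (1.29) at work:

    import numpy as np

    rng = np.random.default_rng(0)
    eps = rng.normal(0.0, 1.0, 100)
    y = np.cumsum(eps)     # a first-order random walk
    z = np.cumsum(y)       # a second-order random walk, as in (1.26)

    # Equation (1.29): z_{t+h|t} = alpha + beta*h with alpha = z_t and
    # beta = z_t - z_{t-1}; the forecast function is a straight line.
    alpha, beta = z[-1], z[-1] - z[-2]
    h = np.arange(1, 21)
    forecast = alpha + beta * h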
A stochastic trend of the random-walk variety may be elaborated by the
addition of an irregular component. A simple model consists of a first-order
random walk with an added white-noise component. The model is specified by
the equations

(1.30)  $y_t = \xi_t + \eta_t,$
        $\xi_t = \xi_{t-1} + \nu_t,$

wherein $\eta_t$ and $\nu_t$ are generated by two mutually independent white-noise pro-
cesses.
The equations combine to give

(1.31)  $y_t - y_{t-1} = \xi_t - \xi_{t-1} + \eta_t - \eta_{t-1} = \nu_t + \eta_t - \eta_{t-1}.$
The expression on the RHS can be reformulated to give

(1.32)  $\nu_t + \eta_t - \eta_{t-1} = \varepsilon_t - \theta\varepsilon_{t-1},$

where $\varepsilon_t$ and $\varepsilon_{t-1}$ are elements of a white-noise sequence and $\theta$ is a parameter
of an appropriate value. Thus, the combination of the random walk and white
noise gives rise to the single equation
(1.33)  $y_t = y_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}.$
The forecast for $h$ steps ahead, which is obtained by taking expectations
in the equation $y_{t+h} = y_{t+h-1} + \varepsilon_{t+h} - \theta\varepsilon_{t+h-1}$, is given by

(1.34)  $E(y_{t+h} \mid I_t) = \hat{y}_{t+h|t} = \hat{y}_{t+h-1|t}.$
The forecast for one step ahead, which is obtained from the equation $y_{t+1} =
y_t + \varepsilon_{t+1} - \theta\varepsilon_t$, is

(1.35)  $E(y_{t+1} \mid I_t) = \hat{y}_{t+1|t} = y_t - \theta\varepsilon_t$
                $= y_t - \theta(y_t - \hat{y}_{t|t-1})$
                $= (1 - \theta)y_t + \theta\hat{y}_{t|t-1}.$
The result $\hat{y}_{t|t-1} = y_{t-1} - \theta\varepsilon_{t-1}$, which leads to the identity $\varepsilon_t = y_t - \hat{y}_{t|t-1}$
upon which the second equality of (35) depends, reflects the fact that, if the in-
formation at time $t-1$ consists of the elements of the set $I_{t-1} = \{y_{t-1}, y_{t-2}, \ldots\}$
and the value of $\theta$, then $\varepsilon_{t-1}$ is a known quantity which is unaffected by the
process of taking expectations.
By applying a straightforward process of back-substitution to the final
equation of (35), it will be found that

(1.36)  $\hat{y}_{t+1|t} = (1 - \theta)(y_t + \theta y_{t-1} + \cdots + \theta^{t-1}y_1) + \theta^t y_0$
                $= (1 - \theta)\{y_t + \theta y_{t-1} + \theta^2 y_{t-2} + \cdots\},$

where the final expression stands for an infinite series. This is a so-called
exponentially-weighted moving average; and it is the basis of the widely-used
forecasting procedure known as exponential smoothing.
To form the one-step-ahead forecast $\hat{y}_{t+1|t}$ in the manner indicated by the
first of the equations under (36), an initial value $y_0$ is required. Equation (34)
indicates that all the succeeding forecasts $\hat{y}_{t+2|t}, \hat{y}_{t+3|t}$, etc. have the same value
as the one-step-ahead forecast.
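The recursive form of (1.35) is what makes exponential smoothing so convenient in practice: only the previous forecast need be stored. A minimal sketch, assuming numpy, with the value of theta and the initial forecast f0 supplied by the caller:

    import numpy as np

    def exponential_smoothing(y, theta, f0=0.0):
        # One-step-ahead forecasts from equation (1.35):
        # y_{t+1|t} = (1 - theta)*y_t + theta*y_{t|t-1}, started at f0.
        f, forecasts = f0, []
        for value in y:
            f = (1.0 - theta) * value + theta * f
            forecasts.append(f)
        return np.array(forecasts)   # element t is the forecast of y_{t+1}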
It will transpire, in subsequent lectures, that equation (33) is a simple
example of an Integrated Autoregressive Moving-Average or ARIMA model.
There exists a readily accessible general theory of the forecasting of ARIMA
processes which we shall expound at length.
References
Eubank, R.L., (1988), Spline Smoothing and Nonparametric Regression, Marcel
Dekker Inc., New York.

Hamming, R.W., (1989), Digital Filters: Third Edition, Prentice-Hall Inc.,
Englewood Cliffs, N.J.

Ratkowsky, D.L., (1985), Nonlinear Regression Modelling: A Unified Approach,
Marcel Dekker Inc., New York.

Reinsch, C.H., (1967), Smoothing by Spline Functions, Numerische Mathe-
matik, 10, 177–183.

Schoenberg, I.J., (1964), Spline Functions and the Problem of Graduation,
Proceedings of the National Academy of Sciences, 52, 947–950.

De Vos, A.F. and I.J. Steyn, (1990), Stochastic Nonlinearity: A Firm Ba-
sis for the Flexible Functional Form, Research Memorandum 1990-13, Vrije
Universiteit, Amsterdam.

LECTURE 2
Seasons and Cycles
in Time Series
Cycles of a regular nature are often encountered in physics and engineering.
Consider a point moving with constant speed in a circle of radius $\rho$. The point
might be the axis of the big end of a connecting rod which joins a piston to
a flywheel. Let time $t$ be reckoned from an instant when the radius joining
the point to the centre is at an angle of $\theta$ below the horizontal. If the point is
projected onto the horizontal axis, then the distance of the projection from the
centre is given by

(2.1)  $x = \rho\cos(\omega t - \theta).$
The movement of the projection back and forth along the horizontal axis is
described as simple harmonic motion.
The parameters of the function are as follows:
$\rho$ is the amplitude,
$\omega$ is the angular velocity or frequency and
$\theta$ is the phase displacement.
The angular velocity is measured in radians per unit period. The quantity $2\pi/\omega$
measures the period of the cycle. The phase displacement, also measured in
radians, indicates the extent to which the cosine function has been displaced by
a shift along the time axis. Thus, instead of the peak of the function occurring
at time $t = 0$, as it would with an ordinary cosine function, it now occurs at
time $t = \theta/\omega$.
Using the compound-angle formula $\cos(A - B) = \cos A\cos B + \sin A\sin B$,
we can rewrite equation (1) as

(2.2)  $x = \rho\cos\theta\cos(\omega t) + \rho\sin\theta\sin(\omega t)$
          $= \alpha\cos(\omega t) + \beta\sin(\omega t),$

with

(2.3)  $\alpha = \rho\cos\theta, \quad \beta = \rho\sin\theta \quad\text{and}\quad \alpha^2 + \beta^2 = \rho^2.$
Extracting a Regular Cyclical Component
A cyclical component which is concealed beneath other motions may be
extracted from a data sequence by a straightforward application of the method
of linear regression. An equation may be written in the form of
(2.4)  $y_t = \alpha c_t(\omega) + \beta s_t(\omega) + e_t; \qquad t = 0, \ldots, T - 1,$

where $c_t(\omega) = \cos(\omega t)$ and $s_t(\omega) = \sin(\omega t)$. To avoid the need for an intercept
term, the values of the dependent variable should be deviations about a mean
value. In matrix terms, equation (4) becomes

(2.5)  $y = [\,c \quad s\,]\begin{bmatrix} \alpha \\ \beta \end{bmatrix} + e,$
where $c = [c_0, \ldots, c_{T-1}]'$, $s = [s_0, \ldots, s_{T-1}]'$ and $e = [e_0, \ldots, e_{T-1}]'$ are
vectors of $T$ elements. The parameters $\alpha$ and $\beta$ can be found by running regressions
for a wide range of values of $\omega$ and by selecting the regression which delivers
the lowest value for the residual sum of squares.
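A sketch of this search, assuming numpy and a fabricated series in which a cycle of frequency $\omega = 0.3$ is concealed beneath white noise:

    import numpy as np

    def extract_cycle(y, omegas):
        # Regress y_t on cos(omega*t) and sin(omega*t) for each trial
        # omega, as in (2.4); return the frequency with the least RSS.
        # y should already be in deviations about its mean.
        t = np.arange(y.size)
        best = None
        for omega in omegas:
            X = np.column_stack([np.cos(omega * t), np.sin(omega * t)])
            coef, resid, *_ = np.linalg.lstsq(X, y, rcond=None)
            rss = resid[0] if resid.size else np.sum((y - X @ coef) ** 2)
            if best is None or rss < best[0]:
                best = (rss, omega, coef)
        return best[1], best[2]          # omega and (alpha, beta)

    rng = np.random.default_rng(0)
    t = np.arange(200)
    y = 2.0 * np.cos(0.3 * t - 1.0) + rng.normal(0, 1, t.size)
    omega, (alpha, beta) = extract_cycle(y - y.mean(), np.linspace(0.05, 3.0, 500))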
Such a technique may be used for extracting a seasonal component from
an economic time series; and, in that case, we know in advance what value
to give to $\omega$. For the seasonality of economic activities is related, ultimately,
to the near-perfect regularities of the solar system which are reflected in the
annual calendar.
It may be unreasonable to expect that an idealised seasonal cycle can be
represented by a simple sinusoidal function. However, wave forms of a more
complicated nature may be synthesised by employing a series of sine and cosine
functions whose frequencies are integer multiples of the fundamental seasonal
frequency. If there are $s = 2n$ observations per annum, then a general model
for a seasonal fluctuation would comprise the frequencies

(2.6)  $\omega_j = \frac{2\pi j}{s}, \qquad j = 0, \ldots, n = \frac{s}{2},$
which are equally spaced in the interval $[0, \pi]$. Such a series of frequencies is
described as an harmonic scale.
A model of seasonal fluctuation comprising the full set of harmonically-
related frequencies would take the form of

(2.7)  $y_t = \sum_{j=0}^{n}\left\{\alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t)\right\} + e_t,$
where $e_t$ is a residual element which might represent an irregular white-noise
component in the process underlying the data.
Figure 1. Trigonometrical functions, of frequencies $\omega_1 = \pi/2$ and
$\omega_2 = \pi$, associated with a quarterly model of a seasonal fluctuation.
At first sight, it appears that there are $s + 2$ components in the sum.
However, when $s$ is even, we have

(2.8)  $\sin(\omega_0 t) = \sin(0) = 0,$
       $\cos(\omega_0 t) = \cos(0) = 1,$
       $\sin(\omega_n t) = \sin(\pi t) = 0,$
       $\cos(\omega_n t) = \cos(\pi t) = (-1)^t.$

Therefore there are only $s$ nonzero coefficients to be determined.
This simple seasonal model is illustrated adequately by the case of quar-
terly data. Matters are no more complicated in the case of monthly data. When
there are four observations per annum, we have $\omega_0 = 0$, $\omega_1 = \pi/2$ and $\omega_2 = \pi$;
and equation (7) assumes the form of
(2.9)  $y_t = \alpha_0 + \alpha_1\cos\left(\frac{\pi t}{2}\right) + \beta_1\sin\left(\frac{\pi t}{2}\right) + \alpha_2(-1)^t + e_t.$
If the four seasons are indexed by $j = 0, \ldots, 3$, then the values from the
year can be represented by the following matrix equation:

(2.10)  $\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & -1 \\ 1 & -1 & 0 & 1 \\ 1 & 0 & -1 & -1 \end{bmatrix}\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} e_0 \\ e_1 \\ e_2 \\ e_3 \end{bmatrix}.$
It will be observed that the vectors of the matrix are mutually orthogonal.
When the data consist of $T = 4p$ observations which span $p$ years, the
coefficients of the equation are given by

(2.11)  $\hat{\alpha}_0 = \frac{1}{T}\sum_{t=0}^{T-1} y_t,$

        $\hat{\alpha}_1 = \frac{2}{T}\sum_{\tau=1}^{p}(y_{0\tau} - y_{2\tau}),$

        $\hat{\beta}_1 = \frac{2}{T}\sum_{\tau=1}^{p}(y_{1\tau} - y_{3\tau}),$

        $\hat{\alpha}_2 = \frac{1}{T}\sum_{\tau=1}^{p}(y_{0\tau} - y_{1\tau} + y_{2\tau} - y_{3\tau}),$

where $y_{j\tau}$ denotes the observation on the $j$th season of the $\tau$th year.
It is the mutual orthogonality of the vectors of explanatory variables which
accounts for the simplicity of these formulae.
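These formulae translate directly into code. A minimal sketch, assuming numpy and quarterly data arranged so that the observations of each year occupy four consecutive positions:

    import numpy as np

    def quarterly_fourier_coefficients(y):
        # The coefficients of equation (2.11) for T = 4p observations;
        # q has one row per year and one column per season.
        T = y.size
        q = y.reshape(-1, 4)
        alpha0 = y.sum() / T
        alpha1 = 2.0 / T * (q[:, 0] - q[:, 2]).sum()
        beta1  = 2.0 / T * (q[:, 1] - q[:, 3]).sum()
        alpha2 = 1.0 / T * (q[:, 0] - q[:, 1] + q[:, 2] - q[:, 3]).sum()
        return alpha0, alpha1, beta1, alpha2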
An alternative model of seasonality, which is used more often by econome-
tricians, assigns an individual dummy variable to each season. Thus, in place
of equation (10), we may take

(2.12)  $\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix} + \begin{bmatrix} e_0 \\ e_1 \\ e_2 \\ e_3 \end{bmatrix},$
where

(2.13)  $\hat{\delta}_j = \frac{4}{T}\sum_{\tau=1}^{p} y_{j\tau}, \qquad\text{for } j = 0, \ldots, 3.$
A comparison of equations (10) and (12) establishes the mapping from the
coefficients of the trigonometrical functions to the coefficients of the dummy
variables. The inverse mapping is

(2.14)  $\begin{bmatrix} \alpha_0 \\ \alpha_1 \\ \beta_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} \\ \tfrac{1}{2} & 0 & -\tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & 0 & -\tfrac{1}{2} \\ \tfrac{1}{4} & -\tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{1}{4} \end{bmatrix}\begin{bmatrix} \delta_0 \\ \delta_1 \\ \delta_2 \\ \delta_3 \end{bmatrix}.$
Another way of parametrising the model of seasonality is to adopt the
following form:

(2.15)  $\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} \mu \\ \gamma_0 \\ \gamma_1 \\ \gamma_2 \end{bmatrix} + \begin{bmatrix} e_0 \\ e_1 \\ e_2 \\ e_3 \end{bmatrix}.$
This scheme is unbalanced in that it does not treat each season in the same
manner. An attempt might be made to correct this feature by adding to the
matrix an extra column with a unit at the bottom and with zeros elsewhere and
by introducing an accompanying parameter $\gamma_3$. However, the columns of the
resulting matrix will be linearly dependent; and this will make the parameters
indeterminate unless an additional constraint is imposed which sets $\gamma_0 + \cdots +
\gamma_3 = 0$.
The problem highlights a difficulty which might arise if either of the
schemes under (10) or (12) were fitted to the data by multiple regression in
the company of a polynomial $\phi(t) = \phi_0 + \phi_1 t + \cdots + \phi_p t^p$ designed to capture
a trend. To make such a regression viable, one would have to eliminate the
intercept parameter $\phi_0$.
Irregular Cycles
Whereas it seems reasonable to model a seasonal fluctuation in terms of
trigonometrical functions, it is difficult to accept that other cycles in economic
activity should have such regularity.
A classic expression of skepticism was made by Slutsky in a famous
article of 1927, which was translated into English in 1937:
Suppose we are inclined to believe in the reality of the strict period-
icity of the business cycle, such, for example, as the eight-year period
postulated by Moore. Then we should encounter another difficulty.
Wherein lies the source of this regularity? What is the mechanism of
causality which, decade after decade, reproduces the same sinusoidal
wave which rises and falls on the surface of the social ocean with the
regularity of day and night?
It seems that something other than a perfectly regular sinusoidal component
is required to model the secular fluctuations of economic activity which are
described as business cycles.
To obtain a model for a seasonal fluctuation, it has been enough to modify
the equation of harmonic motion by superimposing a disturbance term which
affects the amplitude. To generate a cycle which is more fundamentally affected
by randomness, we must construct a model which has random effects in both
the phase and the amplitude.
To begin, let us imagine, once more, a point on the circumference of a circle
of radius $\rho$ which is travelling with an angular velocity of $\omega$. At the instant
$t = 0$, when the point makes a positive angle of $\theta$ with the horizontal axis, the
coordinates are given by

(2.16)  $(\alpha, \beta) = (\rho\cos\theta, \rho\sin\theta).$
To find the coordinates of the point after it has rotated through an angle of $\omega$
in one period of time, we may rotate the component vectors $(\alpha, 0)$ and $(0, \beta)$
separately and add them. The rotation of the components is depicted as follows:

(2.17)  $(\alpha, 0) \longrightarrow (\alpha\cos\omega, \alpha\sin\omega),$
        $(0, \beta) \longrightarrow (-\beta\sin\omega, \beta\cos\omega).$

Their addition gives

(2.18)  $(\alpha, \beta) \longrightarrow (y, z) = (\alpha\cos\omega - \beta\sin\omega, \alpha\sin\omega + \beta\cos\omega).$
In matrix terms, the transformation becomes

(2.19)  $\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}\begin{bmatrix} \alpha \\ \beta \end{bmatrix}.$
To find the values of the coordinates at a time which is an integral number of
periods ahead, we may transform the vector $[y, z]'$ by premultiplying it the
appropriate number of times by the matrix of the rotation. Alternatively, we
may replace $\omega$ in equation (19) by whatever angle will be reached at the time
in question. In effect, equation (19) specifies the horizontal and vertical com-
ponents of a circular motion which amount to a pair of synchronous harmonic
motions.
To introduce the appropriate irregularities into the motion, we may add a
random disturbance term to each of its components. The discrete-time equation
of the resulting motion may be expressed as follows:

(2.20)  $\begin{bmatrix} y_t \\ z_t \end{bmatrix} = \begin{bmatrix} \cos\omega & -\sin\omega \\ \sin\omega & \cos\omega \end{bmatrix}\begin{bmatrix} y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} \delta_t \\ \varepsilon_t \end{bmatrix}.$
Now the character of the motion is radically altered. There is no longer any
bound on the amplitudes which the components might acquire in the long
run; and there is, likewise, a tendency for the phases of their cycles to drift
without limit. Nevertheless, in the absence of uncommonly large disturbances,
the trajectories of y and z are liable, in a limited period, to resemble those of
the simple harmonic motions.
It is easy to decouple the equations of $y$ and $z$. The first of the equations
within the matrix expression can be written as

(2.21)  $y_t = cy_{t-1} - sz_{t-1} + \delta_t,$

where $c = \cos\omega$ and $s = \sin\omega$.
The second equation may be lagged by one period and rearranged to give

(2.22)  $z_{t-1} - cz_{t-2} = sy_{t-2} + \varepsilon_{t-1}.$
By taking the first difference of equation (21) and by using equation (22) to
eliminate the values of $z$, we get

(2.23)  $y_t - cy_{t-1} = cy_{t-1} - c^2 y_{t-2} - sz_{t-1} + csz_{t-2} + \delta_t - c\delta_{t-1}$
                $= cy_{t-1} - c^2 y_{t-2} - s^2 y_{t-2} - s\varepsilon_{t-1} + \delta_t - c\delta_{t-1}.$
If we use the result that $y_{t-2}\cos^2\omega + y_{t-2}\sin^2\omega = y_{t-2}$ and if we collect the dis-
turbances to form a new variable $\zeta_t = \delta_t - s\varepsilon_{t-1} - c\delta_{t-1}$, then we can rearrange
the second equality to give

(2.24)  $y_t = 2\cos\omega\, y_{t-1} - y_{t-2} + \zeta_t.$
Here it is not true in general that the sequence of disturbances $\{\zeta_t\}$ will be
white noise. However, if we specify that, within equation (20),

(2.25)  $\begin{bmatrix} \delta_t \\ \varepsilon_t \end{bmatrix} = \begin{bmatrix} \sin\omega \\ -\cos\omega \end{bmatrix}\eta_t,$

where $\{\eta_t\}$ is a white-noise sequence, then the lagged terms within $\zeta_t$ will cancel,
leaving a sequence whose elements are mutually uncorrelated.
A sequence generated by equation (24) when $\{\zeta_t\}$ is a white-noise sequence
is depicted in Figure 2.
Figure 2. A quasi-cyclical sequence generated by the
equation $y_t = 2\cos\omega\, y_{t-1} - y_{t-2} + \zeta_t$ when $\omega = 20°$.
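A direct simulation of equation (24), assuming numpy, reproduces the character of Figure 2:

    import numpy as np

    rng = np.random.default_rng(0)
    omega = np.deg2rad(20.0)          # omega = 20 degrees, as in Figure 2
    zeta = rng.normal(0.0, 1.0, 100)  # a white-noise sequence

    y = np.zeros(100)
    for t in range(2, 100):
        # Equation (2.24): y_t = 2*cos(omega)*y_{t-1} - y_{t-2} + zeta_t.
        y[t] = 2.0 * np.cos(omega) * y[t - 1] - y[t - 2] + zeta[t]
    # Undisturbed, the equation would describe a cycle of 360/20 = 18
    # periods; the disturbances make its amplitude and phase drift.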
It is interesting to recognise that equation (24) becomes the equation of a
second-order random walk in the case where $\omega = 0$. The second-order random
walk gives rise to trends which can remain virtually linear over considerable
periods.
Whereas there is little difficulty in understanding that an accumulation of
purely random disturbances can give rise to a linear trend, there is often surprise
at the fact that such disturbances can also generate cycles which are more or
less regular. An understanding of this phenomenon can be reached by con-
sidering a physical analogy. One such analogy, which is very apposite, was
provided by Yule, whose article of 1927 introduced the concept of a second-order
autoregressive process, of which equation (24) is a limiting case. Yule's purpose
was to explain, in terms of random causes, a cycle of roughly 11 years which
characterises the Wolfer sunspot index.
Yule invited his readers to imagine a pendulum attached to a recording de-
vice. Any deviations from perfectly harmonic motion which might be recorded
must be the result of superimposed errors of observation which could be all
but eliminated if a long sequence of observations were subjected to a regression
analysis.
The recording apparatus is left to itself and unfortunately boys get
into the room and start pelting the pendulum with peas, sometimes
from one side and sometimes from the other. The motion is now
affected not by superposed fluctuations but by true disturbances, and
the effect on the graph will be of an entirely different kind. The graph
will remain surprisingly smooth, but amplitude and phase will vary
continuously.
The phenomenon described by Yule is due to the inertia of the pendulum.
In the short term, the impacts of the peas impart very little energy to the
system compared with the sum of its kinetic and potential energies at any point
in time. However, on taking a longer view, we can see that, in the absence of
clock weights, the system is driven by the impacts alone.
The Fourier Decomposition of a Time Series
In spite of the notion that a regular trigonometrical function is an inappro-
priate means for modelling an economic cycle other than a seasonal fluctuation,
there are good reasons to persist with the business of explaining a data sequence
in terms of such functions.
The Fourier decomposition of a series is a matter of explaining the series
entirely as a composition of sinusoidal functions. Thus it is possible to represent
the generic element of the sample as

(2.26)  $y_t = \sum_{j=0}^{n}\left\{\alpha_j\cos(\omega_j t) + \beta_j\sin(\omega_j t)\right\}.$
Assuming that $T = 2n$ is even, this sum comprises $T$ functions whose frequen-
cies

(2.27)  $\omega_j = \frac{2\pi j}{T}, \qquad j = 0, \ldots, n = \frac{T}{2},$

are at equally spaced points in the interval $[0, \pi]$.
As we might infer from our analysis of a seasonal fluctuation, there are
as many nonzero elements in the sum under (26) as there are data points,
for the reason that two of the functions within the sum, namely $\sin(\omega_0 t) =
\sin(0)$ and $\sin(\omega_n t) = \sin(\pi t)$, are identically zero. It follows that the mapping
from the sample values to the coefficients constitutes a one-to-one invertible
transformation. The same conclusion arises in the slightly more complicated
case where $T$ is odd.
The angular velocity $\omega_j = 2\pi j/T$ relates to a pair of trigonometrical com-
ponents which accomplish $j$ cycles in the $T$ periods spanned by the data. The
highest velocity $\omega_n = \pi$ corresponds to the so-called Nyquist frequency. If a
component with a frequency in excess of $\pi$ were included in the sum in (26),
then its effect would be indistinguishable from that of a component with a
frequency in the range $[0, \pi]$.
To demonstrate this, consider the case of a pure cosine wave of unit am-
plitude and zero phase whose frequency $\omega$ lies in the interval $\pi < \omega < 2\pi$. Let
$\omega^* = 2\pi - \omega$. Then

(2.28)  $\cos(\omega t) = \cos\left\{(2\pi - \omega^*)t\right\}$
             $= \cos(2\pi t)\cos(\omega^* t) + \sin(2\pi t)\sin(\omega^* t)$
             $= \cos(\omega^* t);$

which indicates that $\omega$ and $\omega^*$ are observationally indistinguishable. Here,
$\omega^* \in [0, \pi]$ is described as the alias of $\omega > \pi$.
For an illustration of the problem of aliasing, let us imagine that a person
observes the sea level at 6 a.m. and 6 p.m. each day. He should notice a very
gradual recession and advance of the water level, the frequency of the cycle
being $f = 1/28$ cycles per observation, which amounts to one tide in 14 days. In fact,
the true frequency is $f = 1 - 1/28$, which gives 27 tides in 14 days. Observing
the sea level every six hours should enable him to infer the correct frequency.
Calculation of the Fourier Coefficients
For heuristic purposes, we can imagine calculating the Fourier coefficients
using an ordinary regression procedure to fit equation (26) to the data. In
this case, there would be no regression residuals, for the reason that we are
estimating a total of $T$ coefficients from $T$ data points; so we are actually
solving a set of $T$ linear equations in $T$ unknowns.
A reason for not using a multiple regression procedure is that, in this case,
the vectors of explanatory variables are mutually orthogonal. Therefore T
applications of a univariate regression procedure would be appropriate to our
purpose.
Let $c_j = [c_{0j}, \ldots, c_{T-1,j}]'$ and $s_j = [s_{0j}, \ldots, s_{T-1,j}]'$ represent vectors of
$T$ values of the generic functions $\cos(\omega_j t)$ and $\sin(\omega_j t)$ respectively. Then there
are the following orthogonality conditions:

(2.29)  $c_i'c_j = 0 \quad\text{if } i \neq j,$
        $s_i's_j = 0 \quad\text{if } i \neq j,$
        $c_i's_j = 0 \quad\text{for all } i, j.$
In addition, there are the following sums of squares:

(2.30)  $c_0'c_0 = c_n'c_n = T,$
        $s_0's_0 = s_n's_n = 0,$
        $c_j'c_j = s_j's_j = \frac{T}{2} \quad\text{for } j = 1, \ldots, n - 1.$
The regression formulae for the Fourier coefficients are therefore

(2.31)  $\alpha_0 = (i'i)^{-1}i'y = \frac{1}{T}\sum_t y_t = \bar{y},$

(2.32)  $\alpha_j = (c_j'c_j)^{-1}c_j'y = \frac{2}{T}\sum_t y_t\cos(\omega_j t),$

(2.33)  $\beta_j = (s_j's_j)^{-1}s_j'y = \frac{2}{T}\sum_t y_t\sin(\omega_j t),$

where $i = [1, \ldots, 1]'$ is the summation vector.
By pursuing the analogy of multiple regression, we can understand that
there is a complete decomposition of the sum of squares of the elements of $y$
which is given by

(2.34)  $y'y = \alpha_0^2\, i'i + \sum_j \alpha_j^2\, c_j'c_j + \sum_j \beta_j^2\, s_j's_j.$
Now consider writing $\alpha_0^2\, i'i = \bar{y}^2\, i'i = \bar{y}'\bar{y}$, where $\bar{y}' = [\bar{y}, \ldots, \bar{y}]$ is the vector
whose repeated element is the sample mean $\bar{y}$. It follows that $y'y - \alpha_0^2\, i'i =
y'y - \bar{y}'\bar{y} = (y - \bar{y})'(y - \bar{y})$. Therefore we can rewrite the equation as

(2.35)  $(y - \bar{y})'(y - \bar{y}) = \frac{T}{2}\sum_j\left(\alpha_j^2 + \beta_j^2\right) = \frac{T}{2}\sum_j \rho_j^2,$
and it follows that we can express the variance of the sample as

(2.36)  $\frac{1}{T}\sum_{t=0}^{T-1}(y_t - \bar{y})^2 = \frac{1}{2}\sum_{j=1}^{n}\left(\alpha_j^2 + \beta_j^2\right)$
        $\qquad = \frac{2}{T^2}\sum_j\left\{\left(\sum_t y_t\cos\omega_j t\right)^2 + \left(\sum_t y_t\sin\omega_j t\right)^2\right\}.$
The proportion of the variance which is attributable to the component at fre-
quency $\omega_j$ is $(\alpha_j^2 + \beta_j^2)/2 = \rho_j^2/2$, where $\rho_j$
The number of the Fourier frequencies increases at the same rate as the
sample size $T$. Therefore, if the variance of the sample remains finite, and
if there are no regular harmonic components in the process generating the
data, then we can expect the proportion of the variance attributed to the
individual frequencies to decline as the sample size increases. If there is such
a regular component within the process, then we can expect the proportion of
the variance attributable to it to converge to a finite value as the sample size
increases.
In order to provide a graphical representation of the decomposition of the
sample variance, we must scale the elements of equation (36) by a factor of $T$.
The graph of the function $I(\omega_j) = (T/2)\left(\alpha_j^2 + \beta_j^2\right)$ is known as the periodogram.
Figure 3. The periodogram of Wolfer's Sunspot Numbers, 1749–1924.
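The periodogram ordinates can be computed directly from the regression formulae (2.32) and (2.33). A sketch assuming numpy; the Nyquist ordinate, which requires a different scale factor, is omitted:

    import numpy as np

    def periodogram(y):
        # I(omega_j) = (T/2)*(alpha_j^2 + beta_j^2) at the Fourier
        # frequencies omega_j = 2*pi*j/T, for j = 1, ..., (T-1)//2.
        T = y.size
        t = np.arange(T)
        d = y - y.mean()
        ordinates = []
        for j in range(1, (T - 1) // 2 + 1):
            w = 2.0 * np.pi * j / T
            alpha = 2.0 / T * np.sum(d * np.cos(w * t))
            beta = 2.0 / T * np.sum(d * np.sin(w * t))
            ordinates.append(T / 2.0 * (alpha ** 2 + beta ** 2))
        return np.array(ordinates)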
There are many impressive examples where the estimation of the peri-
odogram has revealed the presence of regular harmonic components in a data
series which might otherwise have passed undetected. One of the best-known
examples concerns the analysis of the brightness or magnitude of the star T.
Ursa Major. It was shown by Whittaker and Robinson in 1924 that this series
could be described almost completely in terms of two trigonometrical functions
with periods of 24 and 29 days.
The attempts to discover underlying components in economic time-series
have been less successful. One application of periodogram analysis which was a
notorious failure was its use by William Beveridge in 1921 and 1922 to analyse
a long series of European wheat prices. The periodogram had so many peaks
that at least twenty possible hidden periodicities could be picked out, and this
seemed to be many more than could be accounted for by plausible explanations
within the realms of economic history.
Such findings seem to diminish the importance of periodogram analysis
in econometrics. However, the fundamental importance of the periodogram is
established once it is recognised that it represents nothing less than the Fourier
transform of the sequence of empirical autocovariances.
The Empirical Autocovariances
A natural way of representing the serial dependence of the elements of a
data sequence is to estimate their autocovariances. The empirical autocovari-
ance of lag $\tau$ is defined by the formula

(2.37)  $c_\tau = \frac{1}{T}\sum_{t=\tau}^{T-1}(y_t - \bar{y})(y_{t-\tau} - \bar{y}).$
The empirical autocorrelation of lag $\tau$ is defined by $r_\tau = c_\tau/c_0$, where $c_0$, which
is formally the autocovariance of lag 0, is the variance of the sequence. The
autocorrelation provides a measure of the relatedness of data points separated
by $\tau$ periods which is independent of the units of measurement.
It is straightforward to establish the relationship between the periodogram
and the sequence of autocovariances.
The periodogram may be written as
(2.38)    I(ω_j) = (2/T) [ { Σ_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ) }² + { Σ_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ) }² ].
The identity Σ_t cos(ω_j t)(y_t − ȳ) = Σ_t cos(ω_j t)y_t follows from the fact that, by construction, Σ_t cos(ω_j t) = 0 for all j. Expanding the expression in (38) gives
(2.39)    I(ω_j) = (2/T) [ Σ_t Σ_s cos(ω_j t) cos(ω_j s)(y_t − ȳ)(y_s − ȳ) ]
                 + (2/T) [ Σ_t Σ_s sin(ω_j t) sin(ω_j s)(y_t − ȳ)(y_s − ȳ) ],
and, by using the identity cos(A) cos(B) + sin(A) sin(B) = cos(A − B), we can rewrite this as

(2.40)    I(ω_j) = (2/T) [ Σ_t Σ_s cos(ω_j[t − s])(y_t − ȳ)(y_s − ȳ) ].
Next, on defining τ = t − s and writing c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can reduce the latter expression to

(2.41)    I(ω_j) = 2 Σ_{τ=1−T}^{T−1} cos(ω_j τ) c_τ,

which is a Fourier transform of the sequence of empirical autocovariances.
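The equivalence of (2.38) and (2.41) is easy to verify numerically. The fragment below (an added illustration, reusing the periodogram function and the series y defined in the earlier sketch) computes the empirical autocovariances of (2.37) and evaluates the cosine transform at a Fourier frequency.

    def autocovariances(y):
        # c_tau = (1/T) sum_{t=tau}^{T-1} (y_t - ybar)(y_{t-tau} - ybar), equation (2.37).
        y = np.asarray(y, dtype=float)
        T = len(y)
        d = y - y.mean()
        return np.array([np.sum(d[tau:] * d[:T - tau]) / T for tau in range(T)])

    c = autocovariances(y)
    T = len(y)
    j = 18
    w = 2 * np.pi * j / T
    taus = np.arange(1 - T, T)
    I_j = 2 * np.sum(np.cos(w * taus) * c[np.abs(taus)])        # equation (2.41)
    print(np.isclose(I_j, periodogram(y - y.mean())[j - 1]))    # True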
An Appendix on Harmonic Cycles
Lemma 1. Let ω_j = 2πj/T where j ∈ {1, . . . , T/2} if T is even and j ∈ {1, . . . , (T − 1)/2} if T is odd. Then

    Σ_{t=0}^{T−1} cos(ω_j t) = Σ_{t=0}^{T−1} sin(ω_j t) = 0.
Proof. By Euler's equations, we have

    Σ_{t=0}^{T−1} cos(ω_j t) = (1/2) Σ_{t=0}^{T−1} exp(i2πjt/T) + (1/2) Σ_{t=0}^{T−1} exp(−i2πjt/T).

By using the formula 1 + λ + λ² + ⋯ + λ^{T−1} = (1 − λ^T)/(1 − λ), we find that

    Σ_{t=0}^{T−1} exp(i2πjt/T) = {1 − exp(i2πj)}/{1 − exp(i2πj/T)}.

But exp(i2πj) = cos(2πj) + i sin(2πj) = 1, so the numerator in the expression above is zero, and hence Σ_t exp(i2πjt/T) = 0. By similar means, we can show that Σ_t exp(−i2πjt/T) = 0; and, therefore, it follows that Σ_t cos(ω_j t) = 0. An analogous proof shows that Σ_t sin(ω_j t) = 0.
Lemma 2. Let ω_j = 2πj/T where j ∈ {0, 1, . . . , T/2} if T is even and j ∈ {0, 1, . . . , (T − 1)/2} if T is odd. Then

(a)  Σ_{t=0}^{T−1} cos(ω_j t) cos(ω_k t) = 0, if j ≠ k; = T/2, if j = k;

(b)  Σ_{t=0}^{T−1} sin(ω_j t) sin(ω_k t) = 0, if j ≠ k; = T/2, if j = k;

(c)  Σ_{t=0}^{T−1} cos(ω_j t) sin(ω_k t) = 0 for all j and k.
Proof. From the formula cos A cos B = (1/2){cos(A + B) + cos(A − B)}, we have

    Σ_{t=0}^{T−1} cos(ω_j t) cos(ω_k t) = (1/2) Σ_t {cos([ω_j + ω_k]t) + cos([ω_j − ω_k]t)}
                                        = (1/2) Σ_{t=0}^{T−1} {cos(2π[j + k]t/T) + cos(2π[j − k]t/T)}.

We find, in consequence of Lemma 1, that if j ≠ k, then both terms on the RHS vanish, and thus we have the first part of (a). If j = k, then cos(2π[j − k]t/T) = cos 0 = 1 and so, whilst the first term vanishes, the second term yields the value of T under summation. This gives the second part of (a).

The proofs of (b) and (c) follow along similar lines.
References
Beveridge, Sir W.H., (1921), Weather and Harvest Cycles. Economic Journal, 31, 429–452.

Beveridge, Sir W.H., (1922), Wheat Prices and Rainfall in Western Europe. Journal of the Royal Statistical Society, 85, 412–478.

Moore, H.L., (1914), Economic Cycles: Their Law and Cause. Macmillan: New York.

Slutsky, E., (1937), The Summation of Random Causes as the Source of Cyclical Processes. Econometrica, 5, 105–146.

Yule, G.U., (1927), On a Method of Investigating Periodicities in Disturbed Series with Special Reference to Wolfer's Sunspot Numbers. Philosophical Transactions of the Royal Society, 89, 1–64.
LECTURE 3
Models and Methods
of Time-Series Analysis
A time-series model is one which postulates a relationship amongst a num-
ber of temporal sequences or time series. An example is provided by the simple
regression model
(3.1)    y(t) = βx(t) + ε(t),

where y(t) = {y_t; t = 0, ±1, ±2, . . .} is a sequence, indexed by the time subscript t, which is a combination of an observable signal sequence x(t) = {x_t} and an unobservable white-noise sequence ε(t) = {ε_t} of independently and identically distributed random variables.
A more general model, which we shall call the general temporal regression model, is one which postulates a relationship comprising any number of consecutive elements of x(t), y(t) and ε(t). The model may be represented by the equation
(3.2)    Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} μ_i ε(t − i),

where it is usually taken for granted that α₀ = 1. This normalisation of the leading coefficient on the LHS identifies y(t) as the output sequence. Any of the sums in the equation can be infinite but, if the model is to be viable, the sequences of coefficients {α_i}, {β_i} and {μ_i} can depend on only a limited number of parameters.
Although it is convenient to write the general model in the form of (2), it
is also common to represent it by the equation
(3.3)    y(t) = Σ_{i=1}^{p} φ_i y(t − i) + Σ_{i=0}^{k} β_i x(t − i) + Σ_{i=0}^{q} μ_i ε(t − i),

where φ_i = −α_i for i = 1, . . . , p. This places the lagged versions of the sequence y(t) on the RHS in the company of the input sequence x(t) and its lags.
Whereas engineers are liable to describe this as a feedback model, economists
are more likely to describe it as a model with lagged dependent variables.
The foregoing models are termed regression models by virtue of the in-
clusion of the observable explanatory sequence x(t). When x(t) is deleted, we
obtain a simpler unconditional linear stochastic model:
(3.4)    Σ_{i=0}^{p} α_i y(t − i) = Σ_{i=0}^{q} μ_i ε(t − i).
This is the autoregressive moving-average (ARMA) model.
A time-series model can often assume a variety of forms. Consider a simple
dynamic regression model of the form
(3.5)    y(t) = φy(t − 1) + βx(t) + ε(t),
where there is a single lagged dependent variable. By repeated substitution,
we obtain
(3.6)    y(t) = φy(t − 1) + βx(t) + ε(t)
             = φ²y(t − 2) + β{x(t) + φx(t − 1)} + ε(t) + φε(t − 1)
               ⋮
             = φⁿy(t − n) + β{x(t) + φx(t − 1) + ⋯ + φ^{n−1}x(t − n + 1)}
               + ε(t) + φε(t − 1) + ⋯ + φ^{n−1}ε(t − n + 1).
If |φ| < 1, then lim(n → ∞)φⁿ = 0; and it follows that, if x(t) and ε(t) are bounded sequences, then, as the number of repeated substitutions increases indefinitely, the equation will tend to the limiting form of

(3.7)    y(t) = β Σ_{i=0}^{∞} φ^i x(t − i) + Σ_{i=0}^{∞} φ^i ε(t − i).
It is notable that, by this process of repeated substitution, the feedback
structure has been eliminated from the model. As a result, it becomes easier
to assess the impact upon the output sequence of changes in the values of the
input sequence. The direct mapping from the input sequence to the output
sequence is described by engineers as a transfer function or as a filter.
For models more complicated than the one above, the method of repeated
substitution, if pursued directly, becomes intractable. Thus we are motivated
to use more powerful algebraic methods to effect the transformation of the
equation. This leads us to consider the use of the so-called lag operator. A
proper understanding of the lag operator depends upon a knowledge of the
algebra of polynomials and of rational functions.
The Algebra of the Lag Operator
A sequence x(t) = {x_t; t = 0, ±1, ±2, . . .} is any function mapping from the set of integers Z = {0, ±1, ±2, . . .} to the real line. If the set of integers
represents a set of dates separated by unit intervals, then x(t) is described as
a temporal sequence or a time series.
The set of all time series represents a vector space, and various linear
transformations or operators can be defined over the space. The simplest of these is the lag operator L, which is defined by

(3.8)    Lx(t) = x(t − 1).

Now, L{Lx(t)} = Lx(t − 1) = x(t − 2); so it makes sense to define L² by L²x(t) = x(t − 2). More generally, L^k x(t) = x(t − k) and, likewise, L^{−k}x(t) = x(t + k). Other operators are the difference operator ∇ = I − L, which has the effect that

(3.9)    ∇x(t) = x(t) − x(t − 1),

the forward-difference operator Δ = L^{−1} − I, and the summation operator S = (I − L)^{−1} = {I + L + L² + ⋯}, which has the effect that

(3.10)    Sx(t) = Σ_{i=0}^{∞} x(t − i).
In general, we can define polynomials of the lag operator of the form p(L) = p₀ + p₁L + ⋯ + p_nLⁿ = Σ p_i L^i, having the effect that

(3.11)    p(L)x(t) = p₀x(t) + p₁x(t − 1) + ⋯ + p_nx(t − n) = Σ_{i=0}^{n} p_i x(t − i).
In these terms, the equation under (2) of the general temporal model becomes
(3.12)    α(L)y(t) = β(L)x(t) + μ(L)ε(t).
The advantage which comes from defining polynomials in the lag operator
stems from the fact that they are isomorphic to the set of ordinary algebraic
polynomials. Thus we can rely upon what we know about ordinary polynomials
to treat problems concerning lag-operator polynomials.
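As an aside (not part of the original text), the effect of a lag-operator polynomial on a finite data sequence is easily emulated. The sketch below applies p(L) to a sequence by the direct summation of (3.11), discarding the initial dates for which lagged observations are unavailable.

    import numpy as np

    def apply_lag_polynomial(p, x):
        # p(L)x(t) = sum_i p_i x(t - i), computed for t = n, ..., len(x) - 1.
        p, x = np.asarray(p, dtype=float), np.asarray(x, dtype=float)
        n = len(p) - 1
        return np.array([np.dot(p, x[t - np.arange(n + 1)]) for t in range(n, len(x))])

    x = np.arange(10.0)
    print(apply_lag_polynomial([1.0, -1.0], x))   # the difference operator I - L yields ones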
Algebraic Polynomials
Consider the equation α₀ + α₁z + α₂z² = 0. Once the equation has been divided by α₂, it can be factorised as (z − λ₁)(z − λ₂) = 0, where λ₁, λ₂ are the roots or zeros of the equation which are given by the formula

(3.13)    λ = { −α₁ ± √(α₁² − 4α₂α₀) } / (2α₂).

If α₁² > 4α₂α₀, then the roots λ₁, λ₂ are real. If α₁² = 4α₂α₀, then λ₁ = λ₂. If α₁² < 4α₂α₀, then the roots are the conjugate complex numbers λ = α + iβ, λ* = α − iβ, where i = √(−1).
There are three alternative ways of representing the conjugate complex numbers λ and λ*:

(3.14)    λ = α + iβ = ρ(cos θ + i sin θ) = ρe^{iθ},
          λ* = α − iβ = ρ(cos θ − i sin θ) = ρe^{−iθ},

where

(3.15)    ρ = √(α² + β²) and θ = tan⁻¹(β/α).
These are called, respectively, the Cartesian form, the trigonometrical form and
the exponential form.
The Cartesian and trigonometrical representations are understood by considering the Argand diagram:

Figure 1. The Argand diagram showing a complex number λ = α + iβ and its conjugate λ* = α − iβ.
The exponential form is understood by considering the following series expansions of cos θ and i sin θ about the point θ = 0:

(3.16)    cos θ = 1 − θ²/2! + θ⁴/4! − θ⁶/6! + ⋯ ,
          i sin θ = iθ − iθ³/3! + iθ⁵/5! − iθ⁷/7! + ⋯ .

Adding these gives

(3.17)    cos θ + i sin θ = 1 + iθ − θ²/2! − iθ³/3! + θ⁴/4! + ⋯ = e^{iθ}.

Likewise, by subtraction, we get

(3.18)    cos θ − i sin θ = 1 − iθ − θ²/2! + iθ³/3! + θ⁴/4! − ⋯ = e^{−iθ}.
These are Euler's equations. It follows from adding (17) and (18) that

(3.19)    cos θ = (e^{iθ} + e^{−iθ})/2.

Subtracting (18) from (17) gives

(3.20)    sin θ = −(i/2)(e^{iθ} − e^{−iθ}) = (1/2i)(e^{iθ} − e^{−iθ}).
Now consider the general equation of the nth order:

(3.21)    α₀ + α₁z + α₂z² + ⋯ + α_nzⁿ = 0.

On dividing by α_n, we can factorise this as

(3.22)    (z − λ₁)(z − λ₂) ⋯ (z − λ_n) = 0,
where some of the roots may be real and others may be complex. The complex roots come in conjugate pairs, so that, if λ = α + iβ is a complex root, then there is a corresponding root λ* = α − iβ such that the product (z − λ)(z − λ*) = z² − 2αz + (α² + β²) is real and quadratic. When we multiply the n factors together, we obtain the expansion

(3.23)    0 = zⁿ − (Σ_i λ_i)z^{n−1} + (Σ_{i<j} λ_iλ_j)z^{n−2} − ⋯ + (−1)ⁿ(λ₁λ₂⋯λ_n).
This can be compared with the expression (α₀/α_n) + (α₁/α_n)z + ⋯ + zⁿ = 0. By equating coefficients of the two expressions, we find that (α₀/α_n) = (−1)ⁿλ₁λ₂⋯λ_n or, equivalently,

(3.24)    α_n = α₀ Π_{i=1}^{n} (−λ_i)⁻¹.
Thus we can express the polynomial in any of the following forms:

(3.25)    Σ α_i z^i = α_n Π_i (z − λ_i)
                   = α₀ Π_i (−λ_i)⁻¹ Π_i (z − λ_i)
                   = α₀ Π_i (1 − z/λ_i).
We should also note that, if λ is a root of the primary equation Σ α_i z^i = 0, where rising powers of z are associated with rising indices on the coefficients, then μ = 1/λ is a root of the equation Σ α_i z^{n−i} = 0, which has declining powers of z instead. This follows since Σ α_i μ^{n−i} = λ^{−n} Σ α_i λ^i = 0. Confusion can arise from not knowing which of the two equations one is dealing with.
Rational Functions of Polynomials
If δ(z) and γ(z) are polynomial functions of z of degrees d and g respectively with d < g, then the ratio δ(z)/γ(z) is described as a proper rational function. We shall often encounter expressions of the form

(3.26)    y(t) = {δ(L)/γ(L)} x(t).

For this to have a meaningful interpretation in the context of a time-series model, we normally require that y(t) should be a bounded sequence whenever x(t) is bounded. The necessary and sufficient condition for the boundedness of y(t), in that case, is that the series expansion of δ(z)/γ(z) should be convergent whenever |z| ≤ 1. We can determine whether or not the sequence will converge by expressing the ratio δ(z)/γ(z) as a sum of partial fractions. The basic result is as follows:
(3.27)    If δ(z)/γ(z) = δ(z)/{γ₁(z)γ₂(z)} is a proper rational function, and if γ₁(z) and γ₂(z) have no common factor, then the function can be uniquely expressed as

              δ(z)/γ(z) = δ₁(z)/γ₁(z) + δ₂(z)/γ₂(z),

          where δ₁(z)/γ₁(z) and δ₂(z)/γ₂(z) are proper rational functions.
Imagine that γ(z) = Π (1 − z/λ_i). Then repeated applications of this basic result enable us to write

(3.28)    δ(z)/γ(z) = κ₁/(1 − z/λ₁) + κ₂/(1 − z/λ₂) + ⋯ + κ_g/(1 − z/λ_g).

By adding the terms on the RHS, we find an expression with a numerator of degree g − 1. By equating the terms of the numerator with the terms of δ(z), we can find the values κ₁, κ₂, . . . , κ_g. The convergence of the expansion of δ(z)/γ(z) is a straightforward matter. For the series converges if and only if the expansion of each of the partial fractions converges. For the expansion

(3.29)    κ/(1 − z/λ) = κ{ 1 + z/λ + (z/λ)² + ⋯ }

to converge when |z| ≤ 1, it is necessary and sufficient that |λ| > 1.
Example. Consider the function

(3.30)    3z/(1 + z − 2z²) = 3z/{(1 − z)(1 + 2z)}
                          = κ₁/(1 − z) + κ₂/(1 + 2z)
                          = {κ₁(1 + 2z) + κ₂(1 − z)}/{(1 − z)(1 + 2z)}.

Equating the terms of the numerator gives

(3.31)    3z = (2κ₁ − κ₂)z + (κ₁ + κ₂),

so κ₂ = −κ₁, which gives 3 = 2κ₁ − κ₂ = 3κ₁; and thus we have κ₁ = 1, κ₂ = −1.
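A numerical check of this example (added here) is to compare the power-series coefficients of 3z/(1 + z − 2z²), obtained by the recursion which the denominator implies, with those of 1/(1 − z) − 1/(1 + 2z), which are 1 − (−2)^k.

    import numpy as np

    def expand_rational(delta, gamma, n):
        # Coefficients omega_k of delta(z)/gamma(z), from gamma(z)*omega(z) = delta(z).
        omega = np.zeros(n)
        for k in range(n):
            d = delta[k] if k < len(delta) else 0.0
            s = sum(gamma[i] * omega[k - i] for i in range(1, min(k, len(gamma) - 1) + 1))
            omega[k] = (d - s) / gamma[0]
        return omega

    lhs = expand_rational([0.0, 3.0], [1.0, 1.0, -2.0], 8)
    rhs = np.array([1.0 - (-2.0) ** k for k in range(8)])
    print(np.allclose(lhs, rhs))   # True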
Linear Difference Equations

An nth-order linear difference equation is a relationship amongst n + 1 consecutive elements of a sequence x(t) of the form

(3.32)    α₀x(t) + α₁x(t − 1) + ⋯ + α_nx(t − n) = u(t),

where u(t) is some specified sequence which is described as the forcing function. The equation can be written, in a summary notation, as

(3.33)    α(L)x(t) = u(t),
where α(L) = α₀ + α₁L + ⋯ + α_nLⁿ. If n consecutive values of x(t) are given, say x₁, x₂, . . . , x_n, then the relationship can be used to find the succeeding value x_{n+1}. In this way, so long as u(t) is fully specified, it is possible to generate any number of the succeeding elements of the sequence. The values of the sequence prior to t = 1 can be generated likewise; and thus, in effect, we can deduce the function x(t) from the difference equation. However, instead of a recursive solution, we often seek an analytic expression for x(t).
The function x(t; c), expressing the analytic solution, will comprise a set of n constants in c = [c₁, c₂, . . . , c_n]′ which can be determined once we are given a set of n consecutive values of x(t) which are called initial conditions. The general analytic solution of the equation α(L)x(t) = u(t) is expressed as x(t; c) = y(t; c) + z(t), where y(t; c) is the general solution of the homogeneous equation α(L)y(t) = 0, and z(t) = α⁻¹(L)u(t) is called a particular solution of the inhomogeneous equation.

We may solve the difference equation in three steps. First, we find the general solution of the homogeneous equation. Next, we find the particular solution z(t), which embodies no unknown quantities. Finally, we use the n initial values of x to determine the constants c₁, c₂, . . . , c_n. We shall discuss in detail only the solution of the homogeneous equation.
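The recursive solution is worth illustrating before we proceed. The sketch below (an addition to the text) generates successive elements of x(t) from n starting values and a fully specified forcing function, according to (3.32) with α₀ = 1.

    import numpy as np

    def recurse(alpha, x_init, u):
        # x(t) = u(t) - alpha_1 x(t-1) - ... - alpha_n x(t-n), with alpha = [1, alpha_1, ..., alpha_n].
        alpha = np.asarray(alpha, dtype=float)
        n = len(alpha) - 1
        x = list(x_init)
        for t in range(n, len(u)):
            x.append(u[t] - np.dot(alpha[1:], x[-1:-n - 1:-1]))
        return np.array(x)

    # x(t) - 1.69x(t-1) + 0.81x(t-2) = 0 with x_0 = 1, x_1 = 3.69; cf. Figure 2 below.
    print(recurse([1.0, -1.69, 0.81], [1.0, 3.69], np.zeros(12))[:6])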
Solution of the Homogeneous Difference Equation

If λ_j is a root of the equation α(z) = α₀ + α₁z + ⋯ + α_nzⁿ = 0 such that α(λ_j) = 0, then y_j(t) = (1/λ_j)^t is a solution of the equation α(L)y(t) = 0. This can be seen by considering the expression
(3.34)    α(L)(1/λ_j)^t = { α₀ + α₁L + ⋯ + α_nLⁿ }(1/λ_j)^t
                       = α₀(1/λ_j)^t + α₁(1/λ_j)^{t−1} + ⋯ + α_n(1/λ_j)^{t−n}
                       = { α₀ + α₁λ_j + ⋯ + α_nλ_jⁿ }(1/λ_j)^t
                       = α(λ_j)(1/λ_j)^t = 0.
Alternatively, one may consider the factorisation α(L) = α₀ Π_i (1 − L/λ_i). Within this product is the term 1 − L/λ_j; and, since

    (1 − L/λ_j)(1/λ_j)^t = (1/λ_j)^t − (1/λ_j)^t = 0,

it follows that α(L)(1/λ_j)^t = 0.
The general solution, in the case where α(z) = 0 has distinct real roots, is given by

(3.35)    y(t; c) = c₁(1/λ₁)^t + c₂(1/λ₂)^t + ⋯ + c_n(1/λ_n)^t,

where c₁, c₂, . . . , c_n are the constants which are determined by the initial conditions.
In the case where two roots coincide at the value λ_j, the equation α(L)y(t) = 0 has the solutions y₁(t) = (1/λ_j)^t and y₂(t) = t(1/λ_j)^t. To show this, let us extract the term (1 − L/λ_j)² from the factorisation α(L) = α₀ Π_i (1 − L/λ_i). Then, according to the previous argument, we have (1 − L/λ_j)²(1/λ_j)^t = 0; but, also, we have

(3.36)    (1 − L/λ_j)² t(1/λ_j)^t = { 1 − 2L/λ_j + L²/λ_j² } t(1/λ_j)^t
                                 = t(1/λ_j)^t − 2(t − 1)(1/λ_j)^t + (t − 2)(1/λ_j)^t = 0.
In general, if there are r repeated roots with the value λ_j, then all of (1/λ_j)^t, t(1/λ_j)^t, t²(1/λ_j)^t, . . . , t^{r−1}(1/λ_j)^t are solutions to the equation α(L)y(t) = 0.
A particularly important special case arises when there are r repeated roots of unit value. Then the functions 1, t, t², . . . , t^{r−1} are all solutions to the homogeneous equation. With each solution is associated a coefficient which can be determined in view of the initial conditions. If these coefficients are d₀, d₁, d₂, . . . , d_{r−1}, then, within the general solution of the homogeneous equation, there will be found the term d₀ + d₁t + d₂t² + ⋯ + d_{r−1}t^{r−1}, which represents a polynomial in t of degree r − 1.
The 2nd-order Difference Equation with Complex Roots
Imagine that the 2nd-order equation α(L)y(t) = α₀y(t) + α₁y(t − 1) + α₂y(t − 2) = 0 is such that α(z) = 0 has complex roots λ = 1/μ and λ* = 1/μ*. If λ, λ* are conjugate complex numbers, then so too are μ, μ*. Therefore, let us write

(3.37)    μ = γ + iδ = κ(cos ω + i sin ω) = κe^{iω},
          μ* = γ − iδ = κ(cos ω − i sin ω) = κe^{−iω}.

These will appear in a general solution of the difference equation of the form

(3.38)    y(t) = cμ^t + c*(μ*)^t.
Figure 2. The solution of the homogeneous difference equation (1 − 1.69L + 0.81L²)y(t) = 0 for the initial conditions y₀ = 1 and y₁ = 3.69. The time lag of the phase displacement p₁ and the duration of the cycle p₂ are also indicated.
This represents a real-valued sequence; and, since a real term must equal its own conjugate, it follows that c and c* must be conjugate numbers of the form

(3.39)    c* = ρ(cos θ + i sin θ) = ρe^{iθ},
          c = ρ(cos θ − i sin θ) = ρe^{−iθ}.
Thus the general solution becomes

(3.40)    cμ^t + c*(μ*)^t = ρe^{−iθ}(κe^{iω})^t + ρe^{iθ}(κe^{−iω})^t
                         = ρκ^t { e^{i(ωt−θ)} + e^{−i(ωt−θ)} }
                         = 2ρκ^t cos(ωt − θ).
To analyse the final expression, consider first the factor cos(ωt − θ). This is a displaced cosine wave. The value ω, which is a number of radians per unit period, is called the angular velocity or the angular frequency of the wave. The value f = ω/2π is its frequency in cycles per unit period. The duration of one cycle, also called the period, is r = 2π/ω.

The term θ is called the phase displacement of the cosine wave, and it serves to shift the cosine function along the axis of t so that, in the absence of damping, the peak would occur at the value of t = θ/ω instead of at t = 0.
Next consider the term κ^t, wherein κ = √(γ² + δ²) is the modulus of the complex roots. When κ has a value of less than unity, it becomes a damping factor which serves to attenuate the cosine wave as t increases. The damping also serves to shift the peaks of the cosine function slightly to the left.

Finally, the factor 2ρ affects the initial amplitude of the cosine wave, which is the value which it assumes when t = 0. Since ρ is just the modulus of the values c and c*, this amplitude reflects the initial conditions. The phase angle θ is also a product of the initial conditions.
It is instructive to derive an expression for the second-order difference equation which is in terms of the parameters of the trigonometrical or exponential representations of a pair of complex roots. Consider

(3.41)    α(z) = α₀(1 − μz)(1 − μ*z)
              = α₀{ 1 − (μ + μ*)z + μμ*z² }.
From (37) it follows that

(3.42)    μ + μ* = 2κ cos ω and μμ* = κ².

Therefore the polynomial operator which is entailed by the difference equation is

(3.43)    α₀ + α₁L + α₂L² = α₀(1 − 2κ cos ω L + κ²L²);

and it is usual to set α₀ = 1. This representation indicates that a necessary condition for the roots to be complex, which is not a sufficient condition, is that α₂/α₀ > 0.
It is easy to ascertain by inspection whether or not the second-order difference equation is stable. The condition that the roots of α(z) = 0 must lie outside the unit circle, which is necessary and sufficient for stability, imposes certain restrictions on the coefficients of α(z) which can be checked easily.

We can reveal these conditions most readily by considering the auxiliary polynomial α′(z) = z²α(z⁻¹), whose roots, which are the inverses of those of α(z), must lie inside the unit circle. Let the roots of α′(z), which might be real or complex, be denoted by μ₁, μ₂. Then we can write

(3.44)    α′(z) = α₀z² + α₁z + α₂ = α₀(z − μ₁)(z − μ₂)
               = α₀{ z² − (μ₁ + μ₂)z + μ₁μ₂ },

where it is assumed that α₀ > 0. This indicates that α₂/α₀ = μ₁μ₂. Therefore the conditions |μ₁|, |μ₂| < 1 imply that

(3.45)    −α₀ < α₂ < α₀.
If the roots are the complex conjugate numbers μ, μ* = γ ± iδ, then this condition will ensure that μμ* = κ² = α₂/α₀ < 1, which is the condition that they are within the unit circle.
Now consider the fact that, if α₀ > 0, then the function α′(z) will have a minimum value over the real line which is greater than zero if the roots are complex and no greater than zero if they are real. If the roots are real, then they will be found in the interval (−1, 1) if and only if

(3.46)    α′(−1) = α₀ − α₁ + α₂ > 0 and α′(1) = α₀ + α₁ + α₂ > 0.
If the roots are complex, then these conditions are bound to be satisfied. From these arguments, it follows that the conditions under (45) and (46) in combination are necessary and sufficient to ensure that the roots of α′(z) = 0 are within the unit circle and that the roots of α(z) = 0 are outside.
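The conditions (45) and (46) translate directly into a test on the coefficients. A minimal sketch (added for illustration), taking α₀ = 1 and checking the verdict against the numerically computed roots of the auxiliary polynomial:

    import numpy as np

    def is_stable_2nd_order(a1, a2, a0=1.0):
        # Conditions (3.45) and (3.46): -a0 < a2 < a0, a0 - a1 + a2 > 0 and a0 + a1 + a2 > 0.
        return (-a0 < a2 < a0) and (a0 - a1 + a2 > 0) and (a0 + a1 + a2 > 0)

    a1, a2 = -1.69, 0.81
    mu = np.roots([1.0, a1, a2])   # roots of the auxiliary equation z^2 + a1*z + a2 = 0
    print(is_stable_2nd_order(a1, a2), bool(np.all(np.abs(mu) < 1)))   # True True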
State-Space Models
An nth-order difference equation in a single variable can be transformed into a first-order system in n variables which are the elements of a so-called state vector.

There is a wide variety of alternative forms which can be assumed by a first-order vector difference equation corresponding to the nth-order scalar equation. However, certain of these are described as canonical forms by virtue of special structures in the matrix.

In demonstrating one of the more common canonical forms, let us consider again the nth-order difference equation of (32), in reference to which we may define the following variables:
(3.47)    ξ₁(t) = x(t),
          ξ₂(t) = ξ₁(t − 1) = x(t − 1),
            ⋮
          ξ_n(t) = ξ_{n−1}(t − 1) = x(t − n + 1).
On the basis of these definitions, a first-order vector equation may be constructed in the form of

(3.48)    ⎡ ξ₁(t) ⎤   ⎡ φ₁  . . .  φ_{n−1}  φ_n ⎤ ⎡ ξ₁(t − 1) ⎤   ⎡ 1 ⎤
          ⎢ ξ₂(t) ⎥ = ⎢ 1   . . .  0        0   ⎥ ⎢ ξ₂(t − 1) ⎥ + ⎢ 0 ⎥ u(t),
          ⎢   ⋮   ⎥   ⎢ ⋮          ⋮        ⋮   ⎥ ⎢     ⋮     ⎥   ⎢ ⋮ ⎥
          ⎣ ξ_n(t) ⎦   ⎣ 0   . . .  1        0   ⎦ ⎣ ξ_n(t − 1) ⎦   ⎣ 0 ⎦

wherein φ_i = −α_i, in accordance with (3).
The matrix in this structure is sometimes described as the companion form. Here it is manifest, in view of the definitions under (47), that the leading equation of the system, which is

(3.49)    ξ₁(t) = φ₁ξ₁(t − 1) + ⋯ + φ_nξ_n(t − 1) + u(t),

is precisely the equation under (32).
Example. An example of a system which is not in a canonical form is provided
by the following matrix equation:
(3.50)    ⎡ y(t) ⎤   ⎡ κ cos ω    κ sin ω ⎤ ⎡ y(t − 1) ⎤   ⎡ ε(t) ⎤
          ⎣ z(t) ⎦ = ⎣ −κ sin ω   κ cos ω ⎦ ⎣ z(t − 1) ⎦ + ⎣ η(t) ⎦.
With the use of the lag operator, the equation can also be written as
(3.51)    ⎡ 1 − κ cos ω L    −κ sin ω L    ⎤ ⎡ y(t) ⎤   ⎡ ε(t) ⎤
          ⎣ κ sin ω L        1 − κ cos ω L ⎦ ⎣ z(t) ⎦ = ⎣ η(t) ⎦.
On premultiplying the equation by the inverse of the matrix on the LHS, we
get
(3.52)    ⎡ y(t) ⎤ = { 1 − 2κ cos ω L + κ²L² }⁻¹ ⎡ 1 − κ cos ω L    κ sin ω L     ⎤ ⎡ ε(t) ⎤
          ⎣ z(t) ⎦                               ⎣ −κ sin ω L      1 − κ cos ω L ⎦ ⎣ η(t) ⎦.
A special case arises when

(3.53)    ⎡ ε(t) ⎤   ⎡ sin ω ⎤
          ⎣ η(t) ⎦ = ⎣ cos ω ⎦ ν(t),

where ν(t) is a white-noise sequence. Then the equation becomes

(3.54)    ⎡ y(t) ⎤ = { 1 − 2κ cos ω L + κ²L² }⁻¹ ⎡ sin ω      ⎤ ν(t).
          ⎣ z(t) ⎦                               ⎣ cos ω − κL ⎦
On defining ε(t) = sin ω ν(t), we may write the first of these equations as

(3.55)    (1 − 2κ cos ω L + κ²L²)y(t) = ε(t).

This is just a second-order difference equation with a white-noise forcing function; and, by virtue of the inclusion of the damping factor κ ∈ [0, 1), it represents a generalisation of the equation to be found under (2.24).
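Simulating (3.55) makes the behaviour of such a stochastic cycle visible: with κ close to unity, the output is a quasi-cyclical sequence whose period is roughly 2π/ω. A minimal sketch, added here for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    kappa, omega = 0.9, np.pi / 10      # damping factor and angular frequency
    T = 200
    y = np.zeros(T)
    eps = rng.normal(0.0, 1.0, T)
    for t in range(2, T):
        # (1 - 2*kappa*cos(omega)*L + kappa^2*L^2) y(t) = eps(t), equation (3.55)
        y[t] = 2 * kappa * np.cos(omega) * y[t - 1] - kappa ** 2 * y[t - 2] + eps[t]
    # The resulting series is dominated by cycles of about 2*pi/omega = 20 periods.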
Transfer Functions
Consider again the simple dynamic model of equation (5):
(3.56)    y(t) = φy(t − 1) + βx(t) + ε(t).

With the use of the lag operator, this can be rewritten as

(3.57)    (1 − φL)y(t) = βx(t) + ε(t)

or, equivalently, as

(3.58)    y(t) = {β/(1 − φL)}x(t) + {1/(1 − φL)}ε(t).

The latter is the so-called rational transfer-function form of the equation. The operator L within the transfer functions or filters can be replaced by a complex number z. Then the transfer function which is associated with the signal x(t) becomes

(3.59)    β/(1 − φz) = β{ 1 + φz + φ²z² + ⋯ },
where the RHS comes from a familiar power-series expansion.
The sequence {β, βφ, βφ², . . .} of the coefficients of the expansion constitutes the impulse response of the transfer function. That is to say, if we imagine that, on the input side, the signal is a unit-impulse sequence of the form

(3.60)    x(t) = {. . . , 0, 1, 0, 0, . . .},

which has zero values at all but one instant, then its mapping through the transfer function would result in an output sequence of

(3.61)    r(t) = {. . . , 0, β, βφ, βφ², . . .}.
Another important concept is the step response of the filter. We may
imagine that the input sequence is zero-valued up to a point in time when it
assumes a constant unit value:
(3.62) x(t) = {. . . , 0, 1, 1, 1, . . .}.
The mapping of this sequence through the transfer function would result in an
output sequence of
(3.63)    s(t) = {. . . , 0, β, β + βφ, β + βφ + βφ², . . .}
whose elements, from the point when the step occurs in x(t), are simply the partial sums of the impulse-response sequence. This sequence of partial sums {β, β + βφ, β + βφ + βφ², . . .} is described as the step response. Given that |φ| < 1, the step response converges to the value

(3.64)    γ = β/(1 − φ),

which is described as the steady-state gain or the long-term multiplier of the transfer function.
These various concepts apply to models of any order. Consider the equa-
tion
(3.65)    α(L)y(t) = β(L)x(t) + ε(t),

where

(3.66)    α(L) = 1 + α₁L + ⋯ + α_pL^p = 1 − φ₁L − ⋯ − φ_pL^p,
          β(L) = β₀ + β₁L + ⋯ + β_kL^k

are polynomials of the lag operator. The transfer-function form of the model is simply

(3.67)    y(t) = {β(L)/α(L)}x(t) + {1/α(L)}ε(t).
The rational function associated with x(t) has a series expansion

(3.68)    β(z)/α(z) = ω(z) = { ω₀ + ω₁z + ω₂z² + ⋯ };

and the sequence of the coefficients of this expansion constitutes the impulse-response function. The partial sums of the coefficients constitute the step-response function. The gain of the transfer function is defined by

(3.69)    γ = β(1)/α(1) = (β₀ + β₁ + ⋯ + β_k)/(1 + α₁ + ⋯ + α_p).
The method of finding the coefficients of the series expansion of the transfer function in the general case can be illustrated by the second-order case:

(3.70)    (β₀ + β₁z)/(1 − φ₁z − φ₂z²) = { ω₀ + ω₁z + ω₂z² + ⋯ }.
We rewrite this equation as

(3.71)    β₀ + β₁z = (1 − φ₁z − φ₂z²)(ω₀ + ω₁z + ω₂z² + ⋯).
Then, by performing the multiplication on the RHS, and by equating the coefficients of the same powers of z on the two sides of the equation, we find that

(3.72)    β₀ = ω₀,                          ω₀ = β₀,
          β₁ = ω₁ − φ₁ω₀,                   ω₁ = β₁ + φ₁ω₀,
          0 = ω₂ − φ₁ω₁ − φ₂ω₀,             ω₂ = φ₁ω₁ + φ₂ω₀,
            ⋮                                 ⋮
          0 = ω_n − φ₁ω_{n−1} − φ₂ω_{n−2},   ω_n = φ₁ω_{n−1} + φ₂ω_{n−2}.
The necessary and sufficient condition for the convergence of the sequence {ω_i} is that the roots of the primary polynomial equation 1 − φ₁z − φ₂z² = 0 should lie outside the unit circle or, equivalently, that the roots of the auxiliary equation z² − φ₁z − φ₂ = 0, which are the inverses of the former roots, should lie inside the unit circle. If the roots of these equations are real, then the sequence will converge monotonically to zero whereas, if the roots are complex-valued, then the sequence will converge in the manner of a damped sinusoid.
It is clear that the equation

(3.73)    ω(n) = φ₁ω(n − 1) + φ₂ω(n − 2),

which serves to generate the elements of the impulse response, is nothing but a second-order homogeneous difference equation. In fact, Figure 2, which has been presented as the solution to a homogeneous difference equation, represents the impulse response of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²).
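The recursion of (3.72) is immediate to program. The following sketch (an added illustration) generates the impulse response of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²) which is depicted in Figure 2.

    import numpy as np

    def impulse_response(beta, phi, n):
        # omega_k = beta_k + phi_1*omega_{k-1} + phi_2*omega_{k-2} + ..., equations (3.72).
        omega = np.zeros(n)
        for k in range(n):
            b = beta[k] if k < len(beta) else 0.0
            omega[k] = b + sum(phi[i - 1] * omega[k - i] for i in range(1, min(k, len(phi)) + 1))
        return omega

    print(impulse_response([1.0, 2.0], [1.69, -0.81], 5))
    # [1.0, 3.69, 5.4261, ...] -- a damped sinusoid, matching Figure 2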
In the light of this result, it is apparent that the coefficients of the denominator polynomial 1 − φ₁z − φ₂z² serve to determine the period and the damping factor of a complex impulse response. The coefficients in the numerator polynomial β₀ + β₁z serve to determine the initial amplitude of the response and its phase lag. It seems that all four coefficients must be present if a second-order transfer function is to have complete flexibility in modelling a dynamic response.
The Frequency Response
In many applications within forecasting and time-series analysis, it is of
interest to consider the response of a transfer function to a signal which is a
simple sinusoid. As we have indicated in a previous lecture, it is possible
Figure 3. The gain of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²).

Figure 4. The phase diagram of the transfer function (1 + 2L)/(1 − 1.69L + 0.81L²).
49

D.S.G. POLLOCK : TIME SERIES AND FORECASTING
to represent a finite sequence as a sum of sine and cosine functions whose frequencies are integer multiples of a fundamental frequency. More generally, it is possible, as we shall see later, to represent an arbitrary stationary stochastic process as a combination of an infinite number of sine and cosine functions whose frequencies range continuously in the interval [0, π]. It follows that the effect of a transfer function upon stationary signals can be characterised in terms of its effect upon the sinusoidal functions.
Consider therefore the consequences of mapping the signal x(t) = cos(ωt) through the transfer function ψ(L) = ψ₀ + ψ₁L + ⋯ + ψ_gL^g. The output is

(3.74)    y(t) = ψ(L) cos(ωt) = Σ_{j=0}^{g} ψ_j cos(ω[t − j]).
The trigonometrical identity cos(A − B) = cos A cos B + sin A sin B enables us to write this as

(3.75)    y(t) = { Σ_j ψ_j cos(ωj) } cos(ωt) + { Σ_j ψ_j sin(ωj) } sin(ωt)
             = α cos(ωt) + β sin(ωt) = ρ cos(ωt − θ).
Here we have defined

(3.76)    α = Σ_{j=0}^{g} ψ_j cos(ωj),    β = Σ_{j=0}^{g} ψ_j sin(ωj),
          ρ = √(α² + β²) and θ = tan⁻¹(β/α).
It can be seen from (75) that the effect of the filter upon the signal is twofold. First there is a gain effect whereby the amplitude of the sinusoid has been increased or diminished by a factor of ρ. Also there is a phase effect whereby the peak of the sinusoid is displaced by a time delay of θ/ω periods. Figures 3 and 4 represent the two effects of a simple rational transfer function on the set of sinusoids whose frequencies range from 0 to π.
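The gain ρ and the phase θ of (3.76) can be evaluated at any frequency by truncating the series expansion of a rational transfer function; a long truncation stands in for the infinite filter. The sketch below (added here) reuses the impulse_response function given above; arctan2 is used as a numerically robust form of tan⁻¹(β/α).

    import numpy as np

    def gain_and_phase(beta, phi, omega, n=400):
        psi = impulse_response(beta, phi, n)    # truncated expansion of beta(z)/alpha(z)
        j = np.arange(n)
        a = np.sum(psi * np.cos(omega * j))     # alpha of equation (3.76)
        b = np.sum(psi * np.sin(omega * j))     # beta of equation (3.76)
        return np.hypot(a, b), np.arctan2(b, a)

    rho, theta = gain_and_phase([1.0, 2.0], [1.69, -0.81], omega=np.pi / 4)
    print(rho, theta / (np.pi / 4))             # the gain and the time delay theta/omega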
LECTURE 4
Time-Series Analysis in
the Frequency Domain
A sequence is a function mapping from a set of integers, described as the
index set, onto the real line or into a subset thereof. A time series is a sequence
whose index corresponds to consecutive dates separated by a unit time interval.
In the statistical analysis of time series, the elements of the sequence are
regarded as a set of random variables. Usually, no notational distinction is
made between these random variables and their realised values. It is important
nevertheless to bear the distinction in mind.
In order to analyse a statistical time series, it must be assumed that the
structure of the statistical or stochastic process which generates the observa-
tions is essentially invariant through time. The conventional assumptions are
summarised in the condition of stationarity. In its strong form, the condition
requires that any two segments of equal length which are extracted from the
time series must have identical multivariate probability density functions. The
condition of weak stationarity requires only that the elements of the time series
should have a common finite expected value and that the autocovariance of two elements should depend only on their temporal separation.

A fundamental process, from which many other stationary processes may be derived, is the so-called white-noise process which consists of a sequence of uncorrelated random variables, each with a zero mean and the same finite variance. By passing white noise through a linear filter, a sequence whose elements are serially correlated can be generated. In fact, virtually every stationary stochastic process may be depicted as the product of a filtering operation applied to white noise. This result follows from the Cramér–Wold theorem which
will be presented after we have introduced the concepts underlying the spectral
representation of a time series.
The spectral representation is rooted in the basic notion of Fourier analysis
which is that well-behaved functions can be approximated over a finite inter-
val, to any degree of accuracy, by a weighted combination of sine and cosine
functions whose harmonically rising frequencies are integral multiples of a fun-
damental frequency. Such linear combinations are described as Fourier sums
or Fourier series. Of course, the notion applies to sequences as well; for any
number of well-behaved functions may be interpolated through the coordinates of a finite sequence.

We shall approach the Fourier analysis of stochastic processes via the exact Fourier representation of a finite sequence. This is extended to provide a representation of an infinite sequence in terms of an infinity of trigonometrical functions whose frequencies range continuously in the interval [0, π]. The trigonometrical functions and their weighting functions are gathered under a Fourier–Stieltjes integral. It is remarkable that, whereas a Fourier sum serves only to define a strictly periodic function, a Fourier integral suffices to represent an aperiodic time series generated by a stationary stochastic process.
The Fourier integral is also used to represent the underlying stochastic
process. This is achieved by describing the stochastic processes which generate
the weighting functions. There are two such weighting processes, associated respectively with the sine and cosine functions; and their common variance, which is a function f(ω), ω ∈ [0, π], is the so-called spectral density function.
The relationship between the spectral density function and the sequence
of autocovariances, which is summarised in the Wiener–Khintchine theorem,
provides a link between the time-domain and the frequency-domain analyses.
The sequence of autocovariances may be obtained from the Fourier transform
of the spectral density function and the spectral density function is, conversely,
a Fourier transform of the autocovariances.
Stationarity
Consider two vectors of n + 1 consecutive elements from the process y(t):

(4.1)    [y_t, y_{t+1}, . . . , y_{t+n}] and [y_s, y_{s+1}, . . . , y_{s+n}].

Then y(t) = {y_t; t = 0, ±1, ±2, . . .} is strictly stationary if the joint probability density functions of the two vectors are the same for any values of t and s, regardless of the size of n. On the assumption that the first and second-order moments of the distribution are finite, the condition of stationarity implies that all the elements of y(t) have the same expected value and that the covariance between any pair of elements of the sequences is a function only of their temporal separation. Thus,

(4.2)    E(y_t) = μ and C(y_t, y_s) = γ_{|t−s|}.

On their own, the conditions of (2) constitute the conditions of weak stationarity.

A normal process is completely characterised by its mean and its autocovariances. Therefore, a normal process y(t) which satisfies the conditions for weak stationarity is also stationary in the strict sense.
The Autocovariance Function
The covariance between two elements y_t and y_s of a process y(t) which are separated by τ = |t − s| intervals of time is known as the autocovariance at lag τ and is denoted by γ_τ. The autocorrelation at lag τ, denoted by ρ_τ, is defined by

(4.3)    ρ_τ = γ_τ/γ₀,

where γ₀ is the variance of the process y(t).

The stationarity conditions imply that the autocovariances of y(t) satisfy the equality

(4.4)    γ_τ = γ_{−τ}

for all values of τ.
The autocovariance matrix of a stationary process corresponding to the n elements y₀, y₁, . . . , y_{n−1} is given by

(4.5)    Γ = ⎡ γ₀       γ₁       γ₂       . . .  γ_{n−1} ⎤
             ⎢ γ₁       γ₀       γ₁       . . .  γ_{n−2} ⎥
             ⎢ γ₂       γ₁       γ₀       . . .  γ_{n−3} ⎥
             ⎢  ⋮        ⋮        ⋮        ⋱      ⋮      ⎥
             ⎣ γ_{n−1}  γ_{n−2}  γ_{n−3}  . . .  γ₀      ⎦ .
The sequences {γ_τ} and {ρ_τ} are described as the autocovariance and autocorrelation functions respectively.
The Filtering of White Noise
A white-noise process is a sequence ε(t) of uncorrelated random variables with mean zero and common variance σ²_ε. Thus

(4.6)    E(ε_t) = 0 for all t;
         E(ε_t ε_s) = σ²_ε, if t = s,
                    = 0, if t ≠ s.
By a process of linear filtering, a variety of time series may be constructed whose elements display complex interdependencies. A finite linear filter, also called a moving-average operator, is a polynomial in the lag operator of the form μ(L) = μ₀ + μ₁L + ⋯ + μ_qL^q. The effect of this filter on ε(t) is described by the equation

(4.7)    y(t) = μ(L)ε(t)
             = μ₀ε(t) + μ₁ε(t − 1) + μ₂ε(t − 2) + ⋯ + μ_qε(t − q)
             = Σ_{i=0}^{q} μ_i ε(t − i).
The operator μ(L) may also be described as the transfer function which maps the input sequence ε(t) into the output sequence y(t).

An operator μ(L) = {μ₀ + μ₁L + μ₂L² + ⋯} with an indefinite number of terms in rising powers of L may also be considered. However, for this to be practical, the coefficients {μ₀, μ₁, μ₂, . . .} must be functions of a limited number of fundamental parameters. In addition, it is required that

(4.8)    Σ_i |μ_i| < ∞.
Given the value of σ²_ε = V{ε(t)}, the autocovariances of the filtered sequence y(t) = μ(L)ε(t) may be determined by evaluating the expression

(4.9)    γ_τ = E(y_t y_{t−τ})
            = E( Σ_i μ_i ε_{t−i} Σ_j μ_j ε_{t−τ−j} )
            = Σ_i Σ_j μ_i μ_j E(ε_{t−i} ε_{t−τ−j}).

From equation (6), it follows that

(4.10)    γ_τ = σ²_ε Σ_j μ_j μ_{j+τ};

and so the variance of the filtered sequence is

(4.11)    γ₀ = σ²_ε Σ_j μ_j².
The condition under equation (8) guarantees that these quantities are finite, as is required by the condition of stationarity.
The z-transform
In the subsequent analysis, it will prove helpful to present the results in the notation of the z-transform. The z-transform of the infinite sequence y(t) = {y_t; t = 0, ±1, ±2, . . .} is defined by

(4.12)    y(z) = Σ_{t=−∞}^{∞} y_t z^t.

Here z is a complex number which may be placed on the perimeter of the unit circle, provided that the series converges. Thus z = e^{−iω} with ω ∈ [0, 2π].
If y(t) = μ₀ε(t) + μ₁ε(t − 1) + ⋯ + μ_qε(t − q) = μ(L)ε(t) is a moving-average process, then the z-transform of the sequence of moving-average coefficients is the polynomial μ(z) = μ₀ + μ₁z + ⋯ + μ_qz^q, which has the same form as the operator μ(L).
The z-transform of a sequence of autocovariances is called the autocovariance generating function. For the moving-average process, this is given by

(4.13)    γ(z) = σ²_ε μ(z)μ(z⁻¹)
             = σ²_ε Σ_i μ_i z^i Σ_j μ_j z^{−j}
             = σ²_ε Σ_τ Σ_j μ_j μ_{j+τ} z^τ;    τ = i − j
             = Σ_τ γ_τ z^τ.

The final equality is by virtue of equation (10).
The Fourier Representation of a Sequence
According to the basic result of Fourier analysis, it is always possible to approximate an arbitrary analytic function defined over a finite interval of the real line, to any desired degree of accuracy, by a weighted sum of sine and cosine functions of harmonically increasing frequencies.

Similar results apply in the case of sequences, which may be regarded as functions mapping from the set of integers onto the real line. For a sample of T observations y₀, . . . , y_{T−1}, it is possible to devise an expression in the form

(4.14)    y_t = Σ_{j=0}^{n} { α_j cos(ω_j t) + β_j sin(ω_j t) },

wherein ω_j = 2πj/T is a multiple of the fundamental frequency ω₁ = 2π/T. Thus, the elements of a finite sequence can be expressed exactly in terms of sines and cosines. This expression is called the Fourier decomposition of y_t, and the set of coefficients {α_j, β_j; j = 0, 1, . . . , n} are called the Fourier coefficients.
When T is even, we have n = T/2; and it follows that

(4.15)    sin(ω₀t) = sin(0) = 0,
          cos(ω₀t) = cos(0) = 1,
          sin(ω_n t) = sin(πt) = 0,
          cos(ω_n t) = cos(πt) = (−1)^t.
Therefore, equation (14) becomes

(4.16)    y_t = α₀ + Σ_{j=1}^{n−1} { α_j cos(ω_j t) + β_j sin(ω_j t) } + α_n(−1)^t.

When T is odd, we have n = (T − 1)/2; and then equation (14) becomes

(4.17)    y_t = α₀ + Σ_{j=1}^{n} { α_j cos(ω_j t) + β_j sin(ω_j t) }.
In both cases, there are T nonzero coefficients amongst the set {α_j, β_j; j = 0, 1, . . . , n}; and the mapping from the sample values to the coefficients constitutes a one-to-one invertible transformation.
In equation (16), the frequencies of the trigonometric functions range from ω₁ = 2π/T to ω_n = π; whereas, in equation (17), they range from ω₁ = 2π/T to ω_n = π(T − 1)/T. The frequency π is the so-called Nyquist frequency.

Although the process generating the data may contain components of frequencies higher than the Nyquist frequency, these will not be detected when it is sampled regularly at unit intervals of time. In fact, the effects on the process of components with frequencies in excess of the Nyquist value will be confounded with those whose frequencies fall below it.
To demonstrate this, consider the case where the process contains a component which is a pure cosine wave of unit amplitude and zero phase whose frequency ω lies in the interval π < ω < 2π. Let ω* = 2π − ω. Then

(4.18)    cos(ωt) = cos{ (2π − ω*)t }
                 = cos(2πt) cos(ω*t) + sin(2πt) sin(ω*t)
                 = cos(ω*t);

which indicates that ω and ω* are observationally indistinguishable. Here, ω* < π is described as the alias of ω > π.
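A two-line numerical confirmation of the aliasing argument (added here): sampled at the integer dates, a cosine of frequency ω ∈ (π, 2π) coincides exactly with the cosine of its alias 2π − ω.

    import numpy as np

    t = np.arange(24)
    w = 1.5 * np.pi   # a frequency in excess of the Nyquist value pi
    print(np.allclose(np.cos(w * t), np.cos((2 * np.pi - w) * t)))   # True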
The Spectral Representation of a Stationary Process
By allowing the value of n in the expression (14) to tend to infinity, it is possible to express a sequence of indefinite length in terms of a sum of sine and cosine functions. However, in the limit as n → ∞, the coefficients α_j, β_j tend to vanish; and therefore an alternative representation in terms of differentials is called for.

By writing α_j = dA(ω_j), β_j = dB(ω_j), where A(ω), B(ω) are step functions with discontinuities at the points {ω_j; j = 0, . . . , n}, the expression (14) can be rendered as

(4.19)    y_t = Σ_j { cos(ω_j t)dA(ω_j) + sin(ω_j t)dB(ω_j) }.
Figure 1. The graph of 134 observations on the monthly purchase of clothing, after a logarithmic transformation and the removal of a linear trend, together with the corresponding periodogram.
In the limit, as n → ∞, the summation is replaced by an integral to give the expression

(4.20)    y(t) = ∫₀^π { cos(ωt)dA(ω) + sin(ωt)dB(ω) }.

Here, cos(ωt) and sin(ωt), and therefore y(t), may be regarded as infinite sequences defined over the entire set of positive and negative integers.

Since A(ω) and B(ω) are discontinuous functions for which no derivatives exist, one must avoid using α(ω)dω and β(ω)dω in place of dA(ω) and dB(ω). Moreover, the integral in equation (20) is a Fourier–Stieltjes integral.
In order to derive a statistical theory for the process that generates y(t), one must make some assumptions concerning the functions A(ω) and B(ω). So far, the sequence y(t) has been interpreted as a realisation of a stochastic process. If y(t) is regarded as the stochastic process itself, then the functions A(ω), B(ω) must, likewise, be regarded as stochastic processes defined over the interval [0, π]. A single realisation of these processes now corresponds to a single realisation of the process y(t).

The first assumption to be made is that the functions A(ω) and B(ω) represent a pair of stochastic processes of zero mean which are indexed on the continuous parameter ω. Thus

(4.21)    E{dA(ω)} = E{dB(ω)} = 0.
The second and third assumptions are that the two processes are mutually uncorrelated and that non-overlapping increments within each process are uncorrelated. Thus

(4.22)    E{dA(ω)dB(λ)} = 0 for all ω, λ;
          E{dA(ω)dA(λ)} = 0 if ω ≠ λ;
          E{dB(ω)dB(λ)} = 0 if ω ≠ λ.
The final assumption is that the variance of the increments is given by

(4.23)    V{dA(ω)} = V{dB(ω)} = 2dF(ω) = 2f(ω)dω.

We can see that, unlike A(ω) and B(ω), F(ω) is a continuous differentiable function. The function F(ω) and its derivative f(ω) are the spectral distribution function and the spectral density function, respectively.
In order to express equation (20) in terms of complex exponentials, we may define a pair of conjugate complex stochastic processes:

(4.24)    dZ(ω) = (1/2){ dA(ω) − idB(ω) },
          dZ*(ω) = (1/2){ dA(ω) + idB(ω) }.
Also, we may extend the domain of the functions A(ω), B(ω) from [0, π] to [−π, π] by regarding A(ω) as an even function such that A(−ω) = A(ω) and by regarding B(ω) as an odd function such that B(−ω) = −B(ω). Then we have

(4.25)    dZ*(ω) = dZ(−ω).

From the conditions under (22), it follows that

(4.26)    E{dZ(ω)dZ*(λ)} = 0 if ω ≠ λ,
          E{dZ(ω)dZ*(ω)} = f(ω)dω.
These results may be used to re-express equation (20) as

(4.27)    y(t) = ∫₀^π [ {(e^{iωt} + e^{−iωt})/2} dA(ω) − i{(e^{iωt} − e^{−iωt})/2} dB(ω) ]
             = ∫₀^π [ e^{iωt}{dA(ω) − idB(ω)}/2 + e^{−iωt}{dA(ω) + idB(ω)}/2 ]
             = ∫₀^π { e^{iωt}dZ(ω) + e^{−iωt}dZ*(ω) }.
When the integral is extended over the range [−π, π], this becomes

(4.28)    y(t) = ∫_{−π}^{π} e^{iωt} dZ(ω).

This is commonly described as the spectral representation of the process y(t).
The Autocovariances and the Spectral Density Function
The sequence of the autocovariances of the process y(t) may be expressed in terms of the spectrum of the process. From equation (28), it follows that the autocovariance γ_τ at lag τ = t − k is given by

(4.29)    γ_τ = C(y_t, y_k) = E[ ∫ e^{iωt}dZ(ω) ∫ e^{−iλk}dZ*(λ) ]
             = ∫ ∫ e^{iωt} e^{−iλk} E{dZ(ω)dZ*(λ)}
             = ∫ e^{iωτ} E{dZ(ω)dZ*(ω)}
             = ∫_{−π}^{π} e^{iωτ} f(ω)dω.
Figure 2. The theoretical autocorrelation function of the ARMA(2, 2) process (1 − 1.344L + 0.902L²)y(t) = (1 − 1.691L + 0.810L²)ε(t) and (below) the corresponding spectral density function.
Here the final equalities are derived by using the results (25) and (26). This equation indicates that the Fourier transform of the spectrum is the autocovariance function.

The inverse mapping from the autocovariances to the spectrum is given by

(4.30)    f(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ e^{−iωτ} = (1/2π){ γ₀ + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ) }.

This function is directly comparable to the periodogram of a data sequence which is defined under (2.41). However, the periodogram has T empirical autocovariances c₀, . . . , c_{T−1} in place of an indefinite number of theoretical autocovariances. Also, it differs from the spectrum by a scalar factor of 4π. In many texts, equation (30) serves as the primary definition of the spectrum.
To demonstrate the relationship which exists between equations (29) and (30), we may substitute the latter into the former to give

(4.31)    γ_τ = ∫_{−π}^{π} e^{iωτ} { (1/2π) Σ_κ γ_κ e^{−iωκ} } dω
             = (1/2π) Σ_κ γ_κ ∫_{−π}^{π} e^{iω(τ−κ)} dω.

From the fact that

(4.32)    ∫_{−π}^{π} e^{iω(τ−κ)} dω = 2π, if τ = κ;
                                    = 0, if τ ≠ κ,

it can be seen that the RHS of the equation reduces to γ_τ. This serves to show that equations (29) and (30) do indeed represent a Fourier transform and its inverse.
The essential interpretation of the spectral density function is indicated by the equation

(4.33)    γ₀ = ∫_{−π}^{π} f(ω)dω,

which comes from setting τ = 0 in equation (29). This equation shows how the variance or power of y(t), which is γ₀, is attributed to the cyclical components of which the process is composed.
It is easy to see that a flat spectrum corresponds to the autocovariance function which characterises a white-noise process ε(t). Let f_ε = f_ε(ω) be the flat spectrum. Then, from equation (30), it follows that

(4.34)    γ₀ = ∫_{−π}^{π} f_ε(ω)dω = 2πf_ε,

and, from equation (29), it follows that, for τ ≠ 0,

(4.35)    γ_τ = ∫_{−π}^{π} f_ε e^{iωτ}dω = f_ε ∫_{−π}^{π} e^{iωτ}dω = 0.

These are the same as the conditions under (6) which have served to define a white-noise process. When the variance is denoted by σ²_ε, the expression for the spectrum of the white-noise process becomes

(4.36)    f_ε(ω) = σ²_ε/(2π).
Canonical Factorisation of the Spectral Density Function
Let y(t) be a stationary stochastic process whose spectrum is f_y(ω). Since f_y(ω) ≥ 0, it is always possible to find a complex function μ(ω) such that

(4.37)    f_y(ω) = (1/2π) μ(ω)μ*(ω).

For a wide class of stochastic processes, the function μ(ω) may be constructed in such a way that it can be expanded as a one-sided Fourier series:

(4.38)    μ(ω) = Σ_{j=0}^{∞} μ_j e^{−iωj}.

On defining

(4.39)    dZ_ε(ω) = dZ_y(ω)/μ(ω),
the spectral representation of the process y(t), given in equation (28), may be rewritten as

(4.40)    y(t) = ∫_{−π}^{π} e^{iωt} μ(ω) dZ_ε(ω).
Expanding the expression of μ(ω) and interchanging the order of integration and summation gives

(4.41)    y(t) = ∫_{−π}^{π} e^{iωt} { Σ_j μ_j e^{−iωj} } dZ_ε(ω)
             = Σ_j μ_j { ∫_{−π}^{π} e^{iω(t−j)} dZ_ε(ω) }
             = Σ_j μ_j ε(t − j),

where we have defined

(4.42)    ε(t) = ∫_{−π}^{π} e^{iωt} dZ_ε(ω).
The spectrum of ε(t) is given by

(4.43)    E{dZ_ε(ω)dZ*_ε(ω)} = E{dZ_y(ω)dZ*_y(ω)} / { μ(ω)μ*(ω) }
                            = f_y(ω)dω / { μ(ω)μ*(ω) } = (1/2π)dω.

Hence ε(t) is identified as a white-noise process with unit variance. Therefore equation (41) represents a moving-average process; and what our analysis implies is that virtually every stationary stochastic process can be represented in this way.
The Frequency-Domain Analysis of Filtering
It is a straightforward matter to derive the spectrum of a process y(t) = ψ(L)x(t) which is formed by mapping the process x(t) through a linear filter. Taking the spectral representation of the process x(t) to be

(4.44)    x(t) = ∫_{−π}^{π} e^{iωt} dZ_x(ω),
we have

(4.45)    y(t) = Σ_j ψ_j x(t − j)
             = Σ_j ψ_j { ∫_{−π}^{π} e^{iω(t−j)} dZ_x(ω) }
             = ∫_{−π}^{π} e^{iωt} { Σ_j ψ_j e^{−iωj} } dZ_x(ω).
On writing Σ_j ψ_j e^{−iωj} = ψ(ω), this becomes

(4.46)    y(t) = ∫_{−π}^{π} e^{iωt} ψ(ω) dZ_x(ω) = ∫_{−π}^{π} e^{iωt} dZ_y(ω).
It follows that the spectral density function f_y(ω) of the filtered process y(t) is given by

(4.47)    f_y(ω)dω = E{dZ_y(ω)dZ*_y(ω)}
                  = ψ(ω)ψ*(ω) E{dZ_x(ω)dZ*_x(ω)}
                  = |ψ(ω)|² f_x(ω)dω.
In the case of the process defined in equation (7), where y(t) is obtained by filtering a white-noise sequence, the result specialises to give

(4.48)    f_y(ω) = |μ(ω)|² f_ε(ω) = (σ²_ε/2π) |μ(ω)|².
Let μ(z) = Σ_j μ_j z^j denote the z-transform of the sequence {μ_j}. Then

(4.49)    |μ(z)|² = μ(z)μ(z⁻¹) = Σ_τ Σ_j μ_j μ_{j+τ} z^τ.

It follows that, when z = e^{−iω}, equation (48) can be written as

(4.50)    f_y(ω) = (σ²_ε/2π) μ(z)μ(z⁻¹)
               = (1/2π) Σ_τ { σ²_ε Σ_j μ_j μ_{j+τ} } z^τ.
But, according to equation (10), γ_τ = σ²_ε Σ_j μ_j μ_{j+τ} is the autocovariance of lag τ of the process y(t). Therefore, the function f_y(ω) can be written as

(4.51)    f_y(ω) = (1/2π) Σ_{τ=−∞}^{∞} γ_τ e^{−iωτ}
               = (1/2π){ γ₀ + 2 Σ_{τ=1}^{∞} γ_τ cos(ωτ) },

which indicates that the spectral density function is the Fourier transform of the autocovariance function of the filtered sequence. This is known as the Wiener–Khintchine theorem. The importance of this theorem is that it provides a link between the time domain and the frequency domain.
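The theorem yields an immediate recipe for the spectrum of an ARMA process: evaluate the moving-average and autoregressive polynomials around the unit circle and form (σ²_ε/2π)|μ(e^{−iω})|²/|α(e^{−iω})|². The sketch below (an added illustration) does this for the ARMA(2, 2) process of Figure 2.

    import numpy as np

    def arma_spectrum(ma, ar, omegas, sigma2=1.0):
        # f_y(w) = (sigma2 / 2 pi) |mu(e^{-iw})|^2 / |alpha(e^{-iw})|^2
        k = np.arange(max(len(ma), len(ar)))
        z = np.exp(-1j * np.outer(omegas, k))
        mu = z[:, :len(ma)] @ np.asarray(ma, dtype=float)
        alpha = z[:, :len(ar)] @ np.asarray(ar, dtype=float)
        return (sigma2 / (2 * np.pi)) * np.abs(mu) ** 2 / np.abs(alpha) ** 2

    omegas = np.linspace(0.0, np.pi, 400)
    f = arma_spectrum([1.0, -1.691, 0.810], [1.0, -1.344, 0.902], omegas)
    print(omegas[f.argmax()])   # close to pi/4, where the spectrum of Figure 2 peaks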
The Gain and Phase
The complex-valued function ψ(ω), which is entailed in the process of linear filtering, can be written as

(4.52)    ψ(ω) = |ψ(ω)| e^{−iθ(ω)},

where

(4.53)    |ψ(ω)|² = { Σ_{j=0}^{∞} ψ_j cos(ωj) }² + { Σ_{j=0}^{∞} ψ_j sin(ωj) }²,
          θ(ω) = arctan{ Σ_j ψ_j sin(ωj) / Σ_j ψ_j cos(ωj) }.
The function |ψ(ω)|, which is described as the gain of the filter, indicates the extent to which the amplitudes of the cyclical components of which x(t) is composed are altered in the process of filtering.

The function θ(ω), which is described as the phase displacement and which gives a measure in radians, indicates the extent to which the cyclical components are displaced along the time axis.
The substitution of expression (52) in equation (46) gives

(4.54)    y(t) = ∫_{−π}^{π} e^{i{ωt − θ(ω)}} |ψ(ω)| dZ_x(ω).

The importance of this equation is that it summarises the two effects of the filter.
LECTURE 5
Linear Stochastic Models
Autocovariances of a Stationary Process
A temporal stochastic process is simply a sequence of random variables indexed by a time subscript. Such a process can be denoted by x(t). The element of the sequence at the point t = τ is x_τ = x(τ).

Let {x_{τ+1}, x_{τ+2}, . . . , x_{τ+n}} denote n consecutive elements of the sequence. Then the process is said to be strictly stationary if the joint probability distribution of the elements does not depend on τ, regardless of the size of n. This means that any two segments of the sequence of equal length have identical probability density functions. In consequence, the decision on where to place the time origin is arbitrary; and the argument τ can be omitted. Some further implications of stationarity are that

(5.1)    E(x_t) = μ < ∞ for all t and C(x_{τ+t}, x_{τ+s}) = γ_{|t−s|}.
The latter condition means that the covariance of any two elements depends only on their temporal separation |t − s|. Notice that, if the elements of the sequence are normally distributed, then the two conditions are sufficient to establish strict stationarity. On their own, they constitute the conditions of weak or 2nd-order stationarity.
The condition on the covariances implies that the dispersion matrix of the vector [x₁, x₂, . . . , x_n] is a bisymmetric Laurent matrix of the form

(5.2)    Γ = ⎡ γ₀       γ₁       γ₂       . . .  γ_{n−1} ⎤
             ⎢ γ₁       γ₀       γ₁       . . .  γ_{n−2} ⎥
             ⎢ γ₂       γ₁       γ₀       . . .  γ_{n−3} ⎥
             ⎢  ⋮        ⋮        ⋮        ⋱      ⋮      ⎥
             ⎣ γ_{n−1}  γ_{n−2}  γ_{n−3}  . . .  γ₀      ⎦ ,

wherein the generic element in the (i, j)th position is γ_{|i−j|} = C(x_i, x_j). Given that a sequence of observations of a time series represents only a segment of a single realisation of a stochastic process, one might imagine that there is little chance of making valid inferences about the parameters of the process.
However, provided that the process x(t) is stationary and provided that the
statistical dependencies between widely separated elements of the sequence are
weak, it is possible to estimate consistently those parameters of the process
which express the dependence of proximate elements of the sequence. If one
is prepared to make sufficiently strong assumptions about the nature of the
process, then a knowledge of such parameters may be all that is needed for a
complete characterisation of the process.
Moving-Average Processes
The qth-order moving-average process, or MA(q) process, is defined by the equation

(5.3)    y(t) = μ₀ε(t) + μ₁ε(t − 1) + ⋯ + μ_qε(t − q),

where ε(t), which has E{ε(t)} = 0, is a white-noise process consisting of a sequence of independently and identically distributed random variables with zero expectations. The equation is normalised either by setting μ₀ = 1 or by setting V{ε(t)} = σ²_ε = 1. The equation can be written in summary notation as y(t) = μ(L)ε(t), where μ(L) = μ₀ + μ₁L + ⋯ + μ_qL^q is a polynomial in the lag operator.
A moving-average process is clearly stationary, since any two elements y_t and y_s represent the same function of the vectors [ε_t, ε_{t−1}, . . . , ε_{t−q}] and [ε_s, ε_{s−1}, . . . , ε_{s−q}], which are identically distributed. In addition to the condition of stationarity, it is usually required that a moving-average process should be invertible, such that it can be expressed in the form of μ⁻¹(L)y(t) = ε(t), where the LHS embodies a convergent sum of past values of y(t). This is an infinite-order autoregressive representation of the process. The representation is available only if all the roots of the equation μ(z) = μ₀ + μ₁z + ⋯ + μ_qz^q = 0 lie outside the unit circle. This conclusion follows from our discussion of partial fractions.
As an example, let us consider the first-order moving-average process which is defined by

(5.4)    y(t) = ε(t) − θε(t − 1) = (1 − θL)ε(t).

Provided that |θ| < 1, this can be written in autoregressive form as

(5.5)    ε(t) = (1 − θL)⁻¹ y(t) = { y(t) + θy(t − 1) + θ²y(t − 2) + ⋯ }.
Imagine that |θ| > 1 instead. Then, to obtain a convergent series, we have to write

(5.6)    y(t + 1) = ε(t + 1) − θε(t) = −θ(1 − L⁻¹/θ)ε(t),

where L⁻¹ε(t) = ε(t + 1). This gives

(5.7)    ε(t) = −θ⁻¹(1 − L⁻¹/θ)⁻¹ y(t + 1)
            = −{ y(t + 1)/θ + y(t + 2)/θ² + y(t + 3)/θ³ + ⋯ }.

Normally, an expression such as this, which embodies future values of y(t), would have no reasonable meaning.
It is straightforward to generate the sequence of autocovariances from a knowledge of the parameters of the moving-average process and of the variance of the white-noise process. Consider

(5.8)    γ_τ = E(y_t y_{t−τ})
            = E( Σ_i μ_i ε_{t−i} Σ_j μ_j ε_{t−τ−j} )
            = Σ_i Σ_j μ_i μ_j E(ε_{t−i} ε_{t−τ−j}).

Since ε(t) is a sequence of independently and identically distributed random variables with zero expectations, it follows that

(5.9)    E(ε_{t−i} ε_{t−τ−j}) = 0, if i ≠ τ + j;
                             = σ²_ε, if i = τ + j.

Therefore

(5.10)    γ_τ = σ²_ε Σ_j μ_j μ_{j+τ}.
Now let τ = 0, 1, . . . , q. This gives

(5.11)    γ₀ = σ²_ε(μ₀² + μ₁² + ⋯ + μ_q²),
          γ₁ = σ²_ε(μ₀μ₁ + μ₁μ₂ + ⋯ + μ_{q−1}μ_q),
            ⋮
          γ_q = σ²_ε μ₀μ_q.

Also, γ_τ = 0 for all τ > q.
The first-order moving-average process y(t) = ε(t) − θε(t − 1) has the following autocovariances:

(5.12)    γ₀ = σ²_ε(1 + θ²),
          γ₁ = −σ²_ε θ,
          γ_τ = 0 if τ > 1.
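These moments are easily corroborated by simulation. The sketch below (added here) generates a long sample from the MA(1) process and compares its empirical autocovariances with (5.12).

    import numpy as np

    rng = np.random.default_rng(2)
    theta, sigma2 = 0.5, 1.0
    eps = rng.normal(0.0, np.sqrt(sigma2), 100001)
    y = eps[1:] - theta * eps[:-1]      # y(t) = eps(t) - theta*eps(t-1)

    d = y - y.mean()
    c = [np.mean(d[tau:] * d[:len(d) - tau]) for tau in range(3)]
    print(np.round(c, 2))               # approximately [1.25, -0.5, 0.0], as (5.12) predicts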
Thus, for a vector y = [y₁, y₂, . . . , y_T]′ of T consecutive elements from a first-order moving-average process, the dispersion matrix is

(5.13)    D(y) = σ²_ε ⎡ 1 + θ²   −θ       0        . . .  0      ⎤
                      ⎢ −θ       1 + θ²   −θ       . . .  0      ⎥
                      ⎢ 0        −θ       1 + θ²   . . .  0      ⎥
                      ⎢  ⋮        ⋮        ⋮        ⋱      ⋮     ⎥
                      ⎣ 0        0        0        . . .  1 + θ² ⎦ .

In general, the dispersion matrix of a qth-order moving-average process has q subdiagonal and q supradiagonal bands of nonzero elements and zero elements elsewhere.
It is also helpful to define an autocovariance generating function, which is a power series whose coefficients are the autocovariances γ_τ for successive values of τ. This is denoted by

(5.14)    γ(z) = Σ_τ γ_τ z^τ, with τ = {0, ±1, ±2, . . .} and γ_τ = γ_{−τ}.

The generating function is also called the z-transform of the autocovariance function.
The autocovariance generating function of the qth-order moving-average
process can be found quite readily. Consider the convolution

(5.15)  μ(z)μ(z⁻¹) = Σ_i μ_i z^i Σ_j μ_j z^{−j}
                   = Σ_i Σ_j μ_i μ_j z^{i−j}
                   = Σ_τ { Σ_j μ_j μ_{j+τ} } z^τ,   where τ = i − j.

By referring to the expression for the autocovariance of lag τ of a moving-
average process given under (5.10), it can be seen that the autocovariance gen-
erating function is just

(5.16)  γ(z) = σ_ε² μ(z)μ(z⁻¹).
Autoregressive Processes

The pth-order autoregressive process, or AR(p) process, is defined by the
equation

(5.17)  α₀y(t) + α₁y(t − 1) + ⋯ + α_p y(t − p) = ε(t).
This equation is invariably normalised by setting α₀ = 1, although it would
be possible to set σ_ε² = 1 instead. The equation can be written in summary
notation as α(L)y(t) = ε(t), where α(L) = α₀ + α₁L + ⋯ + α_p L^p. For the
process to be stationary, the roots of the equation α(z) = α₀ + α₁z + ⋯ +
α_p z^p = 0 must lie outside the unit circle. This condition enables us to write
the autoregressive process as an infinite-order moving-average process in the
form of y(t) = α⁻¹(L)ε(t).
As an example, let us consider the first-order autoregressive process which
is defined by

(5.18)  ε(t) = y(t) − φy(t − 1)
             = (1 − φL)y(t).

Provided that the process is stationary with |φ| < 1, it can be represented in
moving-average form as

(5.19)  y(t) = (1 − φL)⁻¹ε(t)
             = {ε(t) + φε(t − 1) + φ²ε(t − 2) + ⋯ }.
The autocovariances of the process can be found by using the formula of (5.10),
which is applicable to moving-average processes of finite or infinite order. Thus

(5.20)  γ_τ = E(y_t y_{t−τ})
            = E( Σ_i φ^i ε_{t−i} Σ_j φ^j ε_{t−τ−j} )
            = Σ_i Σ_j φ^i φ^j E(ε_{t−i} ε_{t−τ−j});

and the result under (5.9) indicates that

(5.21)  γ_τ = σ_ε² Σ_j φ^j φ^{j+τ}
            = σ_ε² φ^τ / (1 − φ²).
For a vector y = [y₁, y₂, . . . , y_T]′ of T consecutive elements from a first-order
autoregressive process, the dispersion matrix has the form

(5.22)  D(y) = {σ_ε²/(1 − φ²)} ×
        [ 1         φ         φ²        . . .   φ^{T−1}
          φ         1         φ         . . .   φ^{T−2}
          φ²        φ         1         . . .   φ^{T−3}
          ...       ...       ...       ...     ...
          φ^{T−1}   φ^{T−2}   φ^{T−3}   . . .   1       ].
To find the autocovariance generating function for the general pth-order
autoregressive process, we may consider again the function α(z) = Σ_i α_i z^i.
Since an autoregressive process may be treated as an infinite-order moving-
average process, it follows that

(5.23)  γ(z) = σ_ε² / {α(z)α(z⁻¹)}.
For an alternative way of finding the autocovariances of the pth-order process,
consider multiplying Σ_i α_i y_{t−i} = ε_t by y_{t−τ} and taking expectations to give

(5.24)  Σ_i α_i E(y_{t−i} y_{t−τ}) = E(ε_t y_{t−τ}).

Taking account of the normalisation α₀ = 1, we find that

(5.25)  E(ε_t y_{t−τ}) = { σ_ε², if τ = 0;
                           0,    if τ > 0.

Therefore, on setting E(y_{t−i} y_{t−τ}) = γ_{τ−i}, equation (5.24) gives

(5.26)  Σ_i α_i γ_{τ−i} = { σ_ε², if τ = 0;
                            0,    if τ > 0.

The second of these is a homogeneous difference equation which enables us to
generate the sequence {γ_p, γ_{p+1}, . . .} once p starting values γ₀, γ₁, . . . , γ_{p−1} are
known. By letting τ = 0, 1, . . . , p in (5.26), we generate a set of p + 1 equations
which can be arrayed in matrix form as follows:
(5.27)
[ γ₀      γ₁       γ₂       . . .   γ_p     ]
[ γ₁      γ₀       γ₁       . . .   γ_{p−1} ]
[ γ₂      γ₁       γ₀       . . .   γ_{p−2} ]  × [1, α₁, α₂, . . . , α_p]′ = [σ_ε², 0, 0, . . . , 0]′.
[ ...     ...      ...      ...     ...     ]
[ γ_p     γ_{p−1}  γ_{p−2}  . . .   γ₀      ]

These are called the Yule–Walker equations, and they can be used either for
generating the values γ₀, γ₁, . . . , γ_p from the values α₁, . . . , α_p, σ_ε² or vice versa.
For an example of the two uses of the Yule–Walker equations, let us con-
sider the second-order autoregressive process. In that case, we have

(5.28)
[ γ₀  γ₁  γ₂ ] [ α₀ ]   [ α₂  α₁  α₀  0   0  ] [ γ₂ ]   [ α₀  α₁       α₂ ] [ γ₀ ]   [ σ_ε² ]
[ γ₁  γ₀  γ₁ ] [ α₁ ] = [ 0   α₂  α₁  α₀  0  ] [ γ₁ ] = [ α₁  α₀ + α₂  0  ] [ γ₁ ] = [ 0    ]
[ γ₂  γ₁  γ₀ ] [ α₂ ]   [ 0   0   α₂  α₁  α₀ ] [ γ₀ ]   [ α₂  α₁       α₀ ] [ γ₂ ]   [ 0    ]
                                               [ γ₁ ]
                                               [ γ₂ ]
71

D.S.G. POLLOCK : TIME SERIES AND FORECASTING
Given α₀ = 1 and the values for γ₀, γ₁, γ₂, we can find σ_ε² and α₁, α₂. Con-
versely, given α₀, α₁, α₂ and σ_ε², we can find γ₀, γ₁, γ₂. It is worth recalling at
this juncture that the normalisation σ_ε² = 1 might have been chosen instead
of α₀ = 1. This would have rendered the equations more easily intelligible.
Notice also how the matrix following the first equality is folded across the axis
which divides it vertically to give the matrix which follows the second equality.
Pleasing effects of this sort often arise in time-series analysis.
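The two uses of the Yule–Walker equations can be illustrated with a short
computation. The sketch below is in Python with NumPy, both being assumptions
of ours rather than anything given in the text, and the variable names are our
own invention. It passes from the parameters of a second-order process to the
autocovariances via the folded system in (5.28), and then recovers the parameters
from the autocovariances:

```python
import numpy as np

# Yule's sunspot equation in the present notation: alpha_0 = 1, alpha_1 = -1.343,
# alpha_2 = 0.655, so that y(t) - 1.343y(t-1) + 0.655y(t-2) = e(t).
a1, a2, sigma2 = -1.343, 0.655, 1.0

# First use: from (a1, a2, sigma2) to (g0, g1, g2), via the folded system of (5.28).
A = np.array([[1.0, a1,       a2 ],
              [a1,  1.0 + a2, 0.0],
              [a2,  a1,       1.0]])
g0, g1, g2 = np.linalg.solve(A, np.array([sigma2, 0.0, 0.0]))

# Second use: from (g0, g1, g2) back to the parameters, via the Yule-Walker form.
coeffs = np.linalg.solve(np.array([[g0, g1], [g1, g0]]), -np.array([g1, g2]))
s2 = g0 + coeffs[0] * g1 + coeffs[1] * g2
print(g0, g1, g2, coeffs, s2)      # coeffs recovers (a1, a2); s2 recovers sigma2
```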
The Partial Autocorrelation Function
Let α_{r(r)} be the coefficient associated with y(t − r) in an autoregres-
sive process of order r whose parameters correspond to the autocovariances
γ₀, γ₁, . . . , γ_r. Then the sequence {α_{r(r)}; r = 1, 2, . . .} of such coefficients, whose
index corresponds to models of increasing orders, constitutes the partial auto-
correlation function. In effect, α_{r(r)} indicates the role in explaining the variance
of y(t) which is due to y(t − r) when y(t − 1), . . . , y(t − r + 1) are also taken
into account.

Much of the theoretical importance of the partial autocorrelation function
is due to the fact that, when γ₀ is added, it represents an alternative way of
conveying the information which is present in the sequence of autocorrelations.
Its role in identifying the order of an autoregressive process is evident; for, if
α_{r(r)} ≠ 0 and if α_{p(p)} = 0 for all p > r, then it is clearly implied that the
process has an order of r.
The sequence of partial autocorrelations may be computed efficiently via
the recursive Durbin–Levinson algorithm, which uses the coefficients of the AR
model of order r as the basis for calculating the coefficients of the model of
order r + 1.
To derive the algorithm, let us imagine that we already have the values
α_{0(r)} = 1, α_{1(r)}, . . . , α_{r(r)}. Then, by extending the set of rth-order Yule–Walker
equations to which these values correspond, we can derive the system

(5.29)
[ γ₀       γ₁       . . .   γ_r       γ_{r+1} ] [ 1        ]   [ σ²_(r) ]
[ γ₁       γ₀       . . .   γ_{r−1}   γ_r     ] [ α_{1(r)} ]   [ 0      ]
[ ...                                         ] [ ...      ] = [ ...    ]
[ γ_r      γ_{r−1}  . . .   γ₀        γ₁      ] [ α_{r(r)} ]   [ 0      ]
[ γ_{r+1}  γ_r      . . .   γ₁        γ₀      ] [ 0        ]   [ g      ]

wherein

(5.30)  g = Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j},  with α_{0(r)} = 1.
The system can also be written as

(5.31)
[ γ₀       γ₁       . . .   γ_r       γ_{r+1} ] [ 0        ]   [ g      ]
[ γ₁       γ₀       . . .   γ_{r−1}   γ_r     ] [ α_{r(r)} ]   [ 0      ]
[ ...                                         ] [ ...      ] = [ ...    ]
[ γ_r      γ_{r−1}  . . .   γ₀        γ₁      ] [ α_{1(r)} ]   [ 0      ]
[ γ_{r+1}  γ_r      . . .   γ₁        γ₀      ] [ 1        ]   [ σ²_(r) ]
The two systems of equations (5.29) and (5.31) can be combined to give

(5.32)
[ γ₀       γ₁       . . .   γ_r       γ_{r+1} ] [ 1                     ]   [ σ²_(r) + cg  ]
[ γ₁       γ₀       . . .   γ_{r−1}   γ_r     ] [ α_{1(r)} + cα_{r(r)}  ]   [ 0            ]
[ ...                                         ] [ ...                   ] = [ ...          ]
[ γ_r      γ_{r−1}  . . .   γ₀        γ₁      ] [ α_{r(r)} + cα_{1(r)}  ]   [ 0            ]
[ γ_{r+1}  γ_r      . . .   γ₁        γ₀      ] [ c                     ]   [ g + cσ²_(r)  ]
If we take the coefficient of the combination to be

(5.33)  c = −g/σ²_(r),

then the final element in the vector on the RHS becomes zero and the system
becomes the set of Yule–Walker equations of order r + 1. The solution of the
equations, from the last element α_{r+1(r+1)} = c through to the variance term
σ²_(r+1), is given by
(5.34)
α_{r+1(r+1)} = −(1/σ²_(r)) Σ_{j=0}^{r} α_{j(r)} γ_{r+1−j},

[ α_{1(r+1)} ]   [ α_{1(r)} ]                 [ α_{r(r)} ]
[ ...        ] = [ ...      ] + α_{r+1(r+1)}  [ ...      ]
[ α_{r(r+1)} ]   [ α_{r(r)} ]                 [ α_{1(r)} ]

σ²_(r+1) = σ²_(r) { 1 − (α_{r+1(r+1)})² }.
Thus the solution of the Yule–Walker system of order r + 1 is easily derived
from the solution of the system of order r, and there is scope for devising a
recursive procedure. The starting values for the recursion are

(5.35)  α_{1(1)} = −γ₁/γ₀  and  σ²_(1) = γ₀{1 − (α_{1(1)})²}.
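The recursion of (5.33)–(5.35) translates directly into a short routine. The
following sketch is in Python with NumPy, which is an assumption of ours;
the function name durbin_levinson is likewise hypothetical. It returns the
sequence of partial autocorrelations α_{1(1)}, α_{2(2)}, . . . implied by a given set of
autocovariances:

```python
import numpy as np

def durbin_levinson(gamma):
    """Partial autocorrelations alpha_{1(1)}, alpha_{2(2)}, ... from the
    autocovariances gamma[0], ..., gamma[m], by the recursion of (5.34)
    with the starting values of (5.35)."""
    a = np.array([-gamma[1] / gamma[0]])        # alpha_{1(1)}
    v = gamma[0] * (1.0 - a[0] ** 2)            # sigma^2_(1)
    pacf = [a[0]]
    for r in range(1, len(gamma) - 1):
        g = gamma[r + 1] + a @ gamma[r:0:-1]    # g of (5.30), with alpha_{0(r)} = 1
        c = -g / v                              # alpha_{r+1(r+1)}, from (5.33)
        a = np.concatenate([a + c * a[::-1], [c]])
        v = v * (1.0 - c ** 2)
        pacf.append(c)
    return np.array(pacf)

# For an AR(1) process with gamma_tau = phi^tau/(1 - phi^2), the partial
# autocorrelations should be [-phi, 0, 0, ...] under the present sign convention.
gamma = 0.8 ** np.arange(6) / (1 - 0.8 ** 2)
print(durbin_levinson(gamma))
```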
Autoregressive Moving-Average Processes

The autoregressive moving-average process of orders p and q, which is
referred to as the ARMA(p, q) process, is defined by the equation

(5.36)  α₀y(t) + α₁y(t − 1) + ⋯ + α_p y(t − p)
           = μ₀ε(t) + μ₁ε(t − 1) + ⋯ + μ_q ε(t − q).

The equation is normalised by setting α₀ = 1 and by setting either μ₀ = 1
or σ_ε² = 1. A more summary expression for the equation is α(L)y(t) = μ(L)ε(t).
Provided that the roots of the equation α(z) = 0 lie outside the unit circle,
the process can be represented by the equation y(t) = α⁻¹(L)μ(L)ε(t), which
corresponds to an infinite-order moving-average process. Conversely, provided
the roots of the equation μ(z) = 0 lie outside the unit circle, the process can
be represented by the equation μ⁻¹(L)α(L)y(t) = ε(t), which corresponds to an
infinite-order autoregressive process.

By considering the moving-average form of the process, and by noting the
form of the autocovariance generating function for such a process which is given
by equation (5.16), it can be seen that the autocovariance generating function
for the autoregressive moving-average process is

(5.37)  γ(z) = σ_ε² μ(z)μ(z⁻¹) / {α(z)α(z⁻¹)}.
This generating function, which is of some theoretical interest, does not
provide a practical means of finding the autocovariances. To find these, let us
consider multiplying the equation Σ_i α_i y_{t−i} = Σ_i μ_i ε_{t−i} by y_{t−τ} and taking
expectations. This gives

(5.38)  Σ_i α_i γ_{τ−i} = Σ_i μ_i δ_{i−τ},

where γ_{τ−i} = E(y_{t−τ} y_{t−i}) and δ_{i−τ} = E(y_{t−τ} ε_{t−i}). Since ε_{t−i} is uncorrelated
with y_{t−τ} whenever it is subsequent to the latter, it follows that δ_{i−τ} = 0 if
τ > i. Since the index i in the RHS of the equation (5.38) runs from 0 to q, it
follows that

(5.39)  Σ_i α_i γ_{τ−i} = 0  if τ > q.

Given the q + 1 nonzero values δ₀, δ₁, . . . , δ_q, and p initial values γ₀, γ₁, . . . , γ_{p−1}
for the autocovariances, the equations can be solved recursively to obtain the
subsequent values {γ_p, γ_{p+1}, . . .}.
To find the requisite values δ₀, δ₁, . . . , δ_q, consider multiplying the equation
Σ_i α_i y_{t−i} = Σ_i μ_i ε_{t−i} by ε_{t−τ} and taking expectations. This gives

(5.40)  Σ_i α_i δ_{τ−i} = μ_τ σ_ε²,

where δ_{τ−i} = E(y_{t−i} ε_{t−τ}); and here δ_{τ−i} = 0 when i > τ, since the disturbance
then postdates the element of the series. The equation may be rewritten as

(5.41)  δ_τ = (1/α₀){ μ_τ σ_ε² − Σ_{i=1}^{p} α_i δ_{τ−i} },

and, by setting τ = 0, 1, . . . , q, we can generate recursively the required values
δ₀, δ₁, . . . , δ_q.
Example. Consider the ARMA(2, 2) model which gives the equation

(5.42)  α₀y_t + α₁y_{t−1} + α₂y_{t−2} = μ₀ε_t + μ₁ε_{t−1} + μ₂ε_{t−2}.
Multiplying by y_t, y_{t−1} and y_{t−2} and taking expectations gives

(5.43)
[ γ₀  γ₁  γ₂ ] [ α₀ ]   [ δ₀  δ₁  δ₂ ] [ μ₀ ]
[ γ₁  γ₀  γ₁ ] [ α₁ ] = [ 0   δ₀  δ₁ ] [ μ₁ ]
[ γ₂  γ₁  γ₀ ] [ α₂ ]   [ 0   0   δ₀ ] [ μ₂ ]
Multiplying by ε_t, ε_{t−1} and ε_{t−2} and taking expectations gives

(5.44)
[ δ₀  0   0  ] [ α₀ ]   [ σ_ε²  0     0    ] [ μ₀ ]
[ δ₁  δ₀  0  ] [ α₁ ] = [ 0     σ_ε²  0    ] [ μ₁ ]
[ δ₂  δ₁  δ₀ ] [ α₂ ]   [ 0     0     σ_ε² ] [ μ₂ ]
When the latter equations are written as

(5.45)
[ δ₀  0   0  ] [ α₀ ]          [ μ₀ ]
[ δ₁  δ₀  0  ] [ α₁ ] = σ_ε²   [ μ₁ ]
[ δ₂  δ₁  δ₀ ] [ α₂ ]          [ μ₂ ],

they can be solved recursively for δ₀, δ₁ and δ₂ on the assumption that the
values of α₀, α₁, α₂ and σ_ε² are known. Notice that, when we adopt the
normalisation α₀ = μ₀ = 1, we get δ₀ = σ_ε². When the equations (5.43) are
rewritten as
(5.46)
[ α₀  α₁       α₂ ] [ γ₀ ]   [ μ₀  μ₁  μ₂ ] [ δ₀ ]
[ α₁  α₀ + α₂  0  ] [ γ₁ ] = [ μ₁  μ₂  0  ] [ δ₁ ]
[ α₂  α₁       α₀ ] [ γ₂ ]   [ μ₂  0   0  ] [ δ₂ ],

they can be solved for γ₀, γ₁ and γ₂. Thus the starting values are obtained
which enable the equation

(5.47)  α₀γ_τ + α₁γ_{τ−1} + α₂γ_{τ−2} = 0;  τ > 2

to be solved recursively to generate the succeeding values {γ₃, γ₄, . . .} of the
autocovariances.
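The whole procedure can be assembled into a few lines of code. The Python
sketch below is a hypothetical illustration of ours, assuming the NumPy library
and arbitrary parameter values; it solves (5.45) for δ₀, δ₁, δ₂, solves (5.46) for
the starting autocovariances, and then extends the sequence by the recursion
of (5.47):

```python
import numpy as np

# ARMA(2,2): y_t + a1*y_{t-1} + a2*y_{t-2} = e_t + m1*e_{t-1} + m2*e_{t-2},
# with a0 = m0 = 1 and V{e(t)} = sigma2. The parameter values are arbitrary.
a = np.array([1.0, -0.5, 0.2])
m = np.array([1.0, 0.4, 0.3])
sigma2 = 1.0

# Solve (5.45) recursively for d0, d1, d2 (the system is lower-triangular).
d = np.zeros(3)
for tau in range(3):
    d[tau] = (sigma2 * m[tau] - sum(a[i] * d[tau - i] for i in range(1, tau + 1))) / a[0]

# Solve (5.46) for the starting autocovariances g0, g1, g2.
A = np.array([[a[0], a[1],        a[2]],
              [a[1], a[0] + a[2], 0.0 ],
              [a[2], a[1],        a[0]]])
M = np.array([[m[0], m[1], m[2]],
              [m[1], m[2], 0.0 ],
              [m[2], 0.0,  0.0 ]])
gamma = list(np.linalg.solve(A, M @ d))

# Extend the sequence by the recursion of (5.47).
for tau in range(3, 10):
    gamma.append(-(a[1] * gamma[-1] + a[2] * gamma[-2]) / a[0])
print(gamma)
```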
THE METHODS OF TIME-SERIES ANALYSIS
by
D.S.G. Pollock
Queen Mary and Westfield College,
The University of London
The methods to be presented in this lecture are designed for the purpose of
analysing series of statistical observations taken at regular intervals in time.
The methods have a wide range of applications. We can cite astronomy [18],
meteorology [9], seismology [21], oceanography [11], communications engineer-
ing and signal processing [16], the control of continuous process plants [20],
neurology and electroencephalography [1], [25], and economics [10]; and this
list is by no means complete.
1. The Frequency Domain and the Time Domain
The methods apply, in the main, to what are described as stationary or
non-evolutionary time series. Such series manifest statistical properties which
are invariant throughout time, so that the behaviour during one epoch is the
same as it would be during any other.
When we speak of a weakly stationary or covariance-stationary process,
we have in mind a sequence of random variables y(t) = {y_t; t = 0, ±1, ±2, . . .},
representing the potential observations of the process, which have a common
finite expected value E(y_t) = μ and a set of autocovariances C(y_t, y_s) =
E{(y_t − μ)(y_s − μ)} = γ_{|t−s|} which depend only on the temporal separation
τ = |t − s| of the dates t and s and not on their absolute values. We also
commonly require of such a process that lim(τ → ∞) γ_τ = 0, which is to say
that the correlation between increasingly remote elements of the sequence tends
to zero. This is a way of expressing the notion that the events of the past have
a diminishing effect upon the present as they recede in time. In an appendix to
the paper, we review the definitions of mathematical expectations and covariances.
There are two distinct yet broadly equivalent modes of time-series anal-
ysis which may be pursued. On the one hand are the time-domain methods
which have their origin in the classical theory of correlation. Such methods
deal preponderantly with the autocovariance functions and the cross-covariance
functions of the series, and they lead inevitably towards the construction of
structural or parametric models of the autoregressive moving-average type for
single series and of the transfer-function type for two or more causally related
series. Many of the methods which are used to estimate the parameters of
these models can be viewed as sophisticated variants of the method of linear
regression.
On the other hand are the frequency-domain methods of spectral analysis.
These are based on an extension of the methods of Fourier analysis which
originate in the idea that, over a finite interval, any analytic function can be
approximated, to whatever degree of accuracy is desired, by taking a weighted
sum of sine and cosine functions of harmonically increasing frequencies.
2. Harmonic Analysis
The astronomers are usually given credit for being the first to apply the
methods of Fourier analysis to time series. Their endeavours could be described
as the search for hidden periodicities within astronomical data. Typical exam-
ples were the attempts to uncover periodicities within the activities recorded
by the Wolfer sunspot index and in the indices of luminosity of variable stars.
The relevant methods were developed over a long period of time. Lagrange
[13] suggested methods for detecting hidden periodicities in 1772 and 1778.
The Dutchman Buys-Ballot [6] propounded effective computational procedures
for the statistical analysis of astronomical data in 1847. However, we should
probably credit Sir Arthur Schuster [17], who in 1889 propounded the technique
of periodogram analysis, with being the progenitor of the modern methods for
analysing time series in the frequency domain.
In essence, these frequency-domain methods envisaged a model underlying
the observations which takes the form of

(1)   y(t) = Σ_j ρ_j cos(ω_j t − θ_j) + ε(t)
           = Σ_j {α_j cos(ω_j t) + β_j sin(ω_j t)} + ε(t),

where α_j = ρ_j cos θ_j and β_j = ρ_j sin θ_j, and where ε(t) is a sequence of indepen-
dently and identically distributed random variables which we call a white-noise
process. Thus the model depicts the series y(t) as a weighted sum of perfectly
regular periodic components upon which is superimposed a random component.
The factor ρ_j = √(α_j² + β_j²) is called the amplitude of the jth periodic
component, and it indicates the importance of that component within the sum.
Since the variance of a cosine function, which is also called its mean-square
deviation, is just one half, and since cosine functions at different frequencies
are uncorrelated, it follows that the variance of y(t) is expressible as V{y(t)} =
½ Σ_j ρ_j² + σ_ε², where σ_ε² = V{ε(t)} is the variance of the noise.

The periodogram is simply a device for determining how much of the vari-
ance of y(t) is attributable to any given harmonic component. Its value at
ω_j = 2πj/T, calculated from a sample y₀, . . . , y_{T−1} comprising T observations
on y(t), is given by

(2)   I(ω_j) = (2/T) [ {Σ_t y_t cos(ω_j t)}² + {Σ_t y_t sin(ω_j t)}² ]
             = (T/2) { a²(ω_j) + b²(ω_j) }.

If y(t) does indeed comprise only a finite number of well-defined harmonic
components, then it can be shown that 2I(ω_j)/T is a consistent estimator of
ρ_j² in the sense that it converges to the latter in probability as the size T of the
sample of the observations on y(t) increases.
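The computation can be illustrated by a brief sketch in Python with NumPy
(our own choice of tools; the function name periodogram is hypothetical). It
evaluates equation (2) at the Fourier frequencies and confirms, for a noisy
sinusoid, that 2I(ω_j)/T approaches the squared amplitude:

```python
import numpy as np

def periodogram(y):
    """Ordinates I(w_j) of equation (2) at the Fourier frequencies
    w_j = 2*pi*j/T, for j = 1, ..., T//2."""
    T = len(y)
    t = np.arange(T)
    ords = []
    for j in range(1, T // 2 + 1):
        w = 2 * np.pi * j / T
        c = np.sum(y * np.cos(w * t))
        s = np.sum(y * np.sin(w * t))
        ords.append((2.0 / T) * (c ** 2 + s ** 2))
    return np.array(ords)

# A sinusoid of amplitude rho buried in unit-variance noise: the rescaled
# ordinate 2*I(w_j)/T at the signal frequency should be close to rho^2.
T, rho = 512, 2.0
t = np.arange(T)
y = rho * np.cos(2 * np.pi * 16 * t / T) + np.random.default_rng(1).standard_normal(T)
print(2 * periodogram(y)[15] / T)       # index 15 corresponds to j = 16
```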
Figure 1. The graph of a sine function.
Figure 2. Graph of a sine function with small random fluctuations superimposed.
The process by which the ordinates of the periodogram converge upon the
squared values of the harmonic amplitudes was well expressed by Yule [24] in
a seminal article of 1927:
If we take a curve representing a simple harmonic function of time,
and superpose on the ordinates small random errors, the only effect is
to make the graph somewhat irregular, leaving the suggestion of peri-
odicity still clear to the eye. If the errors are increased in magnitude,
the graph becomes more irregular, the suggestion of periodicity more
obscure, and we have only sufficiently to increase the errors to mask
completely any appearance of periodicity. But, however large the er-
rors, periodogram analysis is applicable to such a curve, and, given
a sufficient number of periods, should yield a close approximation to
the period and amplitude of the underlying harmonic function.
We should not quote this passage without mentioning that Yule proceeded
to question whether the hypothesis underlying periodogram analysis, which
postulates the equation under (1), was an appropriate hypothesis for all cases.
Figure 3. Wolfer's Sunspot Numbers 1749–1924.
A highly successful application of periodogram analysis was that of Whit-
taker and Robinson [22] who, in 1924, showed that the series recording the
brightness or magnitude of the star T. Ursa Major over 600 days could be fitted
almost exactly by the sum of two harmonic functions with periods of 24 and
29 days. This led to the suggestion that what was being observed was actu-
ally a two-star system wherein the larger star periodically masked the smaller,
brighter star. Somewhat less successful were the attempts of Arthur Schuster
himself [18] in 1906 to substantiate the claim that there is an eleven-year cycle
in the activity recorded by the Wolfer sunspot index.
Other applications of the method of periodogram analysis were even less
successful; and one application which was a significant failure was its use by
William Beveridge [2, 3] in 1921 and 1922 to analyse a long series of European
wheat prices. The periodogram of this data had so many peaks that at least
twenty possible hidden periodicities could be picked out, and this seemed to be
many more than could be accounted for by plausible explanations within the
realm of economic history. Such experiences seemed to point to the inappro-
priateness to economic circumstances of a model containing perfectly regular
cycles. A classic expression of disbelief was made by Slutsky [19] in another
article of 1927:

Suppose we are inclined to believe in the reality of the strict periodicity
of the business cycle, such, for example, as the eight-year period pos-
tulated by Moore [14]. Then we should encounter another difficulty.
Wherein lies the source of this regularity? What is the mechanism of
causality which, decade after decade, reproduces the same sinusoidal
wave which rises and falls on the surface of the social ocean with the
regularity of day and night?
3. Autoregressive and Moving-Average Models
The next major episode in the history of the development of time-series
analysis took place in the time domain, and it began with the two articles of
1927 by Yule [24] and Slutsky [19] from which we have already quoted. In both
articles, we find a rejection of the model with deterministic harmonic compo-
nents in favour of models more firmly rooted in the notion of random causes. In
a wonderfully figurative exposition, Yule invited his readers to imagine a pen-
dulum attached to a recording device and left to swing. Then any deviations
from perfectly harmonic motion which might be recorded must be the result
of errors of observation which could be all but eliminated if a long sequence
of observations were subjected to a periodogram analysis. Next, Yule enjoined
the reader to imagine that the regular swing of the pendulum is interrupted by
small boys who get into the room and start pelting the pendulum with peas,
sometimes from one side and sometimes from the other. The motion is now
affected not by superposed fluctuations but by true disturbances.
In this example, Yule contrives a perfect analogy for the autoregressive
time-series model. To explain the analogy, let us begin by considering a homo-
geneous second-order difference equation of the form

(3)   y(t) = φ₁y(t − 1) + φ₂y(t − 2).

Given the initial values y₋₁ and y₋₂, this equation can be used recursively to
generate an ensuing sequence {y₀, y₁, . . .}. This sequence will show a regular
pattern of behaviour whose nature depends on the parameters φ₁ and φ₂. If
these parameters are such that the roots of the quadratic equation z² − φ₁z −
φ₂ = 0 are complex and less than unity in modulus, then the sequence of values
will show a damped sinusoidal behaviour, just as a clock pendulum will which
is left to swing without the assistance of the falling weights. In fact, in such a
case, the general solution to the difference equation will take the form of

(4)   y(t) = ρ^t cos(ωt − θ),

where the modulus ρ, which has a value between 0 and 1, is now the damping
factor which is responsible for the attenuation of the swing as the time t elapses.
The autoregressive model which Yule was proposing takes the form of

(5)   y(t) = φ₁y(t − 1) + φ₂y(t − 2) + ε(t),

where ε(t) is, once more, a white-noise sequence. Now, instead of masking
the regular periodicity of the pendulum, the white noise has actually become
the engine which drives the pendulum by striking it randomly in one direction
and another. Its haphazard influence has replaced the steady force of the
falling weights. Nevertheless, the pendulum will still manifest a deceptively
regular motion which is liable, if the sequence of observations is short and
contains insufficient contrary evidence, to be misinterpreted as the effect of an
underlying mechanism.
In his article of 1927, Yule attempted to explain the Wolfer index in terms
of the second-order autoregressive model of equation (5). From the empirical
autocovariances of the sample represented in Figure 3, he estimated the val-
ues φ₁ = 1.343 and φ₂ = −0.655. The general solution of the corresponding
homogeneous difference equation has a damping factor of ρ = 0.809 and an
angular velocity of ω = 33.96°. The angular velocity indicates a period of 10.6
years, which is a little shorter than the 11-year period obtained by Schuster
in his periodogram analysis of the same data. In Figure 4, we show a series
which has been generated artificially from Yule's equation, together with a
series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).
The homogeneous difference equation which corresponds to the latter has the
same value of ω as before. Its damping factor has the value ρ = 0.95, and this
increase accounts for the greater regularity of the second series.
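It is a simple matter to replicate such artificial series. The following sketch,
in Python with NumPy (an illustration of ours; the function name, seeds and
sample size are arbitrary), generates series from the two equations and confirms
the damping factors quoted above:

```python
import numpy as np

def simulate_ar2(phi1, phi2, T, seed=0):
    """y(t) = phi1*y(t-1) + phi2*y(t-2) + e(t), started from zero values."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(T)
    y = np.zeros(T)
    for t in range(2, T):
        y[t] = phi1 * y[t - 1] + phi2 * y[t - 2] + e[t]
    return y

yule = simulate_ar2(1.343, -0.655, 200)      # the equation fitted to the sunspots
smooth = simulate_ar2(1.576, -0.903, 200)    # same omega, heavier damping factor

# For complex roots of z^2 - phi1*z - phi2 = 0, the modulus is sqrt(-phi2).
print(np.sqrt(0.655), np.sqrt(0.903))        # 0.809... and 0.950...
```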
Figure 4. A series generated by Yule's equation y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t).
Figure 5. A series generated by the equation y(t) = 1.576y(t − 1) − 0.903y(t − 2) + ε(t).
Neither of our two series accurately mimics the sunspot index; although
the second series seems closer to it than the series generated by Yule's equation.
An obvious feature of the sunspot index which is not shared by the artificial
series is the fact that the numbers are constrained to be nonnegative. To relieve
this constraint, we might apply to Wolf's numbers y_t a transformation of the
form log(y_t + λ) or of the more general form {(y_t + λ)^κ − 1}/κ, such as has been
advocated by Box and Cox [4]. A transformed series could be more closely
mimicked.

The contributions to time-series analysis made by Yule [24] and Slutsky
[19] in 1927 were complementary: in fact, the two authors grasped opposite
ends of the same pole. For ten years, Slutsky's paper was available only in its
original Russian version; but its contents became widely known within a much
shorter period.
Slutsky posed the same question as did Yule, and in much the same man-
ner. Was it possible, he asked, that a definite structure of a connection between
chaotically random elements could form them into a system of more or less regu-
lar waves? Slutsky proceeded to demonstrate this possibility by methods which
were partly analytic and partly inductive. He discriminated between coherent
series whose elements were serially correlated and incoherent or purely random
series of the sort which we have described as white noise. As to the coherent
series, he declared that

their origin may be extremely varied, but it seems probable that an
especially prominent role is played in nature by the process of moving
summation with weights of one kind or another; by this process coher-
ent series are obtained from other coherent series or from incoherent
series.

By taking, as his basis, a purely random series obtained by the People's
Commissariat of Finance in drawing the numbers of a government lottery loan,
and by repeatedly taking moving summations, Slutsky was able to generate a
series which closely mimicked an index, of a distinctly undulatory nature, of
the English business cycle from 1855 to 1877.
The general form of Slutsky's moving summation can be expressed by
writing

(6)   y(t) = μ₀ε(t) + μ₁ε(t − 1) + ⋯ + μ_q ε(t − q),

where ε(t) is a white-noise process. This is nowadays called a qth-order moving-
average process, and it is readily compared to an autoregressive process of the
sort depicted under (5). The more general pth-order autoregressive process can
be expressed by writing

(7)   α₀y(t) + α₁y(t − 1) + ⋯ + α_p y(t − p) = ε(t).

Thus, whereas the autoregressive process depends upon a linear combination
of the function y(t) with its own lagged values, the moving-average process
depends upon a similar combination of the function ε(t) with its lagged values.
The affinity of the two sorts of process is further confirmed when it is recognised
that an autoregressive process of finite order is equivalent to a moving-average
process of infinite order and that, conversely, a finite-order moving-average
process is just an infinite-order autoregressive process.
4. Generalised Harmonic Analysis
The next step to be taken in the development of the theory of time series
was to generalise the traditional method of periodogram analysis in such a way
as to overcome the problems which arise when the model depicted under (1) is
clearly inappropriate.

At first sight, it would not seem possible to describe a covariance-station-
ary process, whose only regularities are statistical ones, as a linear combination
of perfectly regular periodic components. However, any difficulties which we
might envisage can be overcome if we are prepared to accept a description
which is in terms of a non-denumerable infinity of periodic components. Thus,
on replacing the so-called Fourier sum within equation (1) by a Fourier integral,
and by deleting the term ε(t), whose effect is now absorbed by the integrand,
we obtain an expression in the form of

(8)   y(t) = ∫₀^π { cos(ωt) dA(ω) + sin(ωt) dB(ω) }.

Here we write dA(ω) and dB(ω) rather than α(ω)dω and β(ω)dω because there
can be no presumption that the functions A(ω) and B(ω) are continuous. As it
stands, this expression is devoid of any statistical interpretation. Moreover, if
we are talking of only a single realisation of the process y(t), then the generalised
functions A(ω) and B(ω) will reflect the unique peculiarities of that realisation
and will not be amenable to any systematic description.
However, a fruitful interpretation can be given to these functions if we con-
sider the observable sequence y(t) = {y_t; t = 0, ±1, ±2, . . .} to be a particular
realisation which has been drawn from an infinite population representing all
possible realisations of the process. For, if this population is subject to statis-
tical regularities, then it is reasonable to regard dA(ω) and dB(ω) as mutually
uncorrelated random variables with well-defined distributions which depend
upon the parameters of the population.

We may therefore assume that, for any value of ω,

(9)   E{dA(ω)} = E{dB(ω)} = 0  and
      E{dA(ω)dB(ω)} = 0.

Moreover, to express the discontinuous nature of the generalised functions, we
assume that, for any two values ω and λ in their domain, we have

(10)  E{dA(ω)dA(λ)} = E{dB(ω)dB(λ)} = 0,

which means that A(ω) and B(ω) are stochastic processes (indexed on the
frequency parameter ω rather than on time) which are uncorrelated in non-
overlapping intervals. Finally, we assume that dA(ω) and dB(ω) have a com-
mon variance, so that

(11)  V{dA(ω)} = V{dB(ω)} = dG(ω).
Figure 6. The spectrum of the process y(t) = 1.343y(t − 1) − 0.655y(t − 2) + ε(t)
which generated the series in Figure 4. A series of a more regular nature would be
generated if the spectrum were more narrowly concentrated around its modal value.
Given the assumption of the mutual uncorrelatedness of dA(ω) and dB(ω),
it therefore follows from (8) that the variance of y(t) is expressible as

(12)  V{y(t)} = ∫₀^π [ cos²(ωt)V{dA(ω)} + sin²(ωt)V{dB(ω)} ]
              = ∫₀^π dG(ω).

The function G(ω), which is called the spectral distribution, tells us how much
of the variance is attributable to the periodic components whose frequencies
range continuously from 0 to π. If none of these components contributes more
than an infinitesimal amount to the total variance, then the function G(ω) is
absolutely continuous, and we can write dG(ω) = g(ω)dω under the integral
of equation (12). The new function g(ω), which is called the spectral den-
sity function or the spectrum, is directly analogous to the function expressing
the squared amplitude which is associated with each component in the simple
harmonic model discussed in our earlier sections.
5. Smoothing the Periodogram
It might be imagined that there is little hope of obtaining worthwhile es-
timates of the parameters of the population from which the single available
realisation y(t) has been drawn. However, provided that y(t) is a stationary
process, and provided that the statistical dependencies between widely sep-
arated elements are weak, the single realisation contains all the information
which is necessary for the estimation of the spectral density function. In fact,
a modified version of the traditional periodogram analysis is sufficient for the
purpose of estimating the spectral density.
In some respects, the problems posed by the estimation of the spectral
density are similar to those posed by the estimation of a continuous probability
density function of unknown functional form. It is fruitless to attempt directly
to estimate the ordinates of such a function. Instead, we might set about our
task by constructing a histogram or bar chart to show the relative frequencies
with which the observations that have been drawn from the distribution fall
within broad intervals. Then, by passing a curve through the mid points of the
tops of the bars, we could construct an envelope that might approximate to
the sought-after density function. A more sophisticated estimation procedure
would not group the observations into the fixed intervals of a histogram; instead
it would record the number of observations falling within a moving interval.
Moreover, a consistent method of estimation, which aims at converging upon
the true function as the number of observations increases, would vary the width
of the moving interval with the size of the sample, diminishing it sufficiently
slowly as the sample size increases for the number of sample points falling
within any interval to increase without bound.
A common method for estimating the spectral density is very similar to
the one which we have described for estimating a probability density function.
Instead of basing itself on raw sample observations, as does the method of
density-function estimation, it bases itself upon the ordinates of a periodogram
which has been fitted to the observations on y(t). This procedure for spectral
estimation is therefore called smoothing the periodogram.

A disadvantage of the procedure, which for many years inhibited its wide-
spread use, lies in the fact that calculating the periodogram by what would
seem to be the obvious methods can be vastly time-consuming. Indeed, it
was not until the mid 1960s that wholly practical computational methods were
developed.
6. The Equivalence of the Two Domains
It is remarkable that such a simple technique as smoothing the peri-
odogram should provide a theoretical resolution to the problems encountered
by Beveridge and others in their attempts to detect the hidden periodicities in
economic and astronomical data. Even more remarkable is the way in which
the generalised harmonic analysis that gave rise to the concept of the spec-
tral density of a time series should prove itself to be wholly conformable with
the alternative methods of time-series analysis in the time domain which arose
largely as a consequence of the failure of the traditional methods of periodogram
analysis.
The synthesis of the two branches of time-series analysis was achieved in-
dependently and almost simultaneously in the early 1930s by Norbert Wiener
[23] in America and A. Khintchine [12] in Russia. The Wiener–Khintchine
theorem indicates that there is a one-to-one relationship between the autoco-
variance function of a stationary process and its spectral density function. The
relationship is expressed, in one direction, by writing

(13)  g(ω) = (1/2π) Σ_τ γ_τ cos(ωτ);  τ = {0, ±1, ±2, . . .},

where g(ω) is the spectral density function and {γ_τ; τ = 0, ±1, ±2, . . .} is the
sequence of the autocovariances of the series y(t).
The relationship is invertible in the sense that it is equally possible to
express each of the autocovariances as a function of the spectral density:

(14)  γ_τ = ∫₀^π cos(ωτ) g(ω) dω.
If we set τ = 0, then cos(ωτ) = 1, and we obtain, once more, the equation (12)
which neatly expresses the way in which the variance γ₀ = V{y(t)} of the series
y(t) is attributable to the constituent harmonic components; for g(ω) is simply
the expected value of the squared amplitude of the component at frequency ω.
We have stated the relationships of the Wiener–Khintchine theorem in
terms of the theoretical spectral density function g(ω) and the true autocovari-
ance function {γ_τ; τ = 0, ±1, ±2, . . .}. An analogous relationship holds between
the periodogram I(ω_j) defined in (2) and the sample autocovariance function
{c_τ; τ = 0, ±1, . . . , ±(T − 1)}, where c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T. Thus, in the
appendix, we demonstrate the identity

(15)  I(ω_j) = 2 Σ_{τ=1−T}^{T−1} c_τ cos(ω_j τ);  c_{−τ} = c_τ.
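The identity of (15) can be confirmed numerically. The Python sketch below is
our own hypothetical illustration, assuming the NumPy library; it computes
both sides for a random sample:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 64
y = rng.standard_normal(T)
ybar = y.mean()

# Sample autocovariances c_tau, with c_{-tau} = c_tau by definition.
c = np.array([np.sum((y[tau:] - ybar) * (y[:T - tau] - ybar)) / T for tau in range(T)])

j = 5
w = 2 * np.pi * j / T
t = np.arange(T)

# LHS: the periodogram ordinate of (2), taken in mean-deviation form.
lhs = (2 / T) * (np.sum((y - ybar) * np.cos(w * t)) ** 2
                 + np.sum((y - ybar) * np.sin(w * t)) ** 2)

# RHS: equation (15), summing over tau = 1-T, ..., T-1.
tau = np.arange(1 - T, T)
rhs = 2 * np.sum(c[np.abs(tau)] * np.cos(w * tau))
print(lhs, rhs)                              # the two values coincide
```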
The upshot of the Wiener–Khintchine theorem is that many of the tech-
niques of time-series analysis can, in theory, be expressed in two mathematically
equivalent ways which may differ markedly in their conceptual qualities.

Often, a problem which appears to be intractable from the point of view
of one of the domains of time-series analysis becomes quite manageable when
translated into the other domain. A good example is provided by the matter of
spectral estimation. Given that there are difficulties in computing all T of the
ordinates of the periodogram when the sample size is large, we are impelled to
look for a method of spectral estimation which depends not upon smoothing
the periodogram but upon performing some equivalent operation upon the se-
quence of autocovariances. The fact that there is a one-to-one correspondence
between the spectrum and the sequence of autocovariances assures us that this
equivalent operation must exist; though there is, of course, no guarantee that
it will be easy to perform.
Figure 7. The periodogram of Wolfer's Sunspot Numbers 1749–1924.
In fact, the operation which we perform upon the sample autocovariances is
simple. For, if the sequence of autocovariances {c_τ; τ = 0, 1, . . . , T − 1} in (15)
is replaced by a modified sequence {w_τ c_τ; τ = 0, 1, . . . , T − 1} incorporating
a specially devised set of declining weights {w_τ; τ = 0, 1, . . . , T − 1}, then
an effect which is much the same as that of smoothing the periodogram can
be achieved. Moreover, it may be relatively straightforward to calculate the
weighted autocovariance function.
The task of devising appropriate sets of weights provided a major research
topic in time-series analysis in the 1950s and early 1960s. Together with the
task of devising equivalent procedures for smoothing the periodogram, it came
to be known as spectral carpentry.
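A minimal version of the weighted-autocovariance estimator might be sketched
as follows in Python with NumPy (an illustration of ours; a simple triangular
lag window is used here in place of Parzen's more refined system of weights):

```python
import numpy as np

def spectrum_estimate(c, M, omega):
    """Weighted-autocovariance estimate of the spectral density:
    g(w) = (1/(2*pi)) * sum of w_tau * c_tau * cos(w*tau) over |tau| < M,
    here with the triangular weights w_tau = 1 - tau/M."""
    tau = np.arange(1, M)
    w_tau = 1.0 - tau / M
    return (c[0] + 2.0 * np.sum(w_tau * c[tau] * np.cos(omega * tau))) / (2.0 * np.pi)

# For unit-variance white noise, the spectrum is flat at 1/(2*pi) = 0.159...
rng = np.random.default_rng(3)
T = 400
y = rng.standard_normal(T)
ybar = y.mean()
c = np.array([np.sum((y[t:] - ybar) * (y[:T - t] - ybar)) / T for t in range(T)])
print(spectrum_estimate(c, M=20, omega=np.pi / 2))
```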
Figure 8. The spectrum of the sunspot numbers calculated from
the autocovariances using Parzen's [15] system of weights.
7. The Maturing of Time-Series Analysis
In retrospect, it seems that time-series analysis reached its maturity in the
1970s when significant developments occurred in both of its domains.
A major development in the frequency domain occurred when Cooley and
Tukey [7] described an algorithm which greatly reduces the effort involved in
computing the periodogram. The Fast Fourier Transform, as this algorithm has
come to be known, allied with advances in computer technology, has enabled the
routine analysis of extensive sets of data; and it has transformed the procedure
of smoothing the periodogram into a practical method of spectral estimation.
The contemporaneous developments in the time domain were influenced by
an important book by Box and Jenkins [5]. These authors developed the time-
domain methodology by collating some of its major themes and by applying it
to such important functions as forecasting and control. They demonstrated how
wide had become the scope of time-series analysis by applying it to problems
as diverse as the forecasting of airline passenger numbers and the analysis of
combustion processes in a gas furnace. They also adapted the methodology to
the computer.
Many of the current practitioners of time-series analysis have learnt their
skills in recent years during a time when the subject has been expanding rapidly.
Lacking a longer perspective, it is difficult for them to gauge the significance
of the recent practical advances. One might be surprised to hear, for example,
that as late as 1971 Granger and Hughes [8] were capable of declaring that
Beveridge's calculation of the Periodogram of the Wheat Price Index, com-
prising 300 ordinates, was the most extensive calculation of its type to date.
Nowadays, computations of this order are performed on a routine basis using
microcomputers containing specially designed chips which are dedicated to the
purpose.
The rapidity of the recent developments also belies the fact that time-series
analysis has had a long history. The frequency domain of time-series analy-
sis, to which the idea of the harmonic decomposition of a function is central,
is an inheritance from Euler (1707–1783), d'Alembert (1717–1783), Lagrange
(1736–1813) and Fourier (1768–1830). The search for hidden periodicities was
a dominant theme of 19th century science. It has been transmogrified through
the refinements of Wiener's Generalised Harmonic Analysis which has enabled
us to understand how cyclical phenomena can arise out of the aggregation of
random causes. The parts of time-series analysis which bear a truly 20th-
century stamp are the time-domain models which originate with Slutsky and
Yule and the computational technology which renders the methods of both
domains practical.
The effect of the revolution in digital electronic computing upon the practi-
cability of time-series analysis can be gauged by inspecting the purely mechan-
ical devices (such as the Henrici–Conradi and Michelson–Stratton harmonic
analysers invented in the 1890s) which were once used, with very limited suc-
cess, to grapple with problems which are nowadays almost routine. These
devices, some of which are displayed in London's Science Museum, also serve
to remind us that many of the developments of applied mathematics which
startle us with their modernity were foreshadowed many years ago.
Mathematical Appendix
Mathematical Expectations
The mathematical expectation or the expected value of a random variable
x is defined by

(i)   E(x) = ∫_{x=−∞}^{∞} x dF(x),

where F(x) is the probability distribution function of x. The probability distri-
bution function is defined by the expression F(x*) = P{x < x*}, which denotes
the probability that x assumes a value less than x*. If F(x) is a continuous
function, then we can write dF(x) = f(x)dx in equation (i). The function
f(x) = dF(x)/dx is called the probability density function.

If y(t) = {y_t; t = 0, ±1, ±2, . . .} is a stationary stochastic process, then
E(y_t) = μ is the same value for all t.
If y₀, . . . , y_{T−1} is a sample of T values generated by the process, then we
may estimate μ from the sample mean

(ii)  ȳ = (1/T) Σ_{t=0}^{T−1} y_t.
Autocovariances

The autocovariance of lag τ of a stationary stochastic process y(t) is
defined by

(iii) γ_τ = E{(y_t − μ)(y_{t−τ} − μ)}.

The autocovariance of lag τ provides a measure of the relatedness of the ele-
ments of the sequence y(t) which are separated by τ time periods.

The variance, which is denoted by V{y(t)} = γ₀ and defined by

(iv)  γ₀ = E{(y_t − μ)²},

is a measure of the dispersion of the elements of y(t). It is formally the auto-
covariance of lag zero.
If y_t and y_{t−τ} are statistically independent, then their joint probability
density function is the product of their individual probability density functions,
so that f(y_t, y_{t−τ}) = f(y_t)f(y_{t−τ}). It follows that

(v)   γ_τ = E(y_t − μ)E(y_{t−τ} − μ) = 0  for all τ ≠ 0.
If y₀, . . . , y_{T−1} is a sample from the process, and if τ < T, then we may estimate
γ_τ from the sample autocovariance or empirical autocovariance of lag τ:

(vi)  c_τ = (1/T) Σ_{t=τ}^{T−1} (y_t − ȳ)(y_{t−τ} − ȳ).
The periodogram and the autocovariance function

The periodogram is defined by

(vii) I(ω_j) = (2/T) [ {Σ_{t=0}^{T−1} cos(ω_j t)(y_t − ȳ)}² + {Σ_{t=0}^{T−1} sin(ω_j t)(y_t − ȳ)}² ].
The identity Σ_t cos(ω_j t)(y_t − ȳ) = Σ_t cos(ω_j t)y_t follows from the fact that,
by construction, Σ_t cos(ω_j t) = 0 for all j. Hence the above expression has the
same value as the expression in (2). Expanding the expression in (vii) gives
(viii) I(ω_j) = (2/T) [ Σ_t Σ_s cos(ω_j t) cos(ω_j s)(y_t − ȳ)(y_s − ȳ) ]
             + (2/T) [ Σ_t Σ_s sin(ω_j t) sin(ω_j s)(y_t − ȳ)(y_s − ȳ) ],

and, by using the identity cos(A) cos(B) + sin(A) sin(B) = cos(A − B), we can
rewrite this as

(ix)  I(ω_j) = (2/T) [ Σ_t Σ_s cos(ω_j [t − s])(y_t − ȳ)(y_s − ȳ) ].
Next, on defining τ = t − s and writing c_τ = Σ_t (y_t − ȳ)(y_{t−τ} − ȳ)/T, we can
reduce the latter expression to

(x)   I(ω_j) = 2 Σ_{τ=1−T}^{T−1} cos(ω_j τ) c_τ,

which appears in the text as equation (15).
References

[1] Alberts, W. W., L. E. Wright and B. Feinstein (1965), Physiological
Mechanisms of Tremor and Rigidity in Parkinsonism. Confinia Neuro-
logica, 26, 318–327.
[2] Beveridge, Sir W. H. (1921), Weather and Harvest Cycles. Economic
Journal, 31, 429–452.
[3] Beveridge, Sir W. H. (1922), Wheat Prices and Rainfall in Western Eu-
rope. Journal of the Royal Statistical Society, 85, 412–478.
[4] Box, G. E. P. and D. R. Cox (1964), An Analysis of Transformations.
Journal of the Royal Statistical Society, Series B, 26, 211–243.
[5] Box, G. E. P. and G. M. Jenkins (1970), Time Series Analysis, Forecasting
and Control. Holden-Day: San Francisco.
[6] Buys-Ballot, C. D. H. (1847), Les Changements Périodiques de Tempéra-
ture. Utrecht.
[7] Cooley, J. W. and J. W. Tukey (1965), An Algorithm for the Machine
Calculation of Complex Fourier Series. Mathematics of Computation, 19,
297–301.
[8] Granger, C. W. J. and A. O. Hughes (1971), A New Look at Some Old
Data: The Beveridge Wheat Price Series. Journal of the Royal Statistical
Society, Series A, 134, 413–428.
[9] Groves, G. W. and E. J. Hannan (1968), Time-Series Regression of Sea
Level on Weather. Review of Geophysics, 6, 129–174.
[10] Gudmundson, G. (1971), Time-Series Analysis of Imports, Exports and
Other Economic Variables. Journal of the Royal Statistical Society, Series
A, 134, 383.
[11] Hasselmann, K., W. Munk and G. MacDonald (1963), Bispectrum of
Ocean Waves. In Time Series Analysis, M. Rosenblatt (ed.), 125–139.
John Wiley and Sons: New York.
[12] Khintchine, A. (1934), Korrelationstheorie der stationären stochastischen
Prozesse. Mathematische Annalen, 109, 604–615.
[13] Lagrange, J. L. (1772, 1778), Oeuvres.
[14] Moore, H. L. (1914), Economic Cycles: Their Laws and Cause. Macmil-
lan: New York.
[15] Parzen, E. (1957), On Consistent Estimates of the Spectrum of a Sta-
tionary Time Series. Annals of Mathematical Statistics, 28, 329–348.
[16] Rice, S. O. (1963), Noise in FM Receivers. In Time Series Analysis, M.
Rosenblatt (ed.), 395–422. John Wiley and Sons: New York.
[17] Schuster, Sir A. (1898), On the Investigation of Hidden Periodicities with
Application to a Supposed Twenty-Six Day Period of Meteorological Phe-
nomena. Terrestrial Magnetism, 3, 13–41.
[18] Schuster, Sir A. (1906), On the Periodicities of Sunspots. Philosophical
Transactions of the Royal Society, Series A, 206, 69–100.
[19] Slutsky, E. (1937), The Summation of Random Causes as the Source of
Cyclical Processes. Econometrica, 5, 105–146.
[20] Tee, L. H. and S. U. Wu (1972), An Application of Stochastic and Dy-
namic Models for the Control of a Papermaking Process. Technometrics,
14, 481–496.
[21] Tukey, J. W. (1965), Data Analysis and the Frontiers of Geophysics.
Science, 148, 1283–1289.
[22] Whittaker, E. T. and G. Robinson (1924), The Calculus of Observations,
A Treatise on Numerical Mathematics. Blackie and Sons: London.
[23] Wiener, N. (1930), Generalised Harmonic Analysis. Acta Mathematica,
35, 117–258.
[24] Yule, G. U. (1927), On a Method of Investigating Periodicities in Dis-
turbed Series with Special Reference to Wolfer's Sunspot Numbers. Philo-
sophical Transactions of the Royal Society, 89, 1–64.
[25] Yuzuriha, T. (1960), The Autocorrelation Curves of Schizophrenic Brain
Waves and the Power Spectrum. Psych. Neurol. Jap., 26, 911–924.
It remains true that the majority of time-series analysts operate prin-
cipally in one or other of the two domains. Such specialisation is often
influenced by the academic discipline to which the analyst adheres. How-
ever, it is clear that there are many advantages to be derived from pursuing
the two modes of analysis concurrently.
Address for correspondence:
D.S.G. Pollock
Department of Economics
Queen Mary College
University of London
Mile End Road
London E1 4 NS
Tel : +44-71-975-5096
Fax : +44-71-975-5500
LECTURE 7
Forecasting
with ARMA Models
Minimum Mean-Square Error Prediction
Imagine that y(t) is a stationary stochastic process with E{y(t)} = 0.
We may be interested in predicting values of this process several periods into
the future on the basis of its observed history. This history is contained in
the so-called information set. In practice, the latter is always a finite set
{y_t, y_{t−1}, . . . , y_{t−p}} representing the recent past. Nevertheless, in developing
the theory of prediction, it is also useful to consider an infinite information set
I_t = {y_t, y_{t−1}, . . . , y_{t−p}, . . .} representing the entire past.
We shall denote the prediction of y_{t+h} which is made at the time t by
ŷ_{t+h|t}, or by ŷ_{t+h} when it is clear that we are predicting h steps ahead.
The criterion which is commonly used in judging the performance of an
estimator or predictor ŷ of a random variable y is its mean-square error, defined
by E{(y − ŷ)²}. If all of the available information on y is summarised in its
marginal distribution, then the minimum-mean-square-error prediction is sim-
ply the expected value E(y). However, if y is statistically related to another
random variable x whose value can be observed, and if the form of the joint
distribution of x and y is known, then the minimum-mean-square-error predic-
tion of y is the conditional expectation E(y|x). This proposition may be stated
formally:

(1)  Let ŷ = ŷ(x) be the conditional expectation of y given x, which is
     also expressed as ŷ = E(y|x). Then E{(y − ŷ)²} ≤ E{(y − π)²},
     where π = π(x) is any other function of x.
Proof. Consider

(2)  E{(y − π)²} = E{ [(y − ŷ) + (ŷ − π)]² }
                 = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.
Within the second term, there is

(3)  E{(y − ŷ)(ŷ − π)} = ∫_x ∫_y (y − ŷ)(ŷ − π) f(x, y) ∂y∂x
                       = ∫_x { ∫_y (y − ŷ) f(y|x) ∂y } (ŷ − π) f(x) ∂x
                       = 0.

Here the second equality depends upon the factorisation f(x, y) = f(y|x)f(x),
which expresses the joint probability density function of x and y as the product
of the conditional density function of y given x and the marginal density func-
tion of x. The final equality depends upon the fact that ∫(y − ŷ)f(y|x)∂y =
E(y|x) − E(y|x) = 0. Therefore E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥
E{(y − ŷ)²}, and the assertion is proved.
The definition of the conditional expectation implies that

(4)  E(xy) = ∫_x ∫_y xy f(x, y) ∂y∂x
           = ∫_x x { ∫_y y f(y|x) ∂y } f(x) ∂x
           = E(xŷ).

When the equation E(xy) = E(xŷ) is rewritten as

(5)  E{x(y − ŷ)} = 0,

it may be described as an orthogonality condition. This condition indicates
that the prediction error y − ŷ is uncorrelated with x. The result is intuitively
appealing; for, if the error were correlated with x, we should not be using the
information of x efficiently in forming ŷ.
The proposition of (1) is readily generalised to accommodate the case
where, in place of the scalar x, there is a vector x = [x₁, . . . , x_p]′. This gen-
eralisation indicates that the minimum-mean-square-error prediction of y_{t+h}
given the information in {y_t, y_{t−1}, . . . , y_{t−p}} is the conditional expectation
E(y_{t+h}|y_t, y_{t−1}, . . . , y_{t−p}).
In order to determine the conditional expectation of y_{t+h} given {y_t, y_{t−1},
. . . , y_{t−p}}, we need to know the functional form of the joint probability den-
sity function of all of these variables. In lieu of precise knowledge, we are often
prepared to assume that the distribution is normal. In that case, it follows that
the conditional expectation of y_{t+h} is a linear function of {y_t, y_{t−1}, . . . , y_{t−p}};
and so the problem of predicting y_{t+h} becomes a matter of forming a linear
regression. Even if we are not prepared to assume that the joint distribution
of the variables is normal, we may be prepared, nevertheless, to base the pre-
diction of y upon a linear function of {y_t, y_{t−1}, . . . , y_{t−p}}. In that case, the
criterion of minimum-mean-square-error linear prediction is satisfied by form-
ing ŷ_{t+h} = φ₁y_t + φ₂y_{t−1} + ⋯ + φ_{p+1}y_{t−p} from the values φ₁, . . . , φ_{p+1} which
minimise

(6)  E{(y_{t+h} − ŷ_{t+h})²} = E{ (y_{t+h} − Σ_{j=1}^{p+1} φ_j y_{t−j+1})² }
                            = γ₀ − 2 Σ_j φ_j γ_{h+j−1} + Σ_i Σ_j φ_i φ_j γ_{ij},

wherein γ_{ij} = E(y_{t−i+1} y_{t−j+1}). This is a linear least-squares regression problem
which leads to a set of p + 1 orthogonality conditions described as the normal
equations:
(7)  E{(y_{t+h} − ŷ_{t+h}) y_{t−j+1}} = γ_{h+j−1} − Σ_i φ_i γ_{ij} = 0;  j = 1, . . . , p + 1.
In matrix terms, these are

(8)
[ γ₀     γ₁       . . .   γ_p     ] [ φ₁      ]   [ γ_h     ]
[ γ₁     γ₀       . . .   γ_{p−1} ] [ φ₂      ] = [ γ_{h+1} ]
[ ...                             ] [ ...     ]   [ ...     ]
[ γ_p    γ_{p−1}  . . .   γ₀      ] [ φ_{p+1} ]   [ γ_{h+p} ]
Notice that, for the one-step-ahead prediction of y_{t+1}, they are nothing but the
Yule–Walker equations.
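Solving the normal equations of (8) is a routine matter. The following sketch,
in Python with NumPy (our own illustration; the function name is hypothetical),
computes the coefficients of the h-step-ahead predictor from a given autocovariance
sequence:

```python
import numpy as np

def h_step_predictor(gamma, p, h):
    """Coefficients phi_1, ..., phi_{p+1} of the minimum-mean-square-error
    linear predictor of y(t+h) from y(t), ..., y(t-p), found by solving (8)."""
    G = np.array([[gamma[abs(i - j)] for j in range(p + 1)] for i in range(p + 1)])
    rhs = np.array([gamma[h + j] for j in range(p + 1)])
    return np.linalg.solve(G, rhs)

# For an AR(1) process with gamma_tau = phi^tau/(1 - phi^2), the predictor of
# y(t+h) collapses to phi^h times y(t), whatever the value of p.
phi = 0.8
gamma = phi ** np.arange(12) / (1 - phi ** 2)
print(h_step_predictor(gamma, p=3, h=2))      # approximately [0.64, 0, 0, 0]
```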
In the case of an optimal predictor which combines previous values of the
series, it follows from the orthogonality principle that the forecast errors are
uncorrelated with the previous predictions.
A result of this sort is familiar to economists in connection with the so-
called efficient-markets hypothesis. A financial market is efficient if the prices of
the traded assets constitute optimal forecasts of their discounted future returns,
which consist of interest and dividend payments and of capital gains.

According to the hypothesis, the changes in asset prices will be uncorre-
lated with the past or present price levels; which is to say that asset prices will
follow random walks. Moreover, it should not be possible for someone who is
apprised only of the past history of asset prices to reap speculative profits on
a systematic and regular basis.
Forecasting with ARMA Models
So far, we have avoided making specific assumptions about the nature of
the process y(t). We are greatly assisted in the business of developing practical
forecasting procedures if we can assume that y(t) is generated by an ARMA
process such that

(9)   y(t) = {μ(L)/α(L)} ε(t) = ψ(L)ε(t).
We shall continue to assume, for the sake of simplicity, that the forecasts
are based on the information contained in the infinite set {y_t, y_{t−1}, y_{t−2}, . . .} =
I_t comprising all values that have been taken by the variable up to the present
time t. Knowing the parameters in ψ(L) enables us to recover the sequence
{ε_t, ε_{t−1}, ε_{t−2}, . . .} from the sequence {y_t, y_{t−1}, y_{t−2}, . . .} and vice versa; so ei-
ther of these constitutes the information set. This equivalence implies that the
forecasts may be expressed in terms of {y_t} or in terms of {ε_t} or as a combination
of the elements of both sets.
Let us write the realisations of equation (9) as

(10)  y_{t+h} = {ψ₀ε_{t+h} + ψ₁ε_{t+h−1} + ⋯ + ψ_{h−1}ε_{t+1}}
              + {ψ_h ε_t + ψ_{h+1}ε_{t−1} + ⋯ }.
Here the first term on the RHS embodies disturbances subsequent to the time
t when the forecast is made, and the second term embodies disturbances which
are within the information set {ε_t, ε_{t−1}, ε_{t−2}, . . .}. Let us now define a forecast-
ing function, based on the information set, which takes the form of

(11)  ŷ_{t+h|t} = {ρ_h ε_t + ρ_{h+1}ε_{t−1} + ⋯ }.
Then, given that ε(t) is a white-noise process, it follows that the mean square
of the error in the forecast h periods ahead is given by

(12)  E{(y_{t+h} − ŷ_{t+h})²} = σ_ε² Σ_{i=0}^{h−1} ψ_i² + σ_ε² Σ_{i=h}^{∞} (ψ_i − ρ_i)².
Clearly, the mean-square error is minimised by setting ρ_i = ψ_i; and so the
optimal forecast is given by

(13)  ŷ_{t+h|t} = {ψ_h ε_t + ψ_{h+1}ε_{t−1} + ⋯ }.

This might have been derived from the equation y(t + h) = ψ(L)ε(t + h),
which generates the true value of y_{t+h}, simply by putting zeros in place of the
unobserved disturbances ε_{t+1}, ε_{t+2}, . . . , ε_{t+h} which lie in the future when the
forecast is made. Notice that, on the assumption that the process is stationary,
the mean-square error of the forecast tends to the value of

(14)  V{y(t)} = σ_ε² Σ_i ψ_i²

as the lead time h of the forecast increases. This is nothing but the variance of
the process y(t).
The optimal forecast of (13) may also be derived by specifying that the
forecast error should be uncorrelated with the disturbances up to the time of
making the forecast. For, if the forecast errors were correlated with some of
the elements of the information set, then, as we have noted before, we would
not be using the information efficiently, and we could not be generating opti-
mal forecasts. To demonstrate this result anew, let us consider the covariance
between the forecast error and the disturbance ε_{t−i}:

(15)  E{(y_{t+h} − ŷ_{t+h}) ε_{t−i}} = Σ_{k=1}^{h} ψ_{h−k} E(ε_{t+k} ε_{t−i})
                                      + Σ_{j=0}^{∞} (ψ_{h+j} − ρ_{h+j}) E(ε_{t−j} ε_{t−i})
                                    = σ_ε² (ψ_{h+i} − ρ_{h+i}).
Here the final equality follows from the fact that

(16)  E(ε_{t−j} ε_{t−i}) = { σ_ε², if i = j;
                             0,    if i ≠ j.

If the covariance in (15) is to be equal to zero for all values of i ≥ 0, then we
must have ρ_i = ψ_i for all i, which means that the forecasting function must be
the one that has been specified already under (13).
It is helpful, sometimes, to have a functional notation for describing the
process which generates the h-steps-ahead forecast. The notation provided by
Whittle (1963) is widely used. To derive this, let us begin by writing

(17)  y(t + h) = {L^{−h}ψ(L)} ε(t).

On the RHS, there are not only the lagged sequences {ε(t), ε(t − 1), . . .} but also
the sequences ε(t + h) = L^{−h}ε(t), . . . , ε(t + 1) = L^{−1}ε(t), which are associated
with negative powers of L that serve to shift a sequence forwards in time. Let
{L^{−h}ψ(L)}₊ be defined as the part of the operator containing only nonnegative
powers of L. Then the forecasting function can be expressed as

(18)  ŷ(t + h|t) = {L^{−h}ψ(L)}₊ ε(t)
                 = {ψ(L)/L^h}₊ {1/ψ(L)} y(t).
Example. Consider an ARMA(1, 1) process represented by the equation

(19)  (1 - \phi L) y(t) = (1 - \theta L) \varepsilon(t).

The function which generates the sequence of forecasts h steps ahead is given by

(20)  \hat{y}(t + h|t) = \left\{ L^{-h}\left( 1 + \frac{(\phi - \theta)L}{1 - \phi L} \right) \right\}_{+} \varepsilon(t) = \frac{\phi^{h-1}(\phi - \theta)}{1 - \phi L} \varepsilon(t) = \frac{\phi^{h-1}(\phi - \theta)}{1 - \theta L} y(t).

When θ = 0, this gives the simple result that \hat{y}(t + h|t) = \phi^h y(t).
Generating the Forecasts Recursively

We have already seen that the optimal (minimum-mean-square-error) forecast of y_{t+h} can be regarded as the conditional expectation of y_{t+h} given the information set I_t which comprises the values of {ε_t, ε_{t-1}, ε_{t-2}, ...} or equally the values of {y_t, y_{t-1}, y_{t-2}, ...}. On taking expectations of y(t) and ε(t) conditional on I_t, we find that
(21)  E(y_{t+k}|I_t) = \hat{y}_{t+k|t} \quad\text{if } k > 0,
      E(y_{t-j}|I_t) = y_{t-j} \quad\text{if } j \geq 0,
      E(\varepsilon_{t+k}|I_t) = 0 \quad\text{if } k > 0,
      E(\varepsilon_{t-j}|I_t) = \varepsilon_{t-j} \quad\text{if } j \geq 0.
In this notation, the forecast h periods ahead is

(22)  E(y_{t+h}|I_t) = \sum_{k=1}^{h} \psi_{h-k} E(\varepsilon_{t+k}|I_t) + \sum_{j=0}^{\infty} \psi_{h+j} E(\varepsilon_{t-j}|I_t) = \sum_{j=0}^{\infty} \psi_{h+j} \varepsilon_{t-j}.
In practice, the forecasts may be generated using a recursion based on the equation

(23)  y(t) = \{\phi_1 y(t-1) + \phi_2 y(t-2) + \cdots + \phi_p y(t-p)\} + \theta_0 \varepsilon(t) + \theta_1 \varepsilon(t-1) + \cdots + \theta_q \varepsilon(t-q).
By taking the conditional expectation of this function, we get

(24)  \hat{y}_{t+h} = \{\phi_1 \hat{y}_{t+h-1} + \cdots + \phi_p \hat{y}_{t+h-p}\} + \theta_h \varepsilon_t + \cdots + \theta_q \varepsilon_{t+h-q} \quad\text{when } 0 < h \leq p, q,

(25)  \hat{y}_{t+h} = \{\phi_1 \hat{y}_{t+h-1} + \cdots + \phi_p \hat{y}_{t+h-p}\} \quad\text{if } q < h \leq p,

(26)  \hat{y}_{t+h} = \{\phi_1 \hat{y}_{t+h-1} + \cdots + \phi_p \hat{y}_{t+h-p}\} + \theta_h \varepsilon_t + \cdots + \theta_q \varepsilon_{t+h-q} \quad\text{if } p < h \leq q,

and

(27)  \hat{y}_{t+h} = \{\phi_1 \hat{y}_{t+h-1} + \cdots + \phi_p \hat{y}_{t+h-p}\} \quad\text{when } p, q < h.
It can be seen from (27) that, for h > p, q, the forecasting function becomes a pth-order homogeneous difference equation in y. The p values of y(t) from t = r = max(p, q) to t = r - p + 1 serve as the starting values for the equation. A sketch of the recursion in code is given below.
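The following minimal Python sketch implements the recursion of (23)-(27). It assumes that the in-sample disturbances are recovered by a naive forward filtering of the data, and the ARMA(2, 1) orders and coefficients are illustrative only; θ_0 is taken to be unity.

    import numpy as np

    def arma_forecasts(y, phi, theta, H):
        """Generate 1..H step forecasts from equation (23):
        y(t) = phi_1 y(t-1)+...+phi_p y(t-p) + e(t) + theta_1 e(t-1)+...+theta_q e(t-q)."""
        p, q = len(phi), len(theta)
        # Recover the disturbances by running the recursion forwards (crude start-up).
        e = np.zeros(len(y))
        for t in range(len(y)):
            ar = sum(phi[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
            ma = sum(theta[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
            e[t] = y[t] - ar - ma
        # Extended arrays: observed values then forecasts; future e's are zero, as in (21).
        yy = list(y)
        ee = list(e) + [0.0] * H
        T = len(y)
        for h in range(1, H + 1):
            ar = sum(phi[i] * yy[T + h - 2 - i] for i in range(p))
            ma = sum(theta[j] * ee[T + h - 2 - j] for j in range(q))
            yy.append(ar + ma)
        return yy[T:]

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.standard_normal(200))          # some data to forecast
    print(arma_forecasts(y, phi=[1.2, -0.35], theta=[0.4], H=12))

Notice that the code reproduces the case distinctions of (24)-(27) automatically: once the lead time h exceeds both p and q, only the autoregressive terms survive, and the recursion becomes the homogeneous difference equation described above.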
The behaviour of the forecast function beyond the reach of the starting values can be characterised in terms of the roots of the autoregressive operator. It may be assumed that none of the roots of α(L) = 0 lie inside the unit circle; for, if there were roots inside the circle, then the process would be radically unstable. If all of the roots lie outside the unit circle, then \hat{y}_{t+h} will converge to zero as h increases. If one of the roots of α(L) = 0 is unity, then we have an ARIMA(p, 1, q) model; and the general solution of the homogeneous equation of (27) will include a constant term which represents the product of the unit root with a coefficient which is determined by the starting values. Hence the forecast will tend to a nonzero constant. If two of the roots are unity, then the general solution will embody a linear time trend which is the asymptote to which the forecasts will tend. In general, if d of the roots are unity, then the general solution will comprise a polynomial in t of order d - 1.
The forecasts can be updated easily once the coefficients in the expansion of ψ(L) = μ(L)/α(L) have been obtained. Consider

(28)  \hat{y}_{t+h|t+1} = \{\psi_{h-1} \varepsilon_{t+1} + \psi_h \varepsilon_t + \psi_{h+1} \varepsilon_{t-1} + \cdots\} \quad\text{and}
      \hat{y}_{t+h|t} = \{\psi_h \varepsilon_t + \psi_{h+1} \varepsilon_{t-1} + \psi_{h+2} \varepsilon_{t-2} + \cdots\}.
The first of these is the forecast for h - 1 periods ahead made at time t + 1 whilst the second is the forecast for h periods ahead made at time t. It can be seen that

(29)  \hat{y}_{t+h|t+1} = \hat{y}_{t+h|t} + \psi_{h-1} \varepsilon_{t+1},
where ε_{t+1} = y_{t+1} - \hat{y}_{t+1} is the current disturbance at time t + 1. The latter is also the prediction error of the one-step-ahead forecast made at time t.
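In code, the updating formula (29) is a one-liner. A minimal sketch, assuming the ψ-weights and the previous set of forecasts are already to hand:

    def update_forecasts(prev, psi, e_new):
        """Shift forecasts made at time t into forecasts made at time t+1 via
        equation (29).  prev[k] holds yhat_{t+k+1|t}; psi are the psi-weights;
        e_new = y_{t+1} - yhat_{t+1|t} is the new one-step prediction error.
        Because of the zero-based list offset, psi[h] here plays the role of
        the coefficient psi_{h-1} attached to the target date t+h+1."""
        return [prev[h] + psi[h] * e_new for h in range(1, len(prev))]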
Example. For an example of the analytic form of the forecast function, we may consider the Integrated Autoregressive (IAR) process defined by

(30)  \left\{ 1 - (1 + \phi)L + \phi L^2 \right\} y(t) = \varepsilon(t),

wherein φ ∈ (0, 1). The roots of the auxiliary equation z^2 - (1 + \phi)z + \phi = 0 are z = 1 and z = φ. The solution of the homogeneous difference equation

(31)  \left\{ 1 - (1 + \phi)L + \phi L^2 \right\} \hat{y}(t + h|t) = 0,

which defines the forecast function, is

(32)  \hat{y}(t + h|t) = c_1 + c_2 \phi^h,

where c_1 and c_2 are constants which reflect the initial conditions. These constants are found by solving the equations

(33)  y_{t-1} = c_1 + c_2 \phi^{-1},
      y_t = c_1 + c_2.

The solutions are

(34)  c_1 = \frac{y_t - \phi y_{t-1}}{1 - \phi} \quad\text{and}\quad c_2 = \frac{\phi}{\phi - 1}(y_t - y_{t-1}).

The long-term forecast is \bar{y} = c_1, which is the asymptote to which the forecasts tend as the lead period h increases.
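The equivalence between the recursive and the analytic forms of the forecast function is easily verified numerically. A minimal sketch follows, in which the value of φ and the two starting values are arbitrary illustrative assumptions:

    phi = 0.6
    y_prev, y_curr = 10.0, 11.0        # y_{t-1} and y_t, arbitrary starting values

    # Analytic form, equations (32)-(34)
    c1 = (y_curr - phi * y_prev) / (1 - phi)
    c2 = phi / (phi - 1) * (y_curr - y_prev)

    # Recursive form: y(t+h) = (1 + phi) y(t+h-1) - phi y(t+h-2)
    a, b = y_prev, y_curr
    for h in range(1, 11):
        a, b = b, (1 + phi) * b - phi * a
        print(h, b, c1 + c2 * phi ** h)    # the two columns agree; both tend to c1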
Ad-hoc Methods of Forecasting

There are some time-honoured methods of forecasting which, when analysed carefully, reveal themselves to be the methods which are appropriate to some simple ARIMA models that might be suggested by a priori reasoning. Two of the leading examples are provided by the method of exponential smoothing and the Holt-Winters trend-extrapolation method.

Exponential Smoothing. A common forecasting procedure is exponential smoothing. This depends upon taking a weighted average of past values of the time series with the weights following a geometrically declining pattern. The function generating the one-step-ahead forecasts can be written as

(35)  \hat{y}(t + 1|t) = \frac{(1 - \lambda)}{1 - \lambda L} y(t) = (1 - \lambda)\left\{ y(t) + \lambda y(t-1) + \lambda^2 y(t-2) + \cdots \right\}.
On multiplying both sides of this equation by 1 - λL and rearranging, we get

(36)  \hat{y}(t + 1|t) = \lambda \hat{y}(t|t-1) + (1 - \lambda) y(t),

which shows that the current forecast for one step ahead is a convex combination of the previous forecast and the value which actually transpired.

The method of exponential smoothing corresponds to the optimal forecasting procedure for the ARIMA(0, 1, 1) model (1 - L)y(t) = (1 - \lambda L)\varepsilon(t), which is better described as an IMA(1, 1) model. To see this, let us consider the ARMA(1, 1) model y(t) - \phi y(t-1) = \varepsilon(t) - \theta\varepsilon(t-1). This gives

(37)  \hat{y}(t + 1|t) = \phi y(t) - \theta\varepsilon(t) = \phi y(t) - \theta\frac{(1 - \phi L)}{1 - \theta L} y(t) = \frac{\{\phi(1 - \theta L) - \theta(1 - \phi L)\}}{1 - \theta L} y(t) = \frac{\phi - \theta}{1 - \theta L} y(t).

On setting φ = 1, which converts the ARMA(1, 1) model to an IMA(1, 1) model, we obtain precisely the forecasting function of (35), with λ = θ.
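A minimal sketch of the recursion (36) in Python; the smoothing constant and the initialisation of the forecast are illustrative assumptions:

    def exp_smoothing_forecasts(y, lam, start=None):
        """One-step-ahead forecasts by equation (36):
        yhat(t+1|t) = lam * yhat(t|t-1) + (1 - lam) * y(t)."""
        yhat = y[0] if start is None else start    # crude initialisation
        out = []
        for obs in y:
            yhat = lam * yhat + (1 - lam) * obs
            out.append(yhat)                       # forecast of the next observation
        return out

    print(exp_smoothing_forecasts([10, 12, 11, 13, 12], lam=0.7))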
The Holt-Winters Method. The Holt-Winters algorithm is useful in extrapolating local linear trends. The prediction h periods ahead of a series y(t) = {y_t; t = 0, 1, 2, ...} which is made at time t is given by

(38)  \hat{y}_{t+h|t} = \hat{\alpha}_t + \hat{\beta}_t h,

where

(39)  \hat{\alpha}_t = \lambda y_t + (1 - \lambda)(\hat{\alpha}_{t-1} + \hat{\beta}_{t-1}) = \lambda y_t + (1 - \lambda)\hat{y}_{t|t-1}

is the estimate of an intercept or levels parameter formed at time t, and

(40)  \hat{\beta}_t = \mu(\hat{\alpha}_t - \hat{\alpha}_{t-1}) + (1 - \mu)\hat{\beta}_{t-1}

is the estimate of the slope parameter, likewise formed at time t. The coefficients λ, μ ∈ (0, 1] are the smoothing parameters.
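The recursions (38)-(40) translate directly into code. A minimal sketch, with illustrative smoothing parameters, and with the first two observations used to provide crude initial values of the level and the slope:

    def holt_winters(y, lam, mu, h=1):
        """Local linear trend extrapolation by equations (38)-(40)."""
        alpha, beta = y[1], y[1] - y[0]            # crude initial level and slope
        for obs in y[2:]:
            alpha_new = lam * obs + (1 - lam) * (alpha + beta)    # equation (39)
            beta = mu * (alpha_new - alpha) + (1 - mu) * beta     # equation (40)
            alpha = alpha_new
        return alpha + beta * h                    # equation (38): h periods ahead

    series = [10.0, 10.5, 11.2, 11.8, 12.1, 12.9]
    print([holt_winters(series, lam=0.4, mu=0.2, h=k) for k in (1, 2, 3)])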
The algorithm may also be expressed in error-correction form. Let

(41)  e_t = y_t - \hat{y}_{t|t-1} = y_t - \hat{\alpha}_{t-1} - \hat{\beta}_{t-1}
be the error at time t arising from the prediction of y_t on the basis of information available at time t - 1. Then the formula for the levels parameter can be given as

(42)  \hat{\alpha}_t = \lambda e_t + \hat{y}_{t|t-1} = \lambda e_t + \hat{\alpha}_{t-1} + \hat{\beta}_{t-1},

which, on rearranging, becomes

(43)  \hat{\alpha}_t - \hat{\alpha}_{t-1} = \lambda e_t + \hat{\beta}_{t-1}.

When the latter is drafted into equation (40), we get an analogous expression for the slope parameter:

(44)  \hat{\beta}_t = \mu(\lambda e_t + \hat{\beta}_{t-1}) + (1 - \mu)\hat{\beta}_{t-1} = \lambda\mu e_t + \hat{\beta}_{t-1}.
In order to reveal the underlying nature of this method, it is helpful to combine the two equations (42) and (44) in a simple state-space model:

(45)  \begin{bmatrix} \hat{\alpha}(t) \\ \hat{\beta}(t) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \hat{\alpha}(t-1) \\ \hat{\beta}(t-1) \end{bmatrix} + \begin{bmatrix} \lambda \\ \lambda\mu \end{bmatrix} e(t).

This can be rearranged to give

(46)  \begin{bmatrix} 1 - L & -L \\ 0 & 1 - L \end{bmatrix} \begin{bmatrix} \hat{\alpha}(t) \\ \hat{\beta}(t) \end{bmatrix} = \begin{bmatrix} \lambda \\ \lambda\mu \end{bmatrix} e(t).

The solution of the latter is

(47)  \begin{bmatrix} \hat{\alpha}(t) \\ \hat{\beta}(t) \end{bmatrix} = \frac{1}{(1 - L)^2} \begin{bmatrix} 1 - L & L \\ 0 & 1 - L \end{bmatrix} \begin{bmatrix} \lambda \\ \lambda\mu \end{bmatrix} e(t).

Therefore, from (38), it follows that

(48)  \hat{y}(t + 1|t) = \hat{\alpha}(t) + \hat{\beta}(t) = \frac{(\lambda + \lambda\mu)e(t) - \lambda e(t-1)}{(1 - L)^2}.
This can be recognised as the forecasting function of an IMA(2, 2) model of the form

(49)  (I - L)^2 y(t) = \theta_0 \varepsilon(t) + \theta_1 \varepsilon(t-1) + \theta_2 \varepsilon(t-2)
for which

(50)  \hat{y}(t + 1|t) = \frac{\theta_1 \varepsilon(t) + \theta_2 \varepsilon(t-1)}{(1 - L)^2}.
The Local Trend Model. There are various arguments which suggest that an IMA(2, 2) model might be a natural model to adopt. The simplest of these arguments arises from an elaboration of a second-order random walk which adds an ordinary white-noise disturbance to the trend. The resulting model may be expressed in two equations:

(51)  (I - L)^2 \xi(t) = \nu(t),
      y(t) = \xi(t) + \eta(t),

where ν(t) and η(t) are mutually independent white-noise processes. Combining the equations, and using the notation ∇ = 1 - L, gives

(52)  y(t) = \frac{\nu(t)}{\nabla^2} + \eta(t) = \frac{\nu(t) + \nabla^2 \eta(t)}{\nabla^2}.

Here the numerator ν(t) + ∇²η(t) = {ν(t) + η(t)} - 2η(t-1) + η(t-2) constitutes a second-order MA process.
Slightly more elaborate models with the same outcome have also been proposed. Thus the so-called structural model consists of the equations

(53)  y(t) = \mu(t) + \varepsilon(t),
      \mu(t) = \mu(t-1) + \beta(t-1) + \eta(t),
      \beta(t) = \beta(t-1) + \zeta(t).

Working backwards from the final equation gives

(54)  \beta(t) = \frac{\zeta(t)}{\nabla},
      \mu(t) = \frac{\beta(t-1)}{\nabla} + \frac{\eta(t)}{\nabla} = \frac{\zeta(t-1)}{\nabla^2} + \frac{\eta(t)}{\nabla},
      y(t) = \frac{\zeta(t-1)}{\nabla^2} + \frac{\eta(t)}{\nabla} + \varepsilon(t) = \frac{\zeta(t-1) + \nabla\eta(t) + \nabla^2\varepsilon(t)}{\nabla^2}.
Once more, the numerator constitutes a second-order MA process.
Equivalent Forecasting Functions

Consider a model which combines a global linear trend with an autoregressive disturbance process:

(55)  y(t) = \gamma_0 + \gamma_1 t + \frac{\varepsilon(t)}{I - \phi L}.
The formation of an h-step-ahead prediction is straightforward; for we can
separate the forecast function into two additive parts.
The first part of the function is the extrapolation of the global linear trend. This takes the form of

(56)  \hat{z}_{t+h|t} = \gamma_0 + \gamma_1(t + h) = z_t + \gamma_1 h, \quad\text{where } z_t = \gamma_0 + \gamma_1 t.
The second part is the prediction associated with the AR(1) disturbance term η(t) = (I - φL)^{-1}ε(t). The following iterative scheme provides a recursive solution to the problem of generating the forecasts:

(57)  \hat{\eta}_{t+1|t} = \phi \eta_t,
      \hat{\eta}_{t+2|t} = \phi \hat{\eta}_{t+1|t},
      \hat{\eta}_{t+3|t} = \phi \hat{\eta}_{t+2|t}, \text{ etc.}
Notice that the analytic solution of the associated difference equation is just

(58)  \hat{\eta}_{t+h|t} = \phi^h \eta_t.

This reminds us that, whenever we can express the forecast function in terms of a linear recursion, we can also express it in an analytic form embodying the roots of a polynomial lag operator. The operator in this case is the AR(1) operator I - φL. Since, by assumption, |φ| < 1, it is clear that the contribution of the disturbance part to the overall forecast function

(59)  \hat{y}_{t+h|t} = \hat{z}_{t+h|t} + \hat{\eta}_{t+h|t}

becomes negligible when h becomes large.
Consider the limiting case when φ → 1. Now, in place of an AR(1) disturbance process, we have to consider a random-walk process. We know that the forecast function of a random walk consists of nothing more than a constant
function. On adding this constant to the linear function \hat{z}_{t+h|t} = \gamma_0 + \gamma_1(t + h),
we continue to have a simple linear forecast function.
Another way of looking at the problem depends upon writing equation (55) as

(60)  (I - \phi L)\left\{ y(t) - \gamma_0 - \gamma_1 t \right\} = \varepsilon(t).

Setting φ = 1 turns the operator I - φL into the difference operator I - L = ∇. But ∇γ_0 = 0 and ∇γ_1 t = γ_1, so equation (60) with φ = 1 can also be written as

(61)  \nabla y(t) = \gamma_1 + \varepsilon(t).

This is the equation of a process which is described as a random walk with drift. Yet another way of expressing the process is via the equation y(t) = y(t-1) + \gamma_1 + \varepsilon(t).
It is intuitively clear that, if the random-walk process ∇z(t) = ε(t) is associated with a constant forecast function, and if z(t) = y(t) - γ_0 - γ_1 t, then y(t) will be associated with a linear forecast function.
The purpose of this example has been to offer a limiting case where models with local stochastic trends (i.e. random-walk and unit-root models) and models with global polynomial trends come together. Finally, we should notice that the model of random walk with drift has the same linear forecast function as the model

(62)  \nabla^2 y(t) = \varepsilon(t),

which has two unit roots in the AR operator.
LECTURE 8

The Identification of ARIMA Models
As we have established in a previous lecture, there is a one-to-one correspondence between the parameters of an ARMA(p, q) model, including the variance of the disturbance, and the leading p + q + 1 elements of the autocovariance function. Given the true autocovariances of a process, we might be able to discern the orders p and q of its autoregressive and moving-average operators and, given these orders, we should then be able to deduce the values of the parameters.

There are two other functions, prominent in time-series analysis, from which it is possible to recover the parameters of an ARMA process. These are the partial autocorrelation function and the spectral density function. The appearance of each of these functions gives an indication of the nature of the underlying process to which they belong; and, in theory, the business of identifying the model and of recovering its parameters can be conducted on the basis of any of them. In practice, the process is assisted by taking account of all three functions.

The empirical versions of the three functions which are used in a model-building exercise may differ considerably from their theoretical counterparts. Even when the data are truly generated by an ARMA process, the sampling errors which affect the empirical functions can lead one to identify the wrong model. This hazard is revealed by sampling experiments. When the data come from the real world, the notion that there is an underlying ARMA process is a fiction, and the business of model identification becomes more doubtful. Then there may be no such thing as the correct model; and the choice amongst alternative models must be made partly with a view to their intended uses.
The Autocorrelation Functions

The techniques of model identification which are most commonly used were propounded originally by Box and Jenkins (1972). Their basic tools were the sample autocorrelation function and the partial autocorrelation function. We shall describe these functions and their use separately from the spectral density function, which ought, perhaps, to be used more often in selecting models. The fact that the spectral density function is often overlooked is probably due to
an unfamiliarity with frequency-domain analysis on the part of many model
builders.
Autocorrelation function (ACF). Given a sample y_0, y_1, ..., y_{T-1} of T observations, we define the sample autocorrelation function to be the sequence of values

(1)  r_\tau = c_\tau / c_0, \quad \tau = 0, 1, \ldots, T - 1,

wherein

(2)  c_\tau = \frac{1}{T} \sum_{t=\tau}^{T-1} (y_t - \bar{y})(y_{t-\tau} - \bar{y})

is the empirical autocovariance at lag τ and c_0 is the sample variance. One should note that, as the value of the lag increases, the number of observations comprised in the empirical autocovariance diminishes until the final element c_{T-1} = T^{-1}(y_0 - \bar{y})(y_{T-1} - \bar{y}) is reached, which comprises only the first and last mean-adjusted observations.
In plotting the sequence {r_τ}, we shall omit the value of r_0, which is invariably unity. Moreover, in interpreting the plot, one should be wary of giving too much credence to the empirical autocorrelations at lag values which are significantly high in relation to the size of the sample.
Partial autocorrelation function (PACF). The sample partial autocorrelation p_τ at lag τ is simply the correlation between the two sets of residuals obtained from regressing the elements y_t and y_{t-τ} on the set of intervening values y_{t-1}, y_{t-2}, ..., y_{t-τ+1}. The partial autocorrelation measures the dependence between y_t and y_{t-τ} after the effect of the intervening values has been removed.

The sample partial autocorrelation p_τ is virtually the same quantity as the estimated coefficient of lag τ obtained by fitting an autoregressive model of order τ to the data. Indeed, the difference between the two quantities vanishes as the sample size increases. The Durbin-Levinson algorithm provides an efficient way of computing the sequence {p_τ} of partial autocorrelations from the sequence {c_τ} of autocovariances. It can be seen, in view of this algorithm, that the information in {c_τ} is equivalent to the information contained jointly in {p_τ} and c_0. Therefore the sample autocorrelation function {r_τ} and the sample partial autocorrelation function {p_τ} are equivalent in terms of their information content.
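A minimal sketch of these calculations in Python (assuming numpy): the autocovariances follow equation (2), and the partial autocorrelations are obtained from the Durbin-Levinson recursion, taking p_τ to be the final coefficient of the fitted AR(τ) model.

    import numpy as np

    def acf(y, maxlag):
        y = np.asarray(y, dtype=float)
        d = y - y.mean()
        T = len(y)
        c = np.array([np.sum(d[tau:] * d[:T - tau]) / T for tau in range(maxlag + 1)])
        return c / c[0]                        # r_tau = c_tau / c_0, equation (1)

    def pacf(y, maxlag):
        r = acf(y, maxlag)
        p, a = [], np.array([])                # a holds the current AR(tau) coefficients
        for tau in range(1, maxlag + 1):
            num = r[tau] - np.sum(a * r[tau - 1:0:-1])
            den = 1.0 - np.sum(a * r[1:tau])
            k = num / den                      # reflection coefficient = p_tau
            a = np.concatenate([a - k * a[::-1], [k]])
            p.append(k)
        return np.array(p)

    rng = np.random.default_rng(1)
    e = rng.standard_normal(500)
    y = np.convolve(e, [1.0, 0.9, 0.81])[:500]   # an MA(2) series, as in Figure 2
    print(acf(y, 5).round(2), pacf(y, 5).round(2))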
The Methodology of Box and Jenkins

The model-building methodology of Box and Jenkins relies heavily upon the two functions {r_τ} and {p_τ} defined above. It involves a cycle comprising the three stages of model selection, model estimation and model checking. In view of the difficulties of selecting an appropriate model, it is envisaged that the cycle might have to be repeated several times and that, at the end, there might be more than one model of the same series.
[Figure 1. The concentration readings from a chemical process, with the autocorrelation function and the autocorrelation function of the differences.]
Reduction to stationarity. The first step, which is taken before embarking on the cycle, is to examine the time plot of the data and to judge whether or not it could be the outcome of a stationary process. If a trend is evident in the data, then it must be removed. A variety of techniques of trend removal, which include the fitting of parametric curves and of spline functions, have been discussed in previous lectures. When such a function is fitted, it is to the sequence of residuals that the ARMA model is applied.

However, Box and Jenkins were inclined to believe that many empirical series can be modelled adequately by supposing that some suitable difference of the process is stationary. Thus the process generating the observed series y(t) might be modelled by the ARIMA(p, d, q) equation

(3)  \alpha(L) \nabla^d y(t) = \mu(L) \varepsilon(t),

wherein \nabla^d = (I - L)^d is the dth power of the difference operator. In that case, the differenced series z(t) = \nabla^d y(t) will be described by a stationary ARMA(p, q) model. The inverse operator \nabla^{-1} is the summing or integrating operator, which accounts for the fact that the model depicted by equation (3) is described as an autoregressive integrated moving-average model.
To determine whether stationarity has been achieved, either by trend removal or by differencing, one may examine the autocorrelation sequence of the residual or processed series. The sequence corresponding to a stationary process should converge quite rapidly to zero as the value of the lag increases. An empirical autocorrelation function which exhibits a smooth pattern of significant values at high lags indicates a nonstationary series.

An example is provided by Figure 1, where a comparison is made between the autocorrelation function of the original series and that of its differences. Although the original series does not appear to embody a systematic trend, it does drift in a haphazard manner which suggests a random walk; and it is appropriate to apply the difference operator.
Once the degree of differencing has been determined, the autoregressive and moving-average orders are selected by examining the sample autocorrelations and sample partial autocorrelations. The characteristics of pure autoregressive and pure moving-average processes are easily spotted. Those of a mixed autoregressive moving-average model are not so easily unravelled.
Moving-average processes. The theoretical autocorrelation function {ρ_τ} of a pure moving-average process of order q has ρ_τ = 0 for all τ > q. The corresponding partial autocorrelation function is liable to decay towards zero gradually. To judge whether the corresponding sample autocorrelation function {r_τ} shows evidence of a truncation, we need some scale by which to judge the significance of the values of its elements.
[Figure 2. The graph of 120 observations on a simulated series generated by the MA(2) process y(t) = (1 + 0.90L + 0.81L^2)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]
As a guide to determining whether the parent autocorrelations are in fact zero after lag q, we may use a result of Bartlett [1946] which shows that, for a sample of size T, the standard deviation of r_τ is approximately

(4)  \frac{1}{\sqrt{T}} \left\{ 1 + 2(r_1^2 + r_2^2 + \cdots + r_q^2) \right\}^{1/2} \quad\text{for } \tau > q.

The result is also given by Fuller [1976, p. 237]. A simpler measure of the scale of the autocorrelations is provided by the limits of ±1.96/√T, which are the approximate 95% confidence bounds for the autocorrelations of a white-noise sequence. These bounds are represented by the dashed horizontal lines on the accompanying graphs.
Autoregressive processes. The theoretical autocorrelation function {ρ_τ} of a pure autoregressive process of order p obeys a homogeneous difference equation based upon the autoregressive operator \alpha(L) = 1 + \alpha_1 L + \cdots + \alpha_p L^p. That is to say,

(5)  \rho_\tau = -(\alpha_1 \rho_{\tau-1} + \cdots + \alpha_p \rho_{\tau-p}) \quad\text{for all } \tau \geq p.

In general, the sequence generated by this equation will represent a mixture of damped exponential and sinusoidal functions. If the sequence is of a sinusoidal nature, then the presence of complex roots in the operator α(L) is indicated. One can expect the empirical autocovariance function of a pure AR process to be of the same nature as its theoretical parent.
It is the partial autocorrelation function which serves most clearly to identify a pure AR process. The theoretical partial autocorrelation function of an AR(p) process is zero-valued at all lags greater than p. Likewise, all elements of the sample partial autocorrelation function are expected to be close to zero for lags greater than p, which corresponds to the fact that they are simply estimates of zero-valued parameters. The significance of the values of the partial autocorrelations is judged by the fact that, for a pth-order process, their standard deviations for all lags greater than p are approximated by 1/√T. Thus the bounds of ±1.96/√T are also plotted on the graph of the partial autocorrelation function.
Mixed processes. In the case of a mixed ARMA(p, q) process, neither the theoretical autocorrelation function nor the theoretical partial autocorrelation function has any abrupt cutoffs. Indeed, there is little that can be inferred from either of these functions or from their empirical counterparts beyond the fact that neither a pure MA model nor a pure AR model would be appropriate. On its own, the autocovariance function of an ARMA(p, q) process is not easily distinguished from that of a pure AR process. In particular, its elements ρ_τ satisfy the same difference equation as that of a pure AR model for all values of τ > max(p, q).
[Figure 3. The graph of 120 observations on a simulated series generated by the AR(2) process (1 - 1.69L + 0.81L^2)y(t) = ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]
[Figure 4. The graph of 120 observations on a simulated series generated by the ARMA(2, 2) process (1 - 1.69L + 0.81L^2)y(t) = (1 + 0.90L + 0.81L^2)ε(t), together with the theoretical and empirical ACFs (middle) and the theoretical and empirical PACFs (bottom). The theoretical values correspond to the solid bars.]
There is good reason to regard mixed models as more appropriate in practice than pure models of either variety. For a start, there is the fact that a rational transfer function is far more effective in approximating an arbitrary impulse response than is an autoregressive transfer function, whose parameters are confined to the denominator, or a moving-average transfer function, which has its parameters in the numerator. Indeed, it might be appropriate, sometimes, to approximate a pure process of a high order by a more parsimonious mixed model.
Mixed models are also favoured by the fact that the sum of any two mutually independent autoregressive processes gives rise to an ARMA process. Let y(t) and z(t) be autoregressive processes of orders p and r respectively which are described by the equations α(L)y(t) = ε(t) and β(L)z(t) = η(t), wherein ε(t) and η(t) are mutually independent white-noise processes. Then their sum will be

(6)  y(t) + z(t) = \frac{\varepsilon(t)}{\alpha(L)} + \frac{\eta(t)}{\beta(L)} = \frac{\beta(L)\varepsilon(t) + \alpha(L)\eta(t)}{\alpha(L)\beta(L)} = \frac{\mu(L)\zeta(t)}{\alpha(L)\beta(L)},

where \mu(L)\zeta(t) = \beta(L)\varepsilon(t) + \alpha(L)\eta(t) constitutes a moving-average process of order max(p, r).
In economics, where the data series are highly aggregated, mixed models would seem to be called for often. In the context of electrical and mechanical engineering, there may be some justification for pure AR models. Here there is often abundant data, sufficient to sustain the estimation of pure autoregressive models of high order. Therefore the principle of parametric parsimony is less persuasive than it might be in an econometric context. However, pure AR models perform poorly whenever the data are affected by errors of observation; and, in this respect, a mixed model is liable to be more robust. One can understand this feature of mixed models by recognising that the sum of a pure AR(p) process and a white-noise process is an ARMA(p, p) process.
LECTURE 9

Nonparametric Estimation of the Spectral Density Function

The Spectrum and the Periodogram

The spectral density of a stochastic process is defined by

(1)  f(\omega) = \frac{1}{2\pi} \left\{ \gamma_0 + 2\sum_{\tau=1}^{\infty} \gamma_\tau \cos(\omega\tau) \right\}, \quad \omega \in [0, \pi].
The obvious way to estimate this function is to replace the unknown autocovariances {γ_τ} by the corresponding empirical moments {c_τ}, where

(2)  c_\tau = \frac{1}{T} \sum_{t=\tau}^{T-1} (y_t - \bar{y})(y_{t-\tau} - \bar{y}) \quad\text{if } \tau \leq T - 1.
Notice that, beyond a lag of τ = T - 1, the autocovariances are not estimable, since

(3)  c_{T-1} = \frac{1}{T}(y_0 - \bar{y})(y_{T-1} - \bar{y})

comprises the first and the last elements of the sample; and therefore we must set c_τ = 0 when τ > T - 1. Thus we obtain a sample spectrum in the form of

(4)  f_r(\omega) = \frac{1}{2\pi} \left\{ c_0 + 2\sum_{\tau=1}^{T-1} c_\tau \cos(\omega\tau) \right\}.
The sample spectrum defined in this way is just 1/4π times the periodogram of the sample, which is given by

(5)  I(\omega_j) = 2\left\{ c_0 + 2\sum_{\tau=1}^{T-1} c_\tau \cos(\omega_j \tau) \right\} = \frac{2}{T}\left\{ \left( \sum_t y_t \cos(\omega_j t) \right)^2 + \left( \sum_t y_t \sin(\omega_j t) \right)^2 \right\} = \frac{T}{2}\left( \alpha_j^2 + \beta_j^2 \right),
where

(6)  \alpha_j = \frac{2}{T}\sum_t y_t \cos(\omega_j t) \quad\text{and}\quad \beta_j = \frac{2}{T}\sum_t y_t \sin(\omega_j t).
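A minimal sketch of the periodogram calculation, computed directly from the sums in (6) at the Fourier frequencies; an FFT would be the efficient route, but the direct sums keep the correspondence with the formulae visible. Numpy is assumed.

    import numpy as np

    def periodogram(y):
        """Ordinates I(omega_j) = (T/2)(alpha_j^2 + beta_j^2) at the
        Fourier frequencies omega_j = 2*pi*j/T, j = 1, ..., [T/2]."""
        y = np.asarray(y, dtype=float)
        T = len(y)
        t = np.arange(T)
        out = []
        for j in range(1, T // 2 + 1):
            w = 2 * np.pi * j / T
            a = (2.0 / T) * np.sum(y * np.cos(w * t))    # alpha_j of (6)
            b = (2.0 / T) * np.sum(y * np.sin(w * t))    # beta_j of (6)
            out.append((w, (T / 2.0) * (a * a + b * b)))
        return out

    rng = np.random.default_rng(2)
    y = rng.standard_normal(128)
    for w, I in periodogram(y)[:4]:
        print(round(w, 3), round(I / (4 * np.pi), 3))    # sample spectrum f_r = I/(4 pi)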
As we have defined it above, the periodogram has just n ordinates, which correspond to the values

(7)  \omega_j = 0, \frac{2\pi}{T}, \ldots, \frac{\pi(T-1)}{T} \quad\text{when } T \text{ is odd, or}
     \omega_j = 0, \frac{2\pi}{T}, \ldots, \pi \quad\text{when } T \text{ is even.}
Although this method of estimating the spectrum via the periodogram may result, in some cases, in unbiased estimates of the corresponding ordinates of the spectral density function, it does not result in consistent estimates. This is hardly surprising when we recall that, in the case where T is even, the Fourier decomposition of the sample y_0, ..., y_{T-1}, upon which the method is directly based, requires us to determine the T coefficients

\alpha_0, (\alpha_1, \beta_1), \ldots, (\alpha_{n-1}, \beta_{n-1}), \alpha_n,

where n = T/2, from a total of T observations. For a set of parameters to be estimated consistently, we require that the amount of the relevant information which is available should increase with the size of the sample; and this cannot happen in the present case.
These conclusions can be illustrated quite simply in the case where y(t) = ε(t) is a white-noise sequence with a uniform spectrum f(ω) = σ²/2π over the range (-π, π]. The values of α_j and β_j which characterise the sample spectrum and the periodogram are precisely the ones which would result from fitting the regression model

(8)  y(t) = \alpha_j \cos(\omega_j t) + \beta_j \sin(\omega_j t) + \varepsilon(t)

to the data y_0, ..., y_{T-1}. From the ordinary theory of linear regression, it follows that, if the population values which are estimated by α_j and β_j are in fact zero, which they must be on the assumption that y(t) = ε(t), then
(9)  \frac{1}{\sigma^2}\left\{ \alpha_j^2 \sum_t \cos^2(\omega_j t) + \beta_j^2 \sum_t \sin^2(\omega_j t) \right\} = \frac{T}{2\sigma^2}\left( \alpha_j^2 + \beta_j^2 \right) = \frac{I_j}{\sigma^2}
has a chi-square distribution of two degrees of freedom. The variance of a chi-square distribution of k degrees of freedom is just 2k. Thus we find that
V(I_j/σ²) = 4; whence it follows that the variance of the spectral estimate f_r(ω_j) = I_j/4π is

(10)  V\{f_r(\omega_j)\} = \frac{\sigma^4}{4\pi^2} = f^2(\omega_j).
Clearly, this value does not diminish as T increases.
A further consequence of using the periodogram directly to estimate the spectrum is that the estimators of f(ω_j) and f(ω_k) will be uncorrelated for all j ≠ k. This follows from the orthogonality of the sine and cosine functions which serve as a basis for the Fourier decomposition of the sample. The fact that adjacent values of the estimated spectrum are uncorrelated means that it will have a particularly volatile appearance.
Spectrum Averaging

One way of improving the properties of the estimate of f(ω_j) is to comprise within the estimator several adjacent values from the periodogram. Thus we may define a new estimator in the form of

(11)  f_s(\omega_j) = \sum_{k=-m}^{m} \mu_k f_r(\omega_{j-k}).
In addition to the value of the periodogram at the point ω_j, this comprises a further m adjacent values falling on either side. The set of weights

\{\mu_{-m}, \mu_{1-m}, \ldots, \mu_{m-1}, \mu_m\}

should sum to unity as well as being symmetric in the sense that μ_{-k} = μ_k. They define what is known as a spectral window. Some obvious problems arise in defining values of the estimate towards the boundaries of the set of frequencies {ω_j; 0 ≤ ω_j ≤ π}. These problems can be overcome by treating the spectrum as symmetric about the points 0 and π so that, for example, we define

(12)  f_s(0) = \mu_0 f_r(0) + 2\sum_{k=1}^{m} \mu_k f_r(\omega_k).
The estimate f_s(ω_j) comprises a total of M = 2m + 1 ordinates of the periodogram which span an interval of Q = 4πm/T radians. This number of radians Q is the so-called bandwidth of the estimator. If Q is kept constant, then M increases at the same rate as T. This means that, in spite of the increasing sample size, we are denied the advantage of increasing the acuity or resolution of our estimation; so that narrow peaks in the spectrum, which have been smoothed over, may escape detection. Conversely, if we maintain the value of M, then the size of the bandwidth will decrease with T, and we may retain some of the disadvantages of the original periodogram. Ideally, we should allow M to increase at a slower rate than T so that, as M → ∞, we will have Q → 0.
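A minimal sketch of the estimator (11), using the hypothetical choice of uniform (Daniell-type) weights μ_k = 1/(2m + 1), and reflecting the ordinates at the boundaries in the manner of (12):

    import numpy as np

    def smooth_spectrum(f_r, m):
        """Moving average of the sample-spectrum ordinates with uniform
        weights mu_k = 1/(2m+1); the sequence is reflected at both ends,
        treating the spectrum as symmetric about 0 and pi."""
        f = np.asarray(f_r, dtype=float)
        padded = np.concatenate([f[m:0:-1], f, f[-2:-m-2:-1]])   # reflect at 0 and pi
        w = np.full(2 * m + 1, 1.0 / (2 * m + 1))
        return np.convolve(padded, w, mode="valid")

    f_r = np.abs(np.random.default_rng(3).standard_normal(65))   # stand-in ordinates
    f_s = smooth_spectrum(f_r, m=3)
    print(len(f_r), len(f_s))      # the estimate has the same number of ordinates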
Weighting in the Time Domain

An alternative approach to spectral estimation is to give differential weighting to the estimated autocovariances comprised in our formula for the sample spectrum, so that diminishing weights are given to the values of c_τ as τ increases. This seems reasonable, since the precision of these estimates decreases as τ increases. If the series of weights associated with the autocovariances c_0, c_1, ..., c_{T-1} are denoted by m_0, m_1, ..., m_{T-1}, then our revised estimator for the spectrum takes the form of

(13)  f_w(\omega) = \frac{1}{2\pi} \left\{ m_0 c_0 + 2\sum_{\tau=1}^{T-1} m_\tau c_\tau \cos(\omega\tau) \right\}.
The series of weights defines what is described as a lag window. If the weights are zero-valued beyond m_R, then we describe R as the truncation point.

A wide variety of lag windows have been defined. Amongst those which are used nowadays are the Tukey-Hanning window, defined by

(14)  m_\tau = \frac{1}{2}\left\{ 1 + \cos\left( \frac{\pi\tau}{R} \right) \right\}; \quad \tau = 0, 1, \ldots, R,

and the Parzen window, defined by

(15)  m_\tau = 1 - 6\left( \frac{\tau}{R} \right)^2 + 6\left( \frac{\tau}{R} \right)^3; \quad 0 \leq \tau \leq \tfrac{1}{2}R,
      m_\tau = 2\left( 1 - \frac{\tau}{R} \right)^3; \quad \tfrac{1}{2}R \leq \tau \leq R.
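A minimal sketch of the estimator (13) with the Parzen window of (15); the autocovariances are computed as in equation (2), and the truncation point R is an illustrative choice:

    import numpy as np

    def parzen(tau, R):
        z = tau / R
        return 1 - 6 * z**2 + 6 * z**3 if tau <= R / 2 else 2 * (1 - z)**3

    def f_w(y, R, omegas):
        """Lag-window spectral estimate, equation (13)."""
        y = np.asarray(y, dtype=float)
        d = y - y.mean()
        T = len(y)
        c = [np.sum(d[tau:] * d[:T - tau]) / T for tau in range(R + 1)]
        m = [parzen(tau, R) for tau in range(R + 1)]
        return [(m[0] * c[0] + 2 * sum(m[t] * c[t] * np.cos(w * t)
                 for t in range(1, R + 1))) / (2 * np.pi) for w in omegas]

    y = np.random.default_rng(4).standard_normal(400)
    omegas = np.linspace(0, np.pi, 9)
    print(np.round(f_w(y, R=24, omegas=omegas), 3))   # should hover near 1/(2 pi)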
The Relationship between Smoothing and Weighting

It would be surprising if we were unable to interpret the method of smoothing the periodogram in terms of an equivalent method of weighting the autocovariance function, and vice versa.

Consider the smoothed periodogram defined by

(16)  f_s(\omega_j) = \sum_{k=-m}^{m} \mu_k f_r(\omega_{j-k}).
Given that the ordinates of the original periodogram I(ω_j) correspond to the points ω_j defined in (7), it follows that f_r(ω_{j-k}) = f_r(ω_j - λ_k), where λ_k = 2πk/T. Therefore, on substituting

(17)  f_r(\omega_{j-k}) = \frac{1}{2\pi} \sum_{\tau=1-T}^{T-1} c_\tau \exp(-i\omega_{j-k}\tau) = \frac{1}{2\pi} \sum_\tau c_\tau \exp(-i[\omega_j - \lambda_k]\tau)
into (16), we get

(18)  f_s(\omega_j) = \sum_k \mu_k \left\{ \frac{1}{2\pi} \sum_\tau c_\tau \exp(-i[\omega_j - \lambda_k]\tau) \right\}
      = \frac{1}{2\pi} \sum_\tau \left\{ \sum_k \mu_k \exp(i\lambda_k \tau) \right\} c_\tau \exp(-i\omega_j \tau)
      = \frac{1}{2\pi} \sum_\tau m_\tau c_\tau \exp(-i\omega_j \tau),
where

(19)  m_\tau = \sum_{k=-m}^{m} \mu_k e^{i\lambda_k \tau}; \quad \lambda_k = \frac{2\pi k}{T}

is the finite Fourier transform of the sequence of weights {μ_{-m}, μ_{1-m}, ..., μ_{m-1}, μ_m} which define the spectral window.
The final expression under (18) would be the same as our expression for the spectral estimator given under (13) were it not for the fact that we have defined the present function over the set of values {ω_j; j = 1, ..., n} instead of over the interval [0, π], and for the fact that we have used a complex exponential expression instead of a cosine.

It is also possible to demonstrate an inverse relationship whereby a spectral estimator which depends upon weighting the autocovariance function is equivalent to another estimator which smooths the periodogram. Consider a spectral estimator in the form of

(20)  f_w(\omega_0) = \frac{1}{2\pi} \sum_{\tau=1-T}^{T-1} m_\tau c_\tau \exp(-i\omega_0 \tau),
where

(21)  m_\tau = \int_{-\pi}^{\pi} u(\lambda) e^{i\lambda\tau} d\lambda

has an inverse Fourier transform given by

(22)  u(\lambda) = \frac{1}{2\pi} \sum_{\tau=-\infty}^{\infty} m_\tau e^{-i\lambda\tau}.
On substituting the expression for m_τ from (21) into (20), we get

(23)  f_w(\omega_0) = \frac{1}{2\pi} \sum_\tau \left\{ \int_{-\pi}^{\pi} u(\lambda) e^{i\lambda\tau} d\lambda \right\} c_\tau e^{-i\omega_0 \tau}
      = \int_{-\pi}^{\pi} u(\lambda) \left\{ \frac{1}{2\pi} \sum_\tau c_\tau e^{-i(\omega_0 - \lambda)\tau} \right\} d\lambda
      = \int_{-\pi}^{\pi} u(\lambda) f_r(\omega_0 - \lambda) d\lambda.
This shows that the technique of weighting the autocovariance function corresponds, in general, to a technique of smoothing the periodogram. However, to sustain this interpretation, we must define the periodogram not just at n frequency points {ω_j; j = 1, ..., n}, as we have done in (5), but over the entire interval [-π, π]. Notice that, on setting τ = 0 in (21), we get

(24)  m_0 = \int_{-\pi}^{\pi} u(\lambda) d\lambda.
It is desirable that the weighting function should integrate to unity over the relevant range, and this requires us to set m_0 = 1. The latter is exactly the value by which we would expect to weight the estimated variance c_0 within the formula in (13) which defines the spectral estimator f_w(ω).
LECTURE 10

Seasonal Models and Seasonal Adjustment

So far we have relied upon the method of trigonometrical regression for building models which can be used for forecasting seasonal economic time series. It has proved necessary, invariably, to perform the preliminary task of eliminating a trend from the data before determining the seasonal pattern from the residuals. In most of the cases which we have analysed, the trend has been modelled quite successfully by a simple analytic function such as a quadratic. However, it is not always possible to find an analytic function which serves the purpose. In some cases a stochastic trend seems to be more appropriate. Such a trend is generated by an autoregressive operator with unit roots. Once a stochastic unit-root model has been adopted for the trend, it seems natural to model the pattern of seasonal fluctuations in the same manner, by using autoregressive operators with complex-valued roots of unit modulus.
The General Multiplicative Seasonal Model

Let

(1)  z(t) = \nabla^d y(t)

be a de-trended series which exhibits seasonal behaviour with a periodicity of s periods. Imagine, for the sake of argument, that the period between successive observations is one month, which means that the seasons have a cycle of s = 12 months. Once the trend has been extracted from the original series y(t) by differencing, we would expect to find a strong relationship between the values of observations taken in the same month of successive years. In the simplest circumstances, we might find that the difference between y_t and y_{t-12} is a small random quantity. If the sequence of the twelve-period differences were white noise, then we should have a relationship of the form

(2)  z(t) = z(t - 12) + \varepsilon(t) \quad\text{or, equivalently,}\quad \nabla_{12} z(t) = \varepsilon(t).
This is ostensibly an autoregressive model with an operator in the form of \nabla_{12} = 1 - L^{12}. However, it is interesting to note in passing that, if y(t) were
generated by a regression model in the form of

(3)  y(t) = \sum_{j=0}^{6} \rho_j \cos(\omega_j t - \theta_j) + \eta(t),

where \omega_j = \pi j/6 = j \times 30^\circ, then we should have

(4)  (1 - L^{12})y(t) = \eta(t) - \eta(t - 12) = \zeta(t);
and, if the disturbance sequence η(t) were white noise, then the residual term ζ(t) = η(t) - η(t - 12) would show the following pattern of correlation:

(5)  C(\zeta_t, \zeta_{t-j}) = \begin{cases} 2\sigma_\eta^2, & \text{if } j = 0, \\ -\sigma_\eta^2, & \text{if } j = \pm 12, \\ 0, & \text{otherwise.} \end{cases}
It can be imagined that a more complicated relationship stretches over the years which connects the months of the calendar. By a simple analogy with the ordinary ARMA model, we can devise a model of the form

(6)  \Phi(L^{12}) \nabla_{12}^{D} z(t) = \Theta(L^{12}) \eta(t),
where Φ(z) is a polynomial of degree P and Θ(z) is a polynomial of degree Q. In effect, this model is applied to twelve separate time series, one for each month of the year, whose observations are separated by yearly intervals. If η(t) were a white-noise sequence of independently and identically distributed random variables, then there would be no connection between the twelve time series.

If there is a connection between successive months within the year, then there should be a pattern of serial correlation amongst the elements of the disturbance process η(t). One might propose to model this pattern using a second ARMA model of the form

(7)  \alpha(L)\eta(t) = \mu(L)\varepsilon(t),

where α(z) is a polynomial of degree p and μ(z) is a polynomial of degree q.
The various components of our analysis can now be assembled. By combining equations (1), (6) and (7), we can derive the following general model for the sequence y(t):

(8)  \Phi(L^{12}) \alpha(L) \nabla_{12}^{D} \nabla^{d} y(t) = \Theta(L^{12}) \mu(L) \varepsilon(t).
A model of this sort has been described by Box and Jenkins as the general
multiplicative seasonal model. To denote such a model in a summary fashion,
they describe it as an ARIMA (P, D, Q) × (p, d, q) model. Although, in the general version of the model, the seasonal difference operator ∇_12 is raised to the power D, it is unusual to find values other than D = 0, 1.
Factorisation of the Seasonal Difference Operator

The equation under (8) should be regarded as a portmanteau in which a collection of simplified models can be placed. The profusion of symbols in equation (8) tends to suggest a model which is too complicated to be of practical use. Moreover, even with ∇_12 in place of ∇_12^D, there is a redundancy in the notation to which we should draw attention. This redundancy arises from the fact that the seasonal difference operator ∇_12 already contains the operator ∇ = I - L as one of its factors. Therefore, unless this factor is eliminated, there is a danger that the original sequence y(t) will be subjected, inadvertently, to one more differencing operation than is intended.

The twelve factors of the operator ∇_12 = I - L^12 contain the so-called twelfth-order roots of unity, which are the solutions of the algebraic equation 1 = z^12. The factorisation may be demonstrated in three stages. To begin, it is easy to see that
is easy to see that
(9)
I L
12
= (I L)(I + L + L
2
+ + L
11
)
= (I L)(I + L
2
+ L
4
+ + L
10
)(I + L).
The next step is to recognise that
(10)  I + L^2 + L^4 + \cdots + L^{10} = (1 - \sqrt{3}L + L^2)(I - L + L^2)(I + L^2)(I + L + L^2)(1 + \sqrt{3}L + L^2).
Finally, it can be seen that the generic quadratic factor has the form of

(11)  1 - 2\cos(\omega_j)L + L^2 = (1 - e^{i\omega_j}L)(1 - e^{-i\omega_j}L),

where \omega_j = \pi j/6 = j \times 30^\circ.
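The factorisation (9)-(11) can be checked numerically by multiplying the factors together; a minimal sketch using numpy's polynomial multiplication (coefficients in ascending powers of L):

    import numpy as np

    factors = [np.array([1.0, -1.0])]                 # I - L
    factors += [np.array([1.0, -2 * np.cos(np.pi * j / 6), 1.0]) for j in range(1, 6)]
    factors.append(np.array([1.0, 1.0]))              # I + L

    poly = np.array([1.0])
    for f in factors:
        poly = np.convolve(poly, f)                   # multiply the operators together

    print(np.round(poly, 10))   # coefficients of 1 - L^12: [1, 0, ..., 0, -1]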
Figure 1 shows the disposition of the twelfth roots of unity around the unit
circle in the complex plane.
A cursory inspection of equation (9) indicates that the first-order difference operator ∇ = I - L is indeed one of the factors of ∇_12 = I - L^12. Therefore, if the sequence y(t) has been reduced to stationarity already by the application of d first-order differencing operations, then its subsequent differencing via the operator ∇_12 is unnecessary and is liable to destroy some of the characteristics of the sequence which ought to be captured by the ARIMA model.

The factorisation of the seasonal difference operator also helps to explain how the seasonal ARMA model can give rise to seemingly regular cycles of the appropriate duration.
[Figure 1. The 12th roots of unity inscribed in the unit circle.]
Consider a simple second-order autoregressive model with complex-valued roots of unit modulus:

(12)  \left\{ I - 2\cos(\omega_j)L + L^2 \right\} y_j(t) = \varepsilon_j(t).

Such a model can give rise to quite regular cycles whose average duration is 2π/ω_j periods. The graph of the sequence generated by a model with ω_j = ω_1 = π/6 = 30° is given in Figure 2. Now consider generating the full set of


stochastic sequences y
j
(t) for j = 1, . . . , 5. Also included in this set should be
the sequences y
0
(t) and y
6
(t) generated by the rst-order equations
(13) (I L)y
0
(t) =
0
(t) and (I + L)y
6
(t) =
6
(t).
These sequences, which resemble trigonometrical functions, will be harmonically related in the manner of the trigonometrical functions comprised by equation (3), which also provides a model for a seasonal time series. It follows that a good representation of a seasonal economic time series can be obtained by taking a weighted combination of the stochastic sequences.
For simplicity, imagine that the white-noise sequences ε_j(t); j = 0, ..., 6 are mutually independent and that their variances can take a variety of values. Then the sum of the stochastic sequences will be given by

(14)  y(t) = \sum_{j=0}^{6} y_j(t) = \frac{\varepsilon_0(t)}{I - L} + \sum_{j=1}^{5} \frac{\varepsilon_j(t)}{I - 2\cos(\omega_j)L + L^2} + \frac{\varepsilon_6(t)}{I + L}.
[Figure 2. The graph of 84 observations on a simulated series generated by the AR(2) process (1 - 1.732L + L^2)y(t) = ε(t).]
The terms on the RHS of this expression can be combined. Their common denominator is simply the operator ∇_12 = I - L^12. The numerator is a sum of seven mutually independent moving-average processes, each with an order of 10 or 11. This also amounts to an MA(11) process, which can be denoted by η(t) = θ(L)ε(t). Thus the combination of the harmonically related unit-root AR(2) processes gives rise to a seasonal process in the form of

(15)  y(t) = \frac{\theta(L)}{I - L^{12}} \varepsilon(t) \quad\text{or, equivalently,}\quad \nabla_{12} y(t) = \theta(L)\varepsilon(t).
The equation of this model is contained within the portmanteau equation of the general multiplicative model given under (8). However, although it represents a simplification of the general model, it still contains a number of parameters which is liable to prove excessive. A typical model, which contains only a few parameters, is the ARIMA (0, 1, 1) × (0, 1, 1) model which Box and Jenkins fitted to the logarithms of the AIRPASS data. The AIRPASS model takes the form of

(16)  (I - L^{12})(I - L)y(t) = (1 - \Theta L^{12})(1 - \theta L)\varepsilon(t).
Notice how the unit-root autoregressive operators I - L^12 and I - L are coupled with the moving-average operators I - ΘL^12 and I - θL respectively. These serve to enhance the regularity of the stochastic cycles and to smooth the trend.
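The construction of (14) is easily simulated. A minimal sketch in Python (assuming numpy), with illustrative disturbance standard deviations; each component is generated by filtering an independent white-noise sequence through the corresponding unit-modulus autoregressive operator:

    import numpy as np

    def ar_filter(e, a):
        """Generate y with a(L) y(t) = e(t), where a = [1, a1, a2, ...]."""
        y = np.zeros(len(e))
        for t in range(len(e)):
            y[t] = e[t] - sum(a[j] * y[t - j] for j in range(1, len(a)) if t - j >= 0)
        return y

    rng = np.random.default_rng(5)
    T, sig = 120, [1.0, 0.8, 0.5, 0.3, 0.2, 0.1, 0.1]    # illustrative std deviations

    ops = [[1.0, -1.0]]                                   # I - L
    ops += [[1.0, -2 * np.cos(np.pi * j / 6), 1.0] for j in range(1, 6)]
    ops.append([1.0, 1.0])                                # I + L

    y = sum(ar_filter(sig[j] * rng.standard_normal(T), ops[j]) for j in range(7))
    print(np.round(y[:24], 2))   # a seasonal series in the manner of equation (14)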
Forecasting with Unit-Root Seasonal Models

Although their appearances are superficially similar, the seasonal economic series and the series generated by equations such as (16) are, fundamentally, of very different natures. In the case of the series generated by a unit-root stochastic difference equation, there is no bound, in the long run, on the amplitude of the cycles. Also there is a tendency for the phases of the cycles to drift without limit. If the latter were a feature of the monthly time series of consumer expenditures, for example, then we could not expect the annual boom in sales to occur at a definite time of the year. In fact, it occurs invariably at Christmas time.

The advantage of unit-root seasonal models does not lie in the realism with which they describe the processes which generate the economic data series. For that purpose the trigonometrical model seems more appropriate. Their advantage lies, instead, in their ability to forecast the seasonal series.

The simplest of the seasonal unit-root models is the one which is specified by equation (2). This is a twelfth-order difference equation with a white-noise forcing function. In generating forecasts from the model, we need only replace the elements of ε(t) which lie in the future by their zero-valued expectations. Then the forecasts may be obtained iteratively from a homogeneous difference equation in which the initial conditions are simply the values of y(t) observed over the preceding twelve months. In effect, we observe the most recent annual cycle and we extrapolate its form exactly, year-in year-out, into the indefinite future.
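In code, the forecast function of model (2) amounts to repeating the last twelve observations. A minimal sketch, with an invented illustrative history:

    def seasonal_rw_forecast(y, H):
        """Forecasts for y(t) = y(t-12) + eps(t): replace future disturbances
        by zero, so each forecast repeats the value twelve months earlier."""
        vals = list(y)
        for h in range(H):
            vals.append(vals[-12])     # homogeneous recursion y(t+h) = y(t+h-12)
        return vals[len(y):]

    history = [3, 5, 9, 12, 14, 15, 14, 12, 9, 6, 4, 3] * 3   # three annual cycles
    print(seasonal_rw_forecast(history, H=24))                # repeats the last cycle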
A somewhat different forecasting rule is associated with the model defined by the equation

(17)  (I - L^{12})y(t) = (1 - \theta L^{12})\varepsilon(t).

This equation is analogous to the simple IMA(1, 1) equation in the form of

(18)  (I - L)y(t) = (1 - \theta L)\varepsilon(t),

which was considered at the beginning of the course. The latter equation was obtained by combining a first-order random walk with a white-noise error of observation. The two equations, whose combination gives rise to (18), are

(19)  \xi(t) = \xi(t-1) + \nu(t),
      y(t) = \xi(t) + \eta(t),

wherein ν(t) and η(t) are generated by two mutually independent white-noise processes.
[Figure 3. The sample trajectory and the forecast function of the nonstationary 12th-order process y(t) = y(t - 12) + ε(t).]
Equation (17), which represents the seasonal model which was used by Box and Jenkins, is generated by combining the following equations, which are analogous to those under (19):

(20)  \xi(t) = \xi(t - 12) + \nu(t),
      y(t) = \xi(t) + \eta(t).

Here ν(t) and η(t) continue to represent a pair of independent white-noise processes.
The procedure for forecasting the IMA model consisted of extrapolating into the indefinite future a constant value ŷ_{t+1|t}, which represents the one-step-ahead forecast made at time t. The forecast itself was obtained from a geometrically weighted combination of all past values of the sequence y(t), which represent erroneous observations on the random-walk process ξ(t). The forecasts for the seasonal model of (17) are obtained by extrapolating a so-called annual reference cycle into the future, so that it applies in every successive year. The reference cycle is constructed by taking a geometrically weighted combination of all past annual cycles. The analogy with the IMA model is perfect!

It is interesting to compare the forecast function of the stochastic unit-root seasonal model of (17) with the forecast function of the corresponding trigonometrical model represented by (3). In the latter case, the forecast function
depends upon a reference cycle which is the average of all of the annual cycles which are represented by the data set from which the regression parameters have been computed. The stochastic model seems to have the advantage that, in forming its average of previous annual cycles, it gives more weight to recent years. However, it is not difficult to contrive a regression model which has the same feature.