$$\ldots\,\sqrt{\tau}\,\varepsilon_{k+1}, \qquad (1)$$
where $\sigma \ge 0$, $b > 0$, $a \in \mathbb{R}$ (which we may assume is non-negative without any loss of generality), and $\{\varepsilon_k\}$ is iid Gaussian $N(0, 1)$. We assume that $\varepsilon_{k+1}$ is independent of $x_0, x_1, \ldots, x_k$. The process mean reverts to $\mu = a/b$ with strength $b$. Clearly,
$$x_k \sim N(\mu_k, \sigma_k^2), \qquad (2)$$
*Corresponding author. Email: Jvanderh@maths.adelaide.edu.au
Quantitative Finance
ISSN 1469-7688 print/ISSN 1469-7696 online © 2005 Taylor & Francis
http://www.tandf.co.uk/journals
DOI: 10.1080/14697680500149370
Downloaded by [Chulalongkorn University] at 00:26 25 July 2011
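where
$$\mu_k = \frac{a}{b} + \left[\mu_0 - \frac{a}{b}\right](1 - b\tau)^k, \qquad (3)$$
and
$$\sigma_k^2 = \frac{\sigma^2\tau}{1 - (1 - b\tau)^2}\left[1 - (1 - b\tau)^{2k}\right] + \sigma_0^2(1 - b\tau)^{2k}. \qquad (4)$$
It is easy to show that
$$\mu_k \to \frac{a}{b} \quad \text{as } k \to \infty, \qquad (5)$$
and
$$\sigma_k^2 \to \frac{\sigma^2\tau}{1 - (1 - b\tau)^2} \quad \text{as } k \to \infty, \qquad (6)$$
provided we have chosen $\tau > 0$ and small so that $|1 - b\tau| < 1$.
We can also write (1) as
$$x_{k+1} = A + Bx_k + C\varepsilon_{k+1}, \qquad (7)$$
with $A = a\tau \ge 0$, $0 < B = 1 - b\tau < 1$ and $C = \sigma\sqrt{\tau}$. We could also regard $x_k \approx X(k\tau)$ where $\{X(t) \mid t \ge 0\}$ satisfies the stochastic differential equation
$$dX(t) = (a - bX(t))\,dt + \sigma\,dW(t), \qquad (8)$$
where $\{W(t) \mid t \ge 0\}$ is a standard Brownian motion (on some probability space).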
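The closed forms (3) and (4) can be checked against a direct iteration of the mean and variance implied by (7). The following is a minimal sketch; the function names and the numerical inputs used in any call are illustrative assumptions, not part of the original text.

```python
def moments_closed_form(k, a, b, sigma, tau, mu0, var0):
    """Mean and variance of x_k from the closed forms (3) and (4)."""
    B = 1.0 - b * tau
    mu_k = a / b + (mu0 - a / b) * B ** k
    stationary_var = sigma ** 2 * tau / (1.0 - B ** 2)
    var_k = stationary_var * (1.0 - B ** (2 * k)) + var0 * B ** (2 * k)
    return mu_k, var_k


def moments_by_recursion(k, a, b, sigma, tau, mu0, var0):
    """Same moments obtained by iterating x_{k+1} = A + B x_k + C eps_{k+1}."""
    A, B, C = a * tau, 1.0 - b * tau, sigma * tau ** 0.5
    mu, var = mu0, var0
    for _ in range(k):
        mu = A + B * mu                # mean of the next state
        var = B ** 2 * var + C ** 2    # variance of the next state
    return mu, var
```

Both functions should agree up to rounding whenever $|1 - b\tau| < 1$, which is the condition stated after (6).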
2.2. The observation process
We assume that we have an observation process $\{y_k\}$ of $\{x_k\}$ in Gaussian noise:
$$y_k = x_k + D\omega_k, \qquad (9)$$
where $\{\omega_k\}$ are iid Gaussian $N(0, 1)$ and independent of the $\{\varepsilon_k\}$ in (1), and $D > 0$. We may assume that $0 \le C < D$, which should be the case for small values of $\tau$. We set $\mathcal{Y}_k = \sigma\{y_0, y_1, \ldots, y_k\}$, which represents the information from observing $y_0, y_1, \ldots, y_k$. We wish to compute the conditional expectation (filter)
$$\hat{x}_k = E[x_k \mid \mathcal{Y}_k], \qquad (10)$$
which is the best estimate of the hidden state process given the observed process. In order to make the estimate (10), we will need to estimate $A, B, C, D$, or rather $A, B, C^2, D^2$, from the observed data. We shall present various results for this below.
2.3. The application
We shall regard $\{y_k\}$ as a model for the observed spread of two securities at time $t_k$. We assume the observed spread is a noisy observation of some mean-reverting state process $\{x_k\}$. The $\{y_k\}$ could also model the returns of the spread portfolio, as is often done in practice.
If $y_k > \hat{x}_{k|k-1} = E[x_k \mid \mathcal{Y}_{k-1}]$ the spread is regarded as too large, and so the trader could take a long position in the spread portfolio and profit when a correction occurs. An alternative would be to initiate a long trade only when $y_k$ exceeds $\hat{x}_{k|k-1}$ by some threshold value. A corresponding short trade could be entered when $y_k < \hat{x}_{k|k-1}$.
Various decisions have to be made by the trader. What is a suitable pair of securities for pair trading? If our estimates for $B$ reveal $0 < B < 1$, then this is consistent with the mean-reverting model we have described. Comparing $y_k$ and $\hat{x}_{k|k-1}$ may or may not lead to a trade if thresholds must be met. How are thresholds set? See Vidyamurthy (2004) for some possibilities. When is the pairs trade unwound? There are various possibilities: the next trading time (see the example in Reverre (2001)) or when the spread corrects sufficiently. The price application and databases used are often proprietary in industry applications. The machinery we present then provides some useful tools appropriate for pairs trading.
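The comparison of the observed spread with its one-step prediction can be sketched as a small decision function. This is only an illustration of the rule as worded above. The function name, the return labels and the use of the same threshold on the short side are assumptions, since the text specifies a threshold only for the long trade.

```python
def spread_signal(y_k, x_pred, threshold=0.0):
    """Compare the observed spread y_k with the prediction x_pred = E[x_k | Y_{k-1}].

    Returns "long", "short" or "none", following the wording of the text:
    a long trade when y_k exceeds the prediction by more than the threshold,
    a short trade when it falls below the prediction by more than the threshold.
    """
    if y_k > x_pred + threshold:
        return "long"
    if y_k < x_pred - threshold:
        return "short"
    return "none"
```

With `threshold=0.0` the function reduces to the plain comparison of $y_k$ with $\hat{x}_{k|k-1}$.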
Another explicit strategy could make use of first-passage time results (see Finch (2004) and references cited therein) for the (standardized) Ornstein-Uhlenbeck process
$$dZ(t) = -Z(t)\,dt + \sqrt{2}\,dW(t). \qquad (11)$$
Let
$$T_{0,c} = \inf\{t \ge 0,\ Z(t) = 0 \mid Z(0) = c\}, \qquad (12)$$
which has a probability density function $f_{0,c}$. It is known explicitly that
$$f_{0,c}(t) = \sqrt{\frac{2}{\pi}}\,\frac{|c|\,e^{-t}}{(1 - e^{-2t})^{3/2}}\exp\left(-\frac{c^2 e^{-2t}}{2(1 - e^{-2t})}\right) \qquad (13)$$
for $t > 0$. Now $f_{0,c}$ has a maximum value at $\hat{t}$ given by
$$\hat{t} = \frac{1}{2}\ln\left[1 + \frac{1}{2}\left(\sqrt{(c^2 - 3)^2 + 4c^2} + c^2 - 3\right)\right]. \qquad (14)$$
We can also write (8) in the form
$$dX(t) = -\rho(X(t) - \mu)\,dt + \sigma\,dW(t), \qquad (15)$$
where $\rho = b$ and $\mu = a/b$. When
$$X(0) = \mu + c\,\frac{\sigma}{\sqrt{2\rho}}, \qquad (16)$$
the most likely time $T$ at which $X(T) = \mu$ is given by
$$T = \frac{1}{\rho}\,\hat{t}, \qquad (17)$$
where $\hat{t}$ is given by (14). Use the Ornstein-Uhlenbeck process as an approximation to (7) with $a = A/\tau$, $b = (1 - B)/\tau$ and $\sigma = C/\sqrt{\tau}$, with the calibrated values $A, B, C$. Choose a value of $c > 0$. Enter a pair trade when $y_k \ge \mu + c\sigma/\sqrt{2\rho}$ and unwind the trade at time $T$ later. A corresponding pair trade would be performed when $y_k \le \mu - c\sigma/\sqrt{2\rho}$ and unwound at time $T$ later.
Other methods based on up- and down-crossing results for AR(1) processes could also be considered. Corresponding results like those for the Ornstein-Uhlenbeck processes are not known.
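The first-passage strategy above can be sketched in a few lines: evaluate the density (13), its mode (14), and then map calibrated values of $A, B, C$ to the entry bands and the holding time $T$ in (17). The function names and the dictionary layout are illustrative assumptions. The formulas are transcribed from (13) to (17) and the parameter mapping stated in the text.

```python
import math


def fpt_density(t, c):
    """Density (13) of the first time the standardized OU process hits 0 from c."""
    e2 = math.exp(-2.0 * t)
    return (math.sqrt(2.0 / math.pi) * abs(c) * math.exp(-t)
            / (1.0 - e2) ** 1.5
            * math.exp(-c * c * e2 / (2.0 * (1.0 - e2))))


def fpt_mode(c):
    """Most likely hitting time (14) for the standardized process."""
    c2 = c * c
    inner = math.sqrt((c2 - 3.0) ** 2 + 4.0 * c2) + c2 - 3.0
    return 0.5 * math.log(1.0 + 0.5 * inner)


def trade_rule(A, B, C, tau, c):
    """Map calibrated (A, B, C) to OU parameters, entry bands and holding time T."""
    a = A / tau
    b = (1.0 - B) / tau
    sigma = C / math.sqrt(tau)
    rho, mu = b, a / b
    half_width = c * sigma / math.sqrt(2.0 * rho)   # c * sigma / sqrt(2 rho)
    T = fpt_mode(c) / rho                           # (17)
    return {"mu": mu, "upper": mu + half_width, "lower": mu - half_width, "T": T}
```

A trade would then be entered when $y_k$ is at or above `upper` (or at or below `lower`) and unwound `T` time units later, as described in the text.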
272 R. J. Elliott et al.
3. Filtering and estimation results
We assume some underlying probability space $(\Omega, \mathcal{F}, P)$ whose details need not concern the trader, except that $P$ represents the real-world probability.
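3.1. Kalman filtering
We have a state equation
$$x_{k+1} = A + Bx_k + C\varepsilon_{k+1}, \qquad (18)$$
and the observation equation
$$y_k = x_k + D\omega_k, \qquad (19)$$
for $k = 0, 1, 2, \ldots$.
Given $A, B, C, D$ we can compute
$$\mu_k = \hat{x}_k = \hat{x}_{k|k} = E[x_k \mid \mathcal{Y}_k] \qquad (20)$$
using the Kalman Filter (see Elliott et al. (1995) for a reference probability style proof). Let
$$R_k = \Sigma_{k|k} = E\left[(x_k - \hat{x}_k)^2 \mid \mathcal{Y}_k\right]. \qquad (21)$$
Then $\hat{x}_k$, $R_k$ are determined recursively as follows:
$$\hat{x}_{k+1|k} = A + B\mu_k = A + B\hat{x}_{k|k}, \qquad (22)$$
$$\Sigma_{k+1|k} = B^2\Sigma_{k|k} + C^2, \qquad (23)$$
$$K_{k+1} = \Sigma_{k+1|k} / (\Sigma_{k+1|k} + D^2), \qquad (24)$$
$$\hat{x}_{k+1} = \hat{x}_{k+1|k+1} = \hat{x}_{k+1|k} + K_{k+1}\left[y_{k+1} - \hat{x}_{k+1|k}\right], \qquad (25)$$
$$R_{k+1} = \Sigma_{k+1|k+1} = D^2K_{k+1} = \Sigma_{k+1|k} - K_{k+1}\Sigma_{k+1|k}. \qquad (26)$$
For initialization we could take $\hat{x}_0 = y_0$ and $R_0 = D^2$.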
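The recursion (22) to (26) is short enough to write out directly. The sketch below assumes the observations are held in a Python list and that `C2` and `D2` denote the variances $C^2$ and $D^2$. The function name and the returned lists are illustrative choices, not part of the original text.

```python
def kalman_filter(y, A, B, C2, D2, x0=None, R0=None):
    """Scalar Kalman filter for x_{k+1} = A + B x_k + C eps, y_k = x_k + D omega.

    Returns the filtered means, the filtered variances R_k and the one-step
    predictions E[x_k | Y_{k-1}] (None at k = 0, where no prediction exists).
    """
    x_hat = y[0] if x0 is None else x0   # initialization suggested in the text
    R = D2 if R0 is None else R0
    filtered, variances, predicted = [x_hat], [R], [None]
    for obs in y[1:]:
        x_pred = A + B * x_hat                  # (22)
        P_pred = B * B * R + C2                 # (23)
        K = P_pred / (P_pred + D2)              # (24)
        x_hat = x_pred + K * (obs - x_pred)     # (25)
        R = P_pred - K * P_pred                 # (26)
        filtered.append(x_hat)
        variances.append(R)
        predicted.append(x_pred)
    return filtered, variances, predicted
```

The third returned list holds the predictions that the trading rule of section 2.3 compares with the observed spread.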
Remark: As $k \to \infty$, $R_k$ converges (monotonically) to the positive root $R$ of $B^2R^2 + (C^2 + D^2 - B^2D^2)R - C^2D^2 = 0$, provided $B^2 \ne 0$ and $C^2D^2 \ne 0$. We cannot say very much about limiting values of $\hat{x}_k$ except that it is exponentially forgetting of $\hat{x}_0$. However, these comments are not very important as we will only assume the model (18) holds over a short time horizon for a given set of values of $A, B, C, D$.
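The limiting variance in the remark can be computed from the quadratic and compared with a long run of the variance recursion (23), (24), (26). This is a small sketch under the assumption $B \ne 0$; the function names are illustrative.

```python
import math


def steady_state_R(B, C2, D2):
    """Positive root of B^2 R^2 + (C2 + D2 - B^2 D2) R - C2 D2 = 0."""
    qa = B * B
    qb = C2 + D2 - B * B * D2
    qc = -C2 * D2
    return (-qb + math.sqrt(qb * qb - 4.0 * qa * qc)) / (2.0 * qa)


def iterate_R(R, B, C2, D2, steps):
    """Apply the filter variance update `steps` times starting from R."""
    for _ in range(steps):
        P = B * B * R + C2        # (23)
        R = D2 * P / (P + D2)     # (24) and (26): R = D^2 K
    return R
```

Starting the iteration from $R_0 = D^2$ and running it for a few hundred steps should reproduce the root to within rounding error.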
3.2. Estimation of model
We now provide estimates for $\theta = (A, B, C^2, D^2)$ based on observations $y_0, y_1, \ldots, y_N$. We use the EM-Algorithm to find $\hat{\theta}$ by an iteration that provides a stationary value of the likelihood function based on the observations. In fact, let (see Elliott et al. (1995))
$$L_N(\theta) = E_0\left[\frac{dP_\theta}{dP_0} \,\Big|\, \mathcal{Y}_N\right] \qquad (27)$$
be the likelihood function for $\theta \in \Theta$. The maximum likelihood estimate solves
$$\hat{\theta} = \arg\max_{\theta \in \Theta} L_N(\theta). \qquad (28)$$
The EM-Algorithm is an iterative method to compute $\hat{\theta}$. If $\hat{\theta}_0$ is an initial estimate, the EM-Algorithm provides $\hat{\theta}_j$, $j = 1, 2, \ldots$, as a sequence of estimates.
Step 1 (the E-step): Compute (with $\tilde{\theta} = \hat{\theta}_j$)
$$Q(\theta, \tilde{\theta}) = E_{\tilde{\theta}}\left[\log\frac{dP_\theta}{dP_{\tilde{\theta}}} \,\Big|\, \mathcal{Y}_N\right]. \qquad (29)$$
Step 2 (the M-step): Find
$$\theta_{j+1} \in \arg\max_{\theta \in \Theta} Q(\theta, \hat{\theta}_j). \qquad (30)$$
In the literature there are basically two procedures to
implement the EM-Algorithm.
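3.2.1. Shumway and Stoffer (1982) smoother approach. This method is described by Shumway and Stoffer (1982, 2000); it is an off-line calculation and makes use of smoother estimators for the Kalman Filter. We define smoothers (for $k \le N$):
$$\hat{x}_{k|N} = E[x_k \mid \mathcal{Y}_N], \qquad (31)$$
$$\Sigma_{k|N} = E\left[(x_k - \hat{x}_{k|N})^2 \mid \mathcal{Y}_N\right] = E\left[(x_k - \hat{x}_{k|N})^2\right], \qquad (32)$$
$$\Sigma_{k-1,k|N} = E\left[(x_k - \hat{x}_{k|N})(x_{k-1} - \hat{x}_{k-1|N})\right]. \qquad (33)$$
These smoothers can be computed by
$$J_k = \frac{B\,\Sigma_{k|k}}{\Sigma_{k+1|k}}, \qquad (34)$$
$$\hat{x}_{k|N} = \hat{x}_{k|k} + J_k\left[\hat{x}_{k+1|N} - (A + B\hat{x}_{k|k})\right], \qquad (35)$$
$$\Sigma_{k|N} = \Sigma_{k|k} + J_k^2\left[\Sigma_{k+1|N} - \Sigma_{k+1|k}\right], \qquad (36)$$
$$\Sigma_{k-1,k|N} = J_{k-1}\Sigma_{k|k} + J_kJ_{k-1}\left[\Sigma_{k,k+1|N} - B\,\Sigma_{k|k}\right], \qquad (37)$$
$$\Sigma_{N-1,N|N} = B(1 - K_N)\Sigma_{N-1|N-1}, \qquad (38)$$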
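where initial values for this backward recursion, $\hat{x}_{N|N}$ and $\Sigma_{N|N}$, are obtained from the Kalman Filter along with other estimates. We are given $\theta_j = (A, B, C^2, D^2)$ and initial values for the Kalman Filter, $\hat{x}_0^{(j+1)} = \hat{x}_{0|N}$ and $\Sigma_{0|0}^{(j+1)} = \Sigma_{0|N}$, which are the smoothers from the previous step ($j \ge 1$). The updates $\theta_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$ are
$$\hat{A} = \frac{\alpha\gamma - \delta\beta}{N\alpha - \delta^2}, \qquad (39)$$
$$\hat{B} = \frac{N\beta - \gamma\delta}{N\alpha - \delta^2}, \qquad (40)$$
$$\hat{C}^2 = \frac{1}{N}\sum_{k=1}^{N} E\left[(x_k - \hat{A} - \hat{B}x_{k-1})^2 \mid \mathcal{Y}_N\right], \qquad (41)$$
$$\hat{D}^2 = \frac{1}{N+1}\sum_{k=0}^{N} E\left[(y_k - x_k)^2 \mid \mathcal{Y}_N\right], \qquad (42)$$
where
$$\alpha = \sum_{k=1}^{N} E\left[x_{k-1}^2 \mid \mathcal{Y}_N\right] = \sum_{k=1}^{N}\left[\Sigma_{k-1|N} + \hat{x}_{k-1|N}^2\right],$$
$$\beta = \sum_{k=1}^{N} E\left[x_{k-1}x_k \mid \mathcal{Y}_N\right] = \sum_{k=1}^{N}\left[\Sigma_{k-1,k|N} + \hat{x}_{k-1|N}\hat{x}_{k|N}\right],$$
$$\gamma = \sum_{k=1}^{N}\hat{x}_{k|N},$$
$$\delta = \sum_{k=1}^{N}\hat{x}_{k-1|N} = \gamma - \hat{x}_{N|N} + \hat{x}_{0|N},$$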
Pairs trading 273
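and the right-hand sides of (41) and (42) are readily computed in terms of smoothers:
$$\hat{C}^2 = \frac{1}{N}\sum_{k=1}^{N}\Big[\Sigma_{k|N} + \hat{x}_{k|N}^2 + \hat{A}^2 + \hat{B}^2\Sigma_{k-1|N} + \hat{B}^2\hat{x}_{k-1|N}^2 - 2\hat{A}\hat{x}_{k|N} + 2\hat{A}\hat{B}\hat{x}_{k-1|N} - 2\hat{B}\Sigma_{k-1,k|N} - 2\hat{B}\hat{x}_{k|N}\hat{x}_{k-1|N}\Big],$$
$$\hat{D}^2 = \frac{1}{N+1}\sum_{k=0}^{N}\left[y_k^2 - 2y_k\hat{x}_{k|N} + \Sigma_{k|N} + \hat{x}_{k|N}^2\right].$$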
The disadvantage of this algorithm is that, as new observations arrive, the whole algorithm must be repeated off-line. However, if we have written code for this estimation based on $N + 1$ observations $y_0, y_1, \ldots, y_N$, then when $y_{N+1}$ arrives we simply provide the code with the input $y_1, y_2, \ldots, y_{N+1}$. The Shumway and Stoffer algorithm has been widely tested.
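One pass of the smoother-based EM iteration can be sketched as a single function: a forward Kalman pass, the backward smoothing recursions (34) to (38), and then the parameter updates. This is a hedged illustration in plain Python, not the authors' code. The name `em_step` and its argument layout are assumptions, `C2` and `D2` stand for the variances $C^2$ and $D^2$, and the updates for $A$ and $B$ are written as the solution of the two normal equations of the least-squares problem behind (41).

```python
def em_step(y, A, B, C2, D2, x0, P0):
    """One EM pass (smoother approach); returns updated (A, B, C2, D2, x0, P0)."""
    N = len(y) - 1
    xf, Pf, Pp = [x0], [P0], [None]      # filtered mean, filtered var, predicted var
    K = 0.0
    for k in range(1, N + 1):            # forward Kalman pass (22)-(26)
        xp = A + B * xf[-1]
        P = B * B * Pf[-1] + C2
        K = P / (P + D2)
        xf.append(xp + K * (y[k] - xp))
        Pf.append(P - K * P)
        Pp.append(P)
    xs, Ps = xf[:], Pf[:]                # smoothed values, filled in backwards
    J = [0.0] * N
    for k in range(N - 1, -1, -1):       # backward pass (34)-(36)
        J[k] = B * Pf[k] / Pp[k + 1]
        xs[k] = xf[k] + J[k] * (xs[k + 1] - (A + B * xf[k]))
        Ps[k] = Pf[k] + J[k] ** 2 * (Ps[k + 1] - Pp[k + 1])
    lag = [0.0] * (N + 1)                # lag[k] = Sigma_{k-1,k|N}
    lag[N] = B * (1.0 - K) * Pf[N - 1]   # (38), K is K_N after the forward loop
    for k in range(N - 1, 0, -1):        # (37)
        lag[k] = J[k - 1] * Pf[k] + J[k] * J[k - 1] * (lag[k + 1] - B * Pf[k])
    idx = range(1, N + 1)
    alpha = sum(Ps[k - 1] + xs[k - 1] ** 2 for k in idx)
    beta = sum(lag[k] + xs[k - 1] * xs[k] for k in idx)
    gamma = sum(xs[k] for k in idx)
    delta = sum(xs[k - 1] for k in idx)
    det = N * alpha - delta ** 2
    A_new = (alpha * gamma - delta * beta) / det
    B_new = (N * beta - gamma * delta) / det
    C2_new = sum(
        Ps[k] + xs[k] ** 2 + A_new ** 2
        + B_new ** 2 * (Ps[k - 1] + xs[k - 1] ** 2)
        - 2 * A_new * xs[k] + 2 * A_new * B_new * xs[k - 1]
        - 2 * B_new * (lag[k] + xs[k] * xs[k - 1])
        for k in idx) / N
    D2_new = sum(y[k] ** 2 - 2 * y[k] * xs[k] + Ps[k] + xs[k] ** 2
                 for k in range(N + 1)) / (N + 1)
    return A_new, B_new, C2_new, D2_new, xs[0], Ps[0]
```

Calling `em_step` repeatedly, feeding each returned tuple back in as the next arguments, gives the sequence of estimates; the last two returned values are the smoothed initial mean and variance used to restart the filter on the next pass.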
3.2.2. Elliott and Krishnamurthy (1999) filter approach. This approach to the implementation of the EM-Algorithm uses filtered quantities and can be performed on-line. It is based on a new class of finite-dimensional recursive filters for linear dynamic systems, which can be adapted to equations (18) and (19). The important advantages of this filter-based EM-Algorithm compared with the (standard) smoother-based EM-Algorithm include (i) substantially reduced memory requirements, and (ii) ease of parallel implementation on a multiprocessor system (see Elliott and Krishnamurthy (1997, 1999)). The details of this approach are discussed in Elliott et al. (in press), where computational issues and convergence are reported.
As in section 3.2.1, we start with $\hat{\theta}_j = (A, B, C^2, D^2)$ and initial values for the Kalman Filter, and seek the next estimate $\hat{\theta}_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$.
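We introduce various quantities:
$$H_k^{(0)} = \sum_{l=0}^{k} x_l^2, \qquad \hat{H}_k^{(0)} = E\left[H_k^{(0)} \mid \mathcal{Y}_k\right],$$
$$H_k^{(1)} = \sum_{l=1}^{k} x_lx_{l-1}, \qquad \hat{H}_k^{(1)} = E\left[H_k^{(1)} \mid \mathcal{Y}_k\right],$$
$$H_k^{(2)} = \sum_{l=1}^{k} x_{l-1}^2, \qquad \hat{H}_k^{(2)} = E\left[H_k^{(2)} \mid \mathcal{Y}_k\right],$$
$$J_k = \sum_{l=0}^{k} x_ly_l, \qquad \hat{J}_k = E\left[J_k \mid \mathcal{Y}_k\right],$$
$$I_k^{(0)} = \sum_{l=0}^{k} x_l, \qquad \hat{I}_k^{(0)} = E\left[I_k^{(0)} \mid \mathcal{Y}_k\right],$$
$$I_k^{(1)} = \sum_{l=1}^{k} x_{l-1}, \qquad \hat{I}_k^{(1)} = E\left[I_k^{(1)} \mid \mathcal{Y}_k\right],$$
$$Y_k = \sum_{l=0}^{k} y_l^2.$$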
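If $E = E_{\hat{\theta}_j}$, which means using $\hat{\theta}_j = (A, B, C^2, D^2)$ in the dynamics (18), (19), then $\hat{\theta}_{j+1} = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$ is given through
$$\hat{A} = \left[T - \frac{(\hat{I}_N^{(1)})^2}{\hat{H}_N^{(2)}}\right]^{-1}\left[\hat{I}_N^{(0)} - \frac{\hat{H}_N^{(1)}\hat{I}_N^{(1)}}{\hat{H}_N^{(2)}}\right], \qquad (43)$$
$$\hat{B} = \frac{1}{\hat{H}_N^{(2)}}\left[\hat{H}_N^{(1)} - \hat{A}\hat{I}_N^{(1)}\right], \qquad (44)$$
$$\hat{C}^2 = \frac{1}{T}\left[\hat{H}_N^{(0)} + T\hat{A}^2 + \hat{H}_N^{(2)}\hat{B}^2 - 2\hat{A}\hat{I}_N^{(0)} + 2\hat{A}\hat{B}\hat{I}_N^{(1)} - 2\hat{B}\hat{H}_N^{(1)}\right], \qquad (45)$$
$$\hat{D}^2 = \frac{1}{T+1}\left[Y_N - 2\hat{J}_N + \hat{H}_N^{(0)}\right]. \qquad (46)$$
Comparing with (41) and (42), $T$ plays the role of $N$.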
We now provide recurrences for computing the quantities in (43)-(46). Given $\theta_j$ we use the Kalman Filter calculations (22)-(26) to determine the values of $\mu_k$ and $R_k$, from which we have ($M = 0, 1, 2$)
$$\hat{H}_k^{(M)} = a_k^{(M)} + b_k^{(M)}\mu_k + d_k^{(M)}\left(R_k + \mu_k^2\right), \qquad (47)$$
$$\hat{J}_k = \bar{a}_k + \bar{b}_k\mu_k, \qquad (48)$$
$$\hat{I}_k^{(0)} = s_k^{(0)} + t_k^{(0)}\mu_k, \qquad (49)$$
$$\hat{I}_k^{(1)} = s_k^{(1)} + t_k^{(1)}\mu_k, \qquad (50)$$
$$Y_k = Y_{k-1} + y_k^2, \qquad (51)$$
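where the various coefficients are determined as follows. Set
$$s_k = \frac{1}{R_k} + \frac{B^2}{C^2}, \qquad (52)$$
$$\Sigma_k = \frac{1}{s_k}\,\frac{B}{C^2}, \qquad (53)$$
$$S_k = \frac{1}{s_k}\left(\frac{\mu_k}{R_k} - \frac{AB}{C^2}\right), \qquad (54)$$
then
$$a_0^{(0)} = 0, \quad b_0^{(0)} = 0, \quad d_0^{(0)} = 1,$$
$$a_{k+1}^{(0)} = a_k^{(0)} + b_k^{(0)}S_k + d_k^{(0)}\left(S_k^2 + s_k^{-1}\right),$$
$$b_{k+1}^{(0)} = b_k^{(0)}\Sigma_k + 2d_k^{(0)}\Sigma_kS_k,$$
$$d_{k+1}^{(0)} = 1 + d_k^{(0)}\Sigma_k^2, \qquad (55)$$
$$a_0^{(1)} = 0, \quad b_0^{(1)} = 0, \quad d_0^{(1)} = 0,$$
$$a_{k+1}^{(1)} = a_k^{(1)} + b_k^{(1)}S_k + d_k^{(1)}\left(S_k^2 + s_k^{-1}\right),$$
$$b_{k+1}^{(1)} = b_k^{(1)}\Sigma_k + S_k + 2d_k^{(1)}\Sigma_kS_k,$$
$$d_{k+1}^{(1)} = \Sigma_k + d_k^{(1)}\Sigma_k^2, \qquad (56)$$
$$a_0^{(2)} = 0, \quad b_0^{(2)} = 0, \quad d_0^{(2)} = 0,$$
$$a_{k+1}^{(2)} = a_k^{(2)} + b_k^{(2)}S_k + \left(d_k^{(2)} + 1\right)\left(S_k^2 + s_k^{-1}\right),$$
$$b_{k+1}^{(2)} = b_k^{(2)}\Sigma_k + 2\left(d_k^{(2)} + 1\right)\Sigma_kS_k,$$
$$d_{k+1}^{(2)} = \left(1 + d_k^{(2)}\right)\Sigma_k^2, \qquad (57)$$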
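where programmers are warned to distinguish here between the superscript $(2)$ and squared terms,
$$\bar{a}_0 = 0, \quad \bar{b}_0 = y_0,$$
$$\bar{a}_{k+1} = \bar{a}_k + \bar{b}_kS_k,$$
$$\bar{b}_{k+1} = y_{k+1} + \bar{b}_k\Sigma_k, \qquad (58)$$
$$s_0^{(0)} = 0, \quad t_0^{(0)} = 1,$$
$$s_{k+1}^{(0)} = s_k^{(0)} + t_k^{(0)}S_k,$$
$$t_{k+1}^{(0)} = 1 + t_k^{(0)}\Sigma_k, \qquad (59)$$
$$s_0^{(1)} = 0, \quad t_0^{(1)} = 0,$$
$$s_{k+1}^{(1)} = s_k^{(1)} + \left(1 + t_k^{(1)}\right)S_k,$$
$$t_{k+1}^{(1)} = \left(1 + t_k^{(1)}\right)\Sigma_k. \qquad (60)$$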
Remarks: (a) Given $\hat{\theta}_j$, the calculation of $\hat{\theta}_{j+1}$ proceeds by the following steps. Initialize the Kalman Filter with $\mu_0 = y_0$ and $R_0 = D^2$. Once $\mu_k$ and $R_k$ have been calculated, the various coefficients may be calculated using (52)-(54) and then (55)-(60). Find $\mu_{k+1}$ and $R_{k+1}$ from the Kalman Filter equations. Continue until $k = N$. Then compute the quantities in (47)-(51) (at $k = N$) and then $\hat{\theta}_{j+1}$ from (43)-(46). Some initial guess for $\hat{\theta}_0$ must be made, and iterations are concluded when the values for $\hat{\theta}_j$ have converged sufficiently. Call this $\hat{\theta}(N)$.
(b) If $\hat{\theta}(N) = (\hat{A}, \hat{B}, \hat{C}^2, \hat{D}^2)$, we should check that $\hat{A} > 0$ and $0 < \hat{B} < 1$, else the pairs trading algorithm should not be used with this data.
(c) The procedure described in (a) could be regarded as an initialization, and need not be repeated in subsequent steps, where only one iteration should suffice to update the coefficients in the model.
3.3. Implementation of the EM-Algorithm
We will assume that model (18), (19) holds over $N$ periods. The values of $\hat{\theta}(N)$ and $\mu_N$ are computed based on the observations $y_0, y_1, \ldots, y_N$, and a trade may be initiated as described in section 2, and possibly unwound at $t = N + 1$ (or according to some other criterion). Based on $\hat{\theta}(N)$, $\mu_{N+1}$ is computed from the data $y_1, y_2, \ldots, y_{N+1}$ (the most recent $N + 1$ values, with the Kalman Filter initialized at $\mu_1 = y_1$ and $R_1 = \hat{D}^2(N)$) and a trade initiated. $\hat{\theta}(N + 1)$ is calculated with one iteration using section 3.2.1 or 3.2.2 and using the Kalman Filter based on the data $y_1, y_2, \ldots, y_{N+1}$. The procedure is then repeated.
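The rolling scheme above always works on the most recent $N + 1$ observations. A minimal sketch of the window bookkeeping follows; the generator name and the convention of returning inclusive index pairs are assumptions made for illustration.

```python
def rolling_windows(n_total, N):
    """Yield inclusive (start, end) index pairs, each covering N + 1 observations.

    The first window is y_0..y_N, the next y_1..y_{N+1}, and so on, until the
    last available observation y_{n_total - 1} has been used.
    """
    for start in range(0, n_total - N):
        yield start, start + N
```

For example, `list(rolling_windows(5, 2))` gives `[(0, 2), (1, 3), (2, 4)]`; on each window the filter would be re-initialized and one further EM iteration applied, as described in the text.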
4. Numerical examples
Here we provide some simulation and calibration results which demonstrate that the Shumway and Stoffer algorithm provides a consistent and robust estimating algorithm for the model. Studies based on Elliott and Krishnamurthy are given by Elliott et al. (in press). Some initial experiments have also been performed with real data with a hedge fund.
To illustrate the typical performance of the Shumway and Stoffer EM algorithm, adapted to estimation of the set $\{A, B, C, D\}$, we consider a simulation with parameter values $A = 0.20$, $B = 0.85$, $C = 0.60$ and $D = 0.80$. Our observation set contained 100 points. To initialize the EM algorithm, the following values were used: $A = 1.20$, $B = 0.50$, $C = 0.30$ and $D = 0.70$, with $\hat{x}_{0|0} = 0$ and $\Sigma_{0|0} = 0.1$. The EM algorithm was iterated 150 times. Figures 1 and 2 show convergence of the maximum likelihood estimates of all parameters.
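A data set of the kind used in this experiment can be generated directly from (18) and (19). The sketch below uses the parameter values quoted above; the starting state `x0 = 0.0` and the random seed are assumptions, since the text does not report them, so the resulting path will not reproduce the authors' sample.

```python
import random


def simulate(A, B, C, D, n_obs, x0=0.0, seed=0):
    """Simulate hidden states x_k and noisy observations y_k from (18) and (19)."""
    rng = random.Random(seed)
    x, states, observations = x0, [], []
    for _ in range(n_obs):
        states.append(x)
        observations.append(x + D * rng.gauss(0.0, 1.0))   # (19)
        x = A + B * x + C * rng.gauss(0.0, 1.0)            # (18)
    return states, observations


# Parameter values quoted in the text; 100 observations.
states, observations = simulate(0.20, 0.85, 0.60, 0.80, n_obs=100)
```

The list `observations` is the only input an estimation routine would see; `states` is kept for comparing filtered estimates with the simulated hidden path.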
Figure 1. Convergence of the maximum likelihood estimates A and B. (Two panels: estimate of A and estimate of B, each plotted against the EM algorithm pass index, 0 to 150.)
Figure 2. Convergence of the maximum likelihood estimates C and D. (Two panels: estimate of C and estimate of D, each plotted against the EM algorithm pass index, 0 to 150.)
References
Elliott, R.J., Aggoun, L. and Moore, J.B., Hidden Markov Models, 1995 (Springer: Berlin).
Elliott, R.J. and Krishnamurthy, V., Exact finite-dimensional filters for maximum likelihood parameter estimation of continuous-time linear Gaussian systems. SIAM Journal of Control and Optimization, 1997, 35, 1908–1923.
Elliott, R.J. and Krishnamurthy, V., New finite-dimensional filters for parameter estimation of discrete-time linear Gaussian models. IEEE Transactions of Automatic Control, 1999, 44, 938–951.
Elliott, R.J., Malcolm, W.P. and van der Hoek, J., The numerical analysis of a filter based EM algorithm (in press).
Finch, S., Ornstein-Uhlenbeck process. Unpublished Note. Available online at: http://pauillac.inria.fr/algo/bsolve/constant/constant.html.
Gatev, E.G., Goetzmann, W.N. and Rouwenhorst, K.G., Pairs trading: performance of a relative average arbitrage rule. NBER Working Paper 7032, National Bureau of Economic Research Inc., 1999. Available online at: http://www.nber.org/papers/w7032.
Litterman, B., Modern Investment Management: An Equilibrium Approach, chapter 1, 2003 (Wiley: New York).
Nicholas, J.G., Market Neutral Investing Long/Short Hedge Fund Strategies, 2000 (Bloomberg Professional Library, Bloomberg Press: Princeton, NJ, USA).
Reverre, S., The Complete Arbitrage Desk-book, chapter 10, 2001 (McGraw Hill: New York).
Shumway, R.H. and Stoffer, D.S., An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series, 1982, 3, 253–264.
Shumway, R.H. and Stoffer, D.S., Time Series Analysis and Its Applications, 2000 (Springer: New York).
Vidyamurthy, G., Pairs Trading: Quantitative Methods and Analysis, 2004 (Wiley: New York).
Whistler, M., Trading Pairs: Capturing Profits and Hedging Risk with Statistical Arbitrage Strategies, 2004 (Wiley: New York).