
Nonparametric Estimation for

Financial Investment under Log-Utility

A thesis accepted by the Fakultät Mathematik of the Universität Stuttgart
in fulfilment of the requirements for the degree of
Doktor der Naturwissenschaften (Dr. rer. nat.)

Presented by

Dominik Schäfer
from Pforzheim

Main referee: Prof. Dr. H. Walk

Co-referees: Prof. Dr. V. Claus
Prof. Dr. L. Györfi

Date of oral examination: July 15, 2002

Mathematisches Institut A der Universität Stuttgart

2002
dedicated to

My Parents
to whom I owe so much

Professor Paul Glendinning

without whom I might never have found my way
to mathematical finance and economics

CONTENTS

Abbreviations
Summary
Zusammenfassung
Acknowledgements

1 Introduction: investment and nonparametric statistics
  1.1 The market model
  1.2 Portfolios and investment strategies
  1.3 Pleading for logarithmic utility

2 Portfolio benchmarking: rates and dimensionality
  2.1 Rates of convergence in i.i.d. models
  2.2 Dimensionality in portfolio selection
  2.3 Examples

3 Predicted stock returns and portfolio selection
  3.1 A strategy using predicted log-returns
  3.2 Prediction of Gaussian log-returns
    3.2.1 An approximation result
    3.2.2 An estimation algorithm
  3.3 Proof of the approximation and estimation results
  3.4 Simulations and examples

4 A Markov model with transaction costs: probabilistic view
  4.1 Strategies in markets with transaction fees
  4.2 An optimal strategy
    4.2.1 Some comments on Markov control
    4.2.2 Proof of Theorem 4.2.1
  4.3 Further properties of the value function

5 A Markov model with transaction costs: statistical view
  5.1 The empirical Bellman equation
    5.1.1 An optimal strategy
    5.1.2 How to prove optimality
  5.2 Uniformly consistent regression estimation
  5.3 Proving the optimality of the strategy

6 Portfolio selection functions in stationary return processes
  6.1 Portfolio selection functions
  6.2 Estimation of log-optimal portfolio selection functions
  6.3 Checking the properties of the estimation algorithm
    6.3.1 Proof of the convergence Lemma 6.2.1
    6.3.2 Proof of the related Theorems 6.2.2 - 6.2.4
  6.4 Simulations and examples

L'Envoi

References

ABBREVIATIONS

|·| absolute value of a number, cardinality of a set
<·, ·> Euclidean scalar product
‖·‖∞ supremum norm
‖·‖q q-norm (on IR^d or Lq)
‖·‖ some other norm

IN positive integers 1, 2, 3, ...


IN0 nonnegative integers 0, 1, 2, 3, ...
IR real numbers
IR+ real numbers > 0
IR+0 real numbers ≥ 0

⌊x⌋ integer part of x
⌊x⌋N the smallest kN (k ∈ IN) such that kN ≥ x ≥ 0
⌈x⌉ x rounded toward infinity

·T transpose of a vector or matrix


spr(·) spectrum of a matrix

exp exponential to the base e


log logarithm to the base e
lb logarithm to the base 2

an = o(bn ) Landau symbol for: an /bn → 0


an = O(bn ) Landau symbol for: an /bn is a bounded sequence

AC complement of the set A


Ā closure of the set A
conv(A) convex hull of the set A

1A characteristic function of the set A


diam(A) diameter sup_{a,b∈A} ‖a − b‖∞ of the set A
ρ(x, A) distance inf_{a∈A} ‖x − a‖∞ from x to the set A
H(A, B) Hausdorff distance max{sup_{a∈A} ρ(a, B), sup_{b∈B} ρ(b, A)}
between the sets A and B
B(S) Borel σ-algebra on the topological space S

f(x)|x=y f evaluated at y
f + , f+ positive part of f , i.e., max{f, 0}
f − , f− negative part of f , i.e., max{−f, 0}
supp f support {x : f (x) > 0} of the function f
arg max f solution of a maximization problem (in some contexts
set-valued, i.e. {x : f (x) = supy f (y)}, in others a
measurably selected solution x with f (x) = supy f (y))

P probability measure
PX distribution of X
PY |X=x conditional distribution of Y given X = x
fX (·) a density of PX w.r.t. the Lebesgue measure
fY |X (·|x) a density of PY |X=x w.r.t. the Lebesgue measure
Q1 ≪ Q2 Q1 is absolutely continuous w.r.t. Q2
D(Q1‖Q2) Kullback-Leibler distance between Q1 and Q2
a.s. P-almost surely, with probability one
P-a.a. P-almost all

E mathematical expectation
E[Y |X] conditional expectation of Y given X
E[Y |X = x] conditional expectation of Y given X = x
Var variance
Cov covariance
N(µ; Σ) normal distribution with mean µ and variance-covariance matrix Σ

L1 (P) space of Lebesgue integrable functions w.r.t. P


Lq (P) qth order Lebesgue integrable functions w.r.t. P

const. a suitable constant


GSM geometrically strongly mixing
hot. higher order terms of an expansion
i.i.d. independent, identically distributed
p.a. per annum
w.r.t. with respect to
□ end of proof

All non-standard notation is explained when it occurs for the first time. The
random variables in this thesis are understood to be defined on a common
probability space (Ω, A, P). IR^d-valued random variables are implicitly assumed
to be measurable w.r.t. the Borel σ-algebra B(IR^d). If not stated otherwise,
measurability of functions f : IR^d → IR^{d′} means measurability w.r.t. B(IR^d)
and B(IR^{d′}).

SUMMARY

In this thesis we aim to plead for the application of nonparametric statistical


forecasting and regression estimation methods to financial investment problems.
In six chapters we explore applications of nonparametric techniques to portfolio
selection for financial investment. Clearly, this cannot be more than a crude
and somewhat arbitrary selection of topics within this vast area, so we decided
to concentrate on some typical situations. Our hope is to be able to illustrate
the benefits of nonparametric estimation methods in portfolio selection.

Chapter 1
Introduction: investment and nonparametric statistics
Investment is the strategic allocation of resources, typically of monetary re-
sources, in an environment, typically a market of assets, whose future evolution
is uncertain. Investment problems arise in a huge variety of contexts beyond the
financial one. Resources may also take the form of energy, of data-processing
resources, etc. Strategic investment planning helps to run many processes with
higher benefit. In this thesis we focus our attention on financial investment,
which we think is the “prototypical” example of a resource allocation process.
The three ingredients of financial investment are the market, the actions the
investor may take and his investment goal (discussed in detail in Sections 1.1-
1.3):

– As to the market: We assume that there are m assets in our financial market.
The ith asset yields a return Xi,n on an investment of 1 unit of money
during market period n (lasting from "time" n − 1 to n, time being
measured, e.g., in days of trading). The ensemble of returns on the nth
day of trading is given by

    Xn = (X1,n, ..., Xm,n)^T ∈ IR^m_+.

To the investor, the return process {Xn}_{n=1}^∞ appears to be a stochastic
process which, in many real markets, is stationary and ergodic (Definition
1.1.1). In some chapters we impose additional (but realistic) conditions
on the distribution of the process. The key point is, however, that

we use nonparametric models, i.e. models that do not assume a
parametric evolution equation, such as an ARMA, ARCH or GARCH
equation, to hold.

These models guarantee the highest flexibility in real applications.

– As to the investment actions: We are concerned with an investor who neither
consumes nor deposits new money into his portfolio. At the beginning of
each market period n, our investor uses all his current wealth to acquire
a portfolio bn of the stocks. It will be convenient to describe the portfolio
bn by the proportion bj,n of the investor's current wealth invested in asset
j (j = 1, ..., m) during market period n. Thus, bn is chosen at time n − 1
from the set S of all portfolios, consisting of the vectors (portfolios)

    bn = (b1,n, ..., bm,n)^T

satisfying bj,n ≥ 0 and ∑_{j=1}^m bj,n = 1. In some situations the set of
investment actions S may be further narrowed down by the occurrence of
transaction costs.

– As to the investment goal: If W0 is his initial wealth, an investor using
the portfolio strategy {bi}_{i=0}^{n−1} manages to accumulate the wealth
Wn = ∏_{i=1}^n <bi, Xi> · W0 during n market periods (<·, ·> is the
Euclidean scalar product). Naturally, the investor aims to maximize Wn.
It is known from the literature that there is no essential conflict between
short run (n finite) and long term (n → ∞) investment. In both cases
investment according to the conditional log-optimal portfolio

    b∗n := arg max_{b∈S} E[log <b, Xn> | Xn−1, ..., X1]

at time n is optimal, outperforming any other strategy because of

    E[Wn/W∗n] ≤ 1  and  lim sup_{n→∞} (1/n) log(Wn/W∗n) ≤ 0 with probability 1

(Cover and Thomas, 1991, Theorem 15.5.2). Here, W∗n is the wealth at
time n resulting from a series of conditionally log-optimal investments,
Wn the wealth from any other non-anticipating portfolio strategy. We
argue that

this is sufficient reason for the investor to use a logarithmic utility
function, i.e. to maximize the expected future logarithmic return given
the past return vectors.
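In code, the wealth dynamics and the log-utility criterion described above can be sketched as follows (a toy two-asset market invented for illustration; it is not an example from the thesis):

```python
import math

# Toy illustration (returns invented for this sketch, not taken from the thesis):
# wealth evolves as W_n = <b_1, X_1> * ... * <b_n, X_n> * W_0, and the log-utility
# criterion ranks portfolios by their empirical mean logarithmic return.

def wealth(returns, portfolio, w0=1.0):
    """Accumulated wealth when the same portfolio is held every market period."""
    w = w0
    for x in returns:
        w *= sum(bj * xj for bj, xj in zip(portfolio, x))
    return w

def avg_log_return(returns, portfolio):
    """Empirical mean logarithmic return (1/n) * sum_i log <b, X_i>."""
    n = len(returns)
    return sum(math.log(sum(bj * xj for bj, xj in zip(portfolio, x)))
               for x in returns) / n

# Two assets: a volatile stock and a riskless bond with return 1.0 per period.
X = [(2.0, 1.0), (0.5, 1.0), (2.0, 1.0), (0.5, 1.0)]
full_stock = (1.0, 0.0)  # oscillates: 2 * 0.5 * 2 * 0.5 = 1.0, no growth
half_half = (0.5, 0.5)   # rebalancing: (1.5 * 0.75)^2 = 1.265625, strict growth
print(wealth(X, full_stock), wealth(X, half_half))
```

The constantly rebalanced mixed portfolio grows although the stock alone goes nowhere, which is exactly the effect the logarithmic criterion rewards.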

The conditional log-optimal portfolio depends upon the distribution of the
return process {Xn}n. Realistically, the true distribution of the market returns,
and hence the log-optimal strategy, is not known to the investor. This makes
statistics the natural partner of investment. Statistics is needed to solve the
key problem:
to find a non-anticipating portfolio selection scheme {b̂n}n (working with
historical return data only, without knowing the true return distribution) such
that for any stationary ergodic return process {Xn}n, the investor's wealth
Ŵn := ∏_{i=1}^n <b̂i, Xi> grows, on average, as fast as with the log-optimum
strategy {b∗n}n. More formally, {b̂n}n should give

    lim sup_{n→∞} (1/n) log(W∗n/Ŵn) ≤ 0

with probability 1.
Such portfolio selection schemes are known to exist (Algoet, 1992). The
disadvantage is that they are fairly complicated and, even worse, they require
an enormous amount of past return data to yield practically relevant results.
It is the aim of this thesis to provide simplified, yet efficient portfolio selection
algorithms based on nonparametric forecasting and estimation techniques.
Particular emphasis is put on making the algorithms applicable to large
classes of markets.

Chapter 2
Portfolio benchmarking: rates and dimensionality
The performance of a portfolio selection rule is usually compared with that
of a benchmark portfolio selection rule. Our benchmark is the log-optimal
portfolio selection rule, and as we have seen in Chapter 1, this is the optimal
rule. An investor will typically find his own rule underperforming. He can only

hope that underperformance vanishes sufficiently fast when – with increasing


number of market periods – his estimates for the distribution of the return
process and hence his idea of the market become more and more complete.
Now, if the investor evaluates the historical returns X1, ..., Xn leading to the
portfolio choice b̂n+1 at time n, he will achieve a return R̂n = <b̂n+1, Xn+1> on
his investment during the next market period. This should be compared with
the return R∗n = <b∗n+1, Xn+1> of the conditional log-optimal portfolio.

From our log-utility point of view we suggest measuring the underperformance
of b̂n+1 in terms of the positivity of E log(R∗n/R̂n). The smaller this
expectation becomes, the better the selection rule b̂n+1.

Assuming that the return data arises from a process of independent and
identically distributed (i.i.d.) random variables, it is important to know at what
rate the underperformance E log(R∗n/R̂n) vanishes for typical portfolio selection
rules. Using notions from information theory we prove a lower bound on this
rate in Section 2.1. Even in the simplest of all markets, a market with only
finitely many possible return outcomes,

no empirical portfolio selection rule can make underperformance vanish
in every market faster than 1/n tends to 0, i.e. there is always a
market for which the inequality E log(R∗n/R̂n) ≥ const. · 1/n holds
(Theorem 2.1.1).

There are empirical portfolio selection rules that achieve this rate. In particular,
the empirical log-optimal portfolio

    b̂n+1 := arg max_{b∈S} (1/n) ∑_{i=1}^n log <b, Xi>        (0.0.1)

proves to be rate optimal insofar as

the empirical log-optimal portfolio selection rule (0.0.1) attains the
lower bound for the rate at which underperformance vanishes, whatever
the number of stocks in the market (Theorem 2.1.3).

Loosely speaking, it compensates for wrong investment decisions as fast as
possible. Interestingly enough, the findings are largely unaffected by the number
of stocks in the market, which is a rather untypical feature in nonparametric
estimation (Theorem 2.1.4 shows that this phenomenon persists in more
complicated market settings).
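The empirical rule (0.0.1) reduces, for two assets, to a one-dimensional concave maximization; the following sketch solves it by brute-force grid search (an implementation choice of this illustration, not of the thesis):

```python
import math

# Minimal sketch of the empirical log-optimal portfolio (0.0.1) for two assets.
# A crude grid search over b = (t, 1 - t) stands in for a proper concave
# optimizer; the two-asset restriction and the grid are choices of this sketch.

def empirical_log_optimal(returns, steps=1000):
    """Maximize (1/n) * sum_i log <b, X_i> over the one-dimensional simplex."""
    best_b, best_val = None, -math.inf
    for k in range(steps + 1):
        t = k / steps
        val = sum(math.log(t * x1 + (1.0 - t) * x2)
                  for x1, x2 in returns) / len(returns)
        if val > best_val:
            best_b, best_val = (t, 1.0 - t), val
    return best_b, best_val

# Volatile stock vs. riskless bond: the log-optimum is an interior mix.
X = [(2.0, 1.0), (0.5, 1.0)] * 50
b_hat, v = empirical_log_optimal(X)
print(b_hat)  # the maximizer is the balanced portfolio (0.5, 0.5)
```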
This is why we discuss the effects of “dimensionality” on the portfolio selection
process in more detail in Section 2.2. We argue that a reduction of the whole
stock market to some pre-selected stocks is inevitable, e.g., because of compu-
tational restrictions. In other words, the investor can only handle a smallish
subset of all stocks in the market for investment strategy planning. These stocks
have to be selected in the planning phase, even before investment starts. Hence,
criteria for the pre-selection of stocks from the market are needed. A common
way to do this is to pick the stocks whose chart promises high growth rates. It
will turn out, however, that this is fallacious:

any selection algorithm that assesses the single stocks separately, e.g.
on the basis of single-stock expected returns, is sure to pick the "bad"
stocks in some realistic market (Theorem 2.2.1).

This is a somewhat negative result, but it warns us that reasonable selection


schemes have to include further information about the market. We will show
that the variance-covariance structure of the stock returns provides sufficient
information in many markets (more precisely, in markets with log-normal re-
turns). Section 2.3 illustrates the results with simulations and examples, demon-
strating their practical relevance.
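A toy computation in the spirit of this warning (parameters invented; this illustrates the phenomenon, it is not Theorem 2.2.1): for a log-normal return X = exp(Y) with Y ~ N(µ, σ²), one has E[X] = exp(µ + σ²/2) but E[log X] = µ, so ranking stocks by expected return alone can favor the stock with the lower growth rate.

```python
import math

# Toy illustration (parameters assumed, not from the thesis): for a log-normal
# return X = exp(Y), Y ~ N(mu, sigma^2), the expected return is
# E[X] = exp(mu + sigma^2 / 2), while the long-run growth rate is E[log X] = mu.
# Ranking stocks by E[X] alone can therefore pick the slower-growing stock.

def expected_return(mu, sigma):
    return math.exp(mu + sigma ** 2 / 2)

def growth_rate(mu, sigma):
    return mu  # E[log X] for the log-normal model above

stock_a = (0.05, 0.10)  # modest mean log-return, low volatility
stock_b = (0.01, 0.60)  # lower growth rate, high volatility

# Stock B looks better by expected return ...
print(expected_return(*stock_b) > expected_return(*stock_a))
# ... but Stock A has the higher long-run growth rate.
print(growth_rate(*stock_a) > growth_rate(*stock_b))
```

This is precisely why the variance-covariance structure, and not the single-stock expected return, carries the relevant information in log-normal markets.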

Chapter 3
Predicted stock returns and portfolio selection
Having gained the insight that variance-covariance information about the
market (inter-stock correlations as well as temporal correlations) is integral to
successful investment decisions, we move on to particular investment strategies.
In Section 3.1 we consider a strategy which is particularly popular among
investors.
The strategy works in two steps, with the past logarithmic returns Yn, Yn−1 , ..., Y0
(Yi := log Xi ) as input data for the investment decision at time n:

1. Produce forecasts of the market future. It is established that forecasts
   should be based on conditional expectations of future log-returns given
   the observed past, i.e. on

       Ŷn+1 := E[Yn+1 | Yn, Yn−1, ...].

2. Invest in those stocks whose forecast Ŷn+1 promises to beat a riskless
   investment in a bond with return rate r, i.e. invest in a stock iff

       exp(Ŷn+1) ≥ r.

We will call this strategy a “greedy strategy”, because it tries to single out
the best possible stocks only. As we shall see, this provides us with a natural
strategy which can be applied in markets with low log-return variance (Section
3.1).
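The two steps of the greedy strategy can be sketched as follows; spreading wealth equally over the selected stocks is a simplifying assumption of this sketch, not a prescription of Section 3.1:

```python
import math

# Sketch of the two-step "greedy strategy": given forecasts Yhat of next-period
# log-returns, invest in the stocks whose forecast beats the riskless rate r,
# and fall back on the bond otherwise. Equal weighting among the selected
# stocks is an assumption of this sketch.

def greedy_portfolio(log_return_forecasts, r):
    """Return proportions over (stocks..., bond) based on exp(Yhat) >= r."""
    picked = [i for i, y in enumerate(log_return_forecasts) if math.exp(y) >= r]
    m = len(log_return_forecasts)
    if not picked:                 # no stock promises to beat the bond
        return [0.0] * m + [1.0]
    w = 1.0 / len(picked)
    return [w if i in picked else 0.0 for i in range(m)] + [0.0]

print(greedy_portfolio([0.03, -0.02, 0.10], r=1.01))  # stocks 1 and 3 selected
print(greedy_portfolio([-0.05, -0.02], r=1.01))       # everything into the bond
```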
The major problem in implementing the greedy strategy is the fact that the
forecasts Ŷn+1 can only be calculated if the distribution of the return process
is known to the investor. Hence, we need to derive an estimate Ê(Yn, ..., Y0)
for the conditional expectation Ŷn+1 = E[Yn+1 |Yn, Yn−1 , ...] from the market
observations Yn, ..., Y0. It is known from the literature that no such forecaster
can be strongly consistent in the sense of


    lim_{n→∞} ( Ê(Yn, ..., Y0) − E[Yn+1 | Yn, Yn−1, ...] ) = 0        (0.0.2)

with probability 1 for any stationary and ergodic process {Yn}n (Bailey, 1976).
This result is discouraging, but it does not rule out the existence of strongly
consistent forecasting rules for log-return processes as they arise in real financial
markets. In particular, Gaussian log-return processes have been proven to be
a good approximation for real log-return processes, but so far no answer has
been found to the question of whether there exist forecasters that are strongly
consistent in any stationary and ergodic Gaussian process. In Section 3.2 we
prove that the answer is indeed affirmative. Under weak extra conditions on
the Wold coefficients of the process

we present a forecaster Ê(Yn, ..., Y0) for stationary and ergodic Gaus-
sian processes which satisfies the strong consistency relation (0.0.2)
and which is remarkably easy to compute (Lemma 3.2.1 and Corollary
3.2.3).
13

This result provides us with the necessary tools to implement the greedy
strategy in Gaussian log-return processes. However, the algorithm is very much
of interest in its own right, as forecasting problems for Gaussian processes
arise in many areas.
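The forecaster of Lemma 3.2.1 itself is not reproduced here; as a rough stand-in (an assumption of this sketch, not the thesis's algorithm), a least-squares AR(1) predictor illustrates the kind of computation involved in forecasting a log-return from its own past:

```python
# Rough stand-in for a Gaussian-process forecaster (not the algorithm of
# Lemma 3.2.1): an AR(1) one-step predictor Yhat_{n+1} = a * Y_n with the
# coefficient a fit by least squares, minimizing sum_t (Y_t - a * Y_{t-1})^2.

def ar1_forecast(y):
    """Least-squares AR(1) one-step forecast from observations y_0, ..., y_n."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    a = num / den if den > 0 else 0.0
    return a * y[-1]

# An exactly AR(1) sample path y_t = 0.5 * y_{t-1} is predicted perfectly.
y = [1.0]
for _ in range(10):
    y.append(0.5 * y[-1])
print(ar1_forecast(y))  # equals 0.5 * y[-1]
```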
Section 3.3 proves the convergence properties of the algorithm. Application
examples with simulated and real data in Section 3.4 are promising, both when
the algorithm is run as a mere forecasting algorithm and when it is run as a
subroutine for the greedy strategy.
Chapter 4
A Markov model with transaction costs: probabilistic view
In simple markets where returns arise as i.i.d. data, the investor should invest
in a constant log-optimal portfolio strategy. This requires him not to change
the proportion of wealth held in each stock during the investment process. The
proportions remain constant; the prices of the assets, however, change relative
to each other during each market period, so that the actual quantities of the
single stocks in the portfolio vary from market period to market period. Thus,
a large number of transactions are needed to follow a constant log-optimal
strategy. In practice, this is a huge drawback: Much of the wealth accumulated
by a log-optimal strategy has to be spent to settle transaction costs such as
brokerage fees, administrative and telecommunication expenses. The conclusion
for the investor must be to adapt his strategy to meet two requirements: to
make as few costly transactions as possible, but to make as many as necessary
to boost his wealth. The aim of Chapters 4 and 5 is to investigate how these
two conflicting requirements can be balanced in one strategy.
To this end we shall assume that the returns arise from a d-stage Markov pro-
cess. In Chapter 4 the distribution of the return process is known, an unrealistic
assumption which we will drop in Chapter 5. Section 4.1 generalizes the mar-
ket model from Chapter 1 to include transaction costs proportional to the total
value of the purchased shares. Not surprisingly, the investor can only afford
a limited range of portfolio choices in the presence of transaction costs, and as we
shall see,
in d-stage Markovian return processes it suffices to consider strategies
based on portfolio selection functions, i.e. portfolio selection schemes
of the form bi = c(bi−1 , Xi−d , ..., Xi−1) with an appropriate function c
(Definition 4.1.2).
14

Hence, the next portfolio is a function of the last portfolio and the last d ob-
served return vectors. The investor aims to maximize his expected mean loga-
rithmic return as before by choosing an optimal selection function c.
In Section 4.2 we tackle the problem of how to obtain an optimal selection function
c, assuming for the moment that the distribution of the return process is known. The main result
demonstrates that

an optimal portfolio selection function c can be obtained from a solu-


tion of the Bellman equation (Theorem 4.2.1, equation 4.2.2).

The Bellman equation is known from the theory of dynamic programming,


but fundamental differences between classical dynamic programming and the
portfolio selection problem will become evident. Further properties of solutions
of the Bellman equation will be derived in Section 4.3, results that will be
needed for the arguments in Chapter 5.
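The flavor of the Bellman-equation approach can be conveyed by a heavily simplified toy model (i.i.d. returns, two assets, a coarse portfolio grid, a discount factor, and a cost charged on the rebalanced fraction are all assumptions of this sketch, not the setting of Theorem 4.2.1):

```python
import math

# Heavily simplified toy version of the Bellman-equation approach of Chapter 4.
# State: current stock proportion t. Action: new proportion s, costing the
# fraction COST * |s - t| of wealth. All modeling choices here are assumptions
# of this sketch.

RETURNS = [(2.0, 1.0), (0.5, 1.0)]   # equally likely (stock, bond) return pairs
GRID = [k / 10 for k in range(11)]   # discretized portfolio simplex
COST, BETA = 0.02, 0.9               # cost rate and discount factor

def reward(s, t):
    """Expected one-period log-growth after rebalancing from t to s."""
    trade = math.log(1.0 - COST * abs(s - t))
    exp_log = sum(math.log(s * x + (1 - s) * y) for x, y in RETURNS) / len(RETURNS)
    return trade + exp_log

def value_iteration(iters=200):
    v = {t: 0.0 for t in GRID}
    for _ in range(iters):
        v = {t: max(reward(s, t) + BETA * v[s] for s in GRID) for t in GRID}
    policy = {t: max(GRID, key=lambda s: reward(s, t) + BETA * v[s]) for t in GRID}
    return v, policy

v, policy = value_iteration()
print(policy[0.5])  # a portfolio already at the log-optimal mix is left untouched
```

The computed policy balances the two conflicting requirements named above: it trades only when the expected log-growth gain outweighs the transaction cost.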

Chapter 5
A Markov model with transaction costs: statistical view
The Bellman equation considered in Chapter 4 heavily depends upon the distri-
bution of the return process {Xn }n through a peculiar conditional expectation.
Hence, the results of Chapter 4 are valid only under the assumption that the in-
vestor knows the distribution of the stock return process. Of course, in practice
this is illusory. At best, the investor has an estimate of the return distribution
at his disposal. This, in turn, allows him to produce an estimate of the con-
ditional expectation in question and hence gives him an approximate Bellman
equation involving the observed empirical return data. Using nonparametric
regression estimation techniques

we will show in Section 5.1 how a natural empirical counterpart of the


Bellman equation from Chapter 4 can be found (equation 5.1.2).

With similar techniques as in Chapter 4 we will establish that this empirical


equation can be solved under realistic conditions.

This will lead us to a strategy that merely relies on observational data


but has the same optimality properties as the (theoretical) optimal portfolio
selection rule in the presence of transaction costs (Theorems 5.1.1
and 5.1.2).

For this, we will fall back on generalizations of existing uniform consistency


results in regression estimation, which will be provided in Section 5.2. In par-
ticular, if {Xn }n is a stationary geometrically strongly mixing process and g is
taken from a class G of Lipschitz continuous functions we estimate the condi-
tional expectation

R(g, b, x) := E[g(X1, b)|X0 = x] (b ∈ S)

by a kernel regression estimator Rn (g, b, x). Depending on the smoothness of a


density of X0 (which we assume to exist) we determine the rate of convergence
of
    sup_{g∈G} E sup_{x∈X, b∈S} |Rn(g, b, x) − R(g, b, x)| → 0   (n → ∞),

i.e. of the expected uniform estimation error, uniformly in G (Corollary 5.2.2).


This result is of interest in other areas of nonparametric statistics as well.
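A minimal sketch of the kernel regression idea used here, in the form of a Nadaraya-Watson estimator of a conditional expectation (the Gaussian kernel and the fixed bandwidth are illustrative choices, not those of Section 5.2):

```python
import math

# Minimal Nadaraya-Watson kernel regression estimate of E[Y | X = x], the kind
# of estimator behind R_n(g, b, x) in Section 5.2. Kernel and bandwidth below
# are illustrative choices of this sketch.

def nw_estimate(xs, ys, x, bandwidth=0.3):
    """Weighted average of the Y_i with Gaussian kernel weights K((x - X_i)/h)."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total

# Noise-free data from Y = X^2: the estimate tracks the regression function.
xs = [i / 20 for i in range(21)]
ys = [x * x for x in xs]
print(nw_estimate(xs, ys, 0.5, bandwidth=0.05))  # close to 0.25
```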
Finally, Section 5.3 is devoted to the proof of optimality and combines the
results from Chapter 4 with uniformly consistent regression estimation techniques.

Chapter 6
Portfolio selection functions in stationary return processes
Since the investor may in some cases have reason to believe that the historical
return data does not follow a d-stage Markov process, we move on to even
more general market models than in the previous
chapters. Ignoring transaction costs, we consider a market whose returns are
merely stationary and ergodic. It is natural for the investor to take his invest-
ment decisions on the basis of recently observed returns, say on the basis of the
returns during the last d ∈ IN market periods (d fixed). This leads us to the
notion of log-optimal portfolio selection functions.
We make this more concrete in Section 6.1, where we take our familiar log-
utility approach again. The investor tries to find a log-optimal portfolio selection
function, i.e. a measurable function

    b∗ : IR^{dm}_+ −→ S

such that (<·, ·> denoting the Euclidean scalar product)

    E(log <b∗(X0, ..., Xd−1), Xd>) ≥ E(log <f(X0, ..., Xd−1), Xd>)

for all measurable f : IR^{dm}_+ −→ S. For the (n + 1)st day of trading, b∗ advises
the investor to acquire the portfolio b∗(Xn−d+1, ..., Xn).
Clearly, the concept of log-optimal portfolio selection functions does not reach
the same degree of generality as the concept of a conditional log-optimal port-
folio (where d is such that the whole observed past is included in the portfolio
decision). In spite of being a simplification, this approach nevertheless gives us
several advantages over the log-optimal strategy as far as computation, estima-
tion and interpretation are concerned.
With log-optimal portfolio selection functions we face the same problem as with
log-optimal portfolios. Both can only be calculated if the true distribution of the
return process happens to be known. A practitioner, however, needs to have an
estimation procedure that evaluates observed past return data to approximate
the true log-optimal device.

In Section 6.2 we therefore develop an algorithm to produce estimates


b̂n of a log-optimal portfolio selection function b∗ from past return data.

We require very mild conditions beyond stationarity and ergodicity. More
precisely, we assume that the return process {Xn}_{n=0}^∞ is an [a, b]^m-valued
stationary and ergodic process (0 < a ≤ b < ∞ need not be known) and that a
Lipschitz condition on the conditional return ratio E[Xd / <s, Xd> | Xd−1 =
xd−1, ..., X0 = x0] holds. The Lipschitz constant L is taken as a known market
constant.
Using a stochastic gradient algorithm and combining it with nonparametric
regression estimators,

we establish the strong convergence of the estimates b̂n to the true


log-optimal portfolio selection function b∗ , avoiding the usual mixing
conditions (Theorem 6.2.2).

What is even more important in practical applications:

Selecting portfolios on the basis of the estimated log-optimal portfolio
selection functions yields optimal growth of wealth among all strategies
that take their investment decisions on the basis of the last d observations.

Indeed, let Ŝn be the wealth accumulated during n market periods when on
the (i + 1)st day of trading the portfolio b̂i (Xi−d+1 , ..., Xi) is selected using the
most recent estimate b̂i of a log-optimal portfolio selection function. Then, if
Sn is the wealth accumulated during the same period using any other portfolio
selection function of the last d observed return vectors,
    lim sup_{n→∞} (1/n) log(Sn/Ŝn) ≤ 0
with probability 1 (Corollary 6.2.3).
After an appropriate modification, the algorithms and the results remain valid
even if the market constant L is unknown in real market applications (Theorem
6.2.4). Section 6.3 proves the findings, and the chapter is rounded off with
several realistic examples in Section 6.4.
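The stochastic-gradient idea behind the estimation algorithm can be sketched in its simplest form, for d = 0, i.e. estimating a single log-optimal portfolio instead of a selection function (a simplification made for this sketch, not in Theorem 6.2.2); the ascent direction uses the gradient of log <b, X> in b, which is X / <b, X>:

```python
# Sketch of projected stochastic gradient ascent toward a log-optimal portfolio,
# in the simplest case d = 0 (an assumption of this sketch). Each step moves b
# along the gradient X / <b, X> of log <b, X> and projects back onto the simplex.

def project_to_simplex(v):
    """Euclidean projection of v onto {b : b_j >= 0, sum_j b_j = 1}."""
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]

def sga_portfolio(returns, b0, eta=0.05, sweeps=200):
    b = list(b0)
    for _ in range(sweeps):
        for x in returns:
            s = sum(bj * xj for bj, xj in zip(b, x))
            b = project_to_simplex([bj + eta * xj / s for bj, xj in zip(b, x)])
    return b

# Volatile stock vs. riskless bond, starting from a deliberately bad portfolio.
X = [(2.0, 1.0), (0.5, 1.0)] * 10
b = sga_portfolio(X, b0=[0.9, 0.1])
print(b)  # oscillates in a small neighborhood of the log-optimum (0.5, 0.5)
```

With a decreasing step size in place of the constant eta, the iterates would converge rather than hover near the optimum; the constant step keeps the sketch short.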
Chapters 2, 3 and 6 can be read independently of each other; they are self-
contained. Chapters 4 and 5, however, are closely linked. Notation that goes
beyond common mathematical style is explained where it occurs for the first
time. We also refer the reader to the list of abbreviations at the beginning
of the thesis. The calculations and plots for the examples were generated us-
ing Matlab 4.0 and 6.0.0.88, Minitab 11.2 and R 1.1.1 with historical stock
quotes (daily closing prices) from the New York Stock Exchange provided by
www.wallstreetcity.com.

ZUSAMMENFASSUNG

Diese Arbeit soll ein Plädoyer sein für die Anwendung nichtparametrischer
statistischer Vorhersage- und Schätzmethoden auf Probleme, wie sie bei der
Planung von Finanzanlagen und Investitionen auftreten.
In sechs Kapiteln werden verschiedene Anwendungsmöglichkeiten nichtparamet-
rischer Techniken bei der Portfolioauswahl an Finanzmärkten analysiert. Dies
kann natürlich nur einen groben und zugegebenermaßen willkürlichen Aus-
schnitt aus diesem weiten Gebiet widerspiegeln –wir hoffen jedoch, dadurch die
Vorzüge nichtparametrischer Schätzmethoden bei der Portfolioauswahl aufzeigen
zu können.

Kapitel 1
Einführung: Investment und nichtparametrische Statistik
Investment ist der strategisch geplante Einsatz von Ressourcen (üblicherweise
von finanziellen Ressourcen) in einer Umgebung (üblicherweise in einem Fi-
nanzmarkt), deren zukünftige Entwicklung zufälligen Fluktuationen unterliegt.
Investitionsprobleme treten in einer Vielzahl von Gebieten auch über den fi-
nanziellen Kontext hinaus auf. Dabei können Ressourcen u. A. die Form
von Energie, von Datenverarbeitungskapazitäten, etc. annehmen. Die strate-
gische Planung von Investitionen hilft, viele Prozesse mit höherem Nutzen zu
betreiben. Diese Arbeit konzentriert sich auf finanzielle Investitionen, welche
gleichsam den “Prototyp” für verschiedenste Prozesse bilden, bei denen System-
ressourcen gewinnbringend einzusetzen sind.
Bei Investitionen finanzieller Natur spielen drei Komponenten eine Rolle: der
Markt, die Handlungsmöglichkeiten des Investors und sein Investitionsziel. Diese
Bausteine werden in den Abschnitten 1.1-1.3 im Detail diskutiert.

– Zum Markt: Wir gehen von einem Finanzmarkt mit m Anlagemöglichkeiten


(Aktien, festverzinsliche Wertpapiere, ...) aus. Die i. Anlagemöglichkeit
erzielt in der Marktperiode n eine Rendite Xi,n auf eine Investition von
20

einer Geldeinheit. Die n. Marktperiode dauere vom “Zeitpunkt” n − 1 bis


zum Zeitpunkt n, wobei die Zeit z.B. in Handelstagen gemessen wird. Die
Renditen der einzelnen Anlagemöglichkeiten am n. Handelstag werden im
Renditevektor
Xn = (X1,n , ..., Xm,n)T ∈ IRm
+

zusammengefasst. In den Augen des Investors ist {Xn }∞ n=1 ein stochasti-
scher Prozess, welcher in vielen realen Märkten stationär und ergodisch
ist (Definition 1.1.1). In manchen Kapiteln dieser Arbeit werden (re-
alistische) Zusatzannahmen über die Verteilung des Prozesses getroffen.
Entscheidend ist dabei jedoch,

dass wir nichtparametrische Modelle betrachten –Modelle also, die


nicht von der Existenz einer parametrischen Entwicklungsgleichung
ausgehen, wie sie z.B. ARMA-, ARCH- und GARCH-Prozesse be-
sitzen.

Diese Modelle garantieren höchste Flexibilität bei der Anwendung in realen


Finanzmärkten.

– Zu den Handlungsmöglichkeiten: Wir betrachten einen Investor, der weder


Teile seines Vermögens auf persönlichen Konsum verwendet, noch seinem
Portfolio im Verlauf des Investitionsprozesses neues Geld zufließen lässt.
Am Beginn jeder Marktperiode n verwendet der Investor sein gesamtes
Vermögen darauf, ein Aktienportfolio bn zu erwerben. Ein solches Portfolio
bn wird durch die Anteile bj,n am aktuellen Gesamtvermögen des Investors
beschrieben, welche in der n. Marktperiode in die Anlegemöglichkeit j =
1, ..., m investiert werden. Die Wahl von bn erfolgt dann aus der Menge S
aller Portfolios, welche aus den Vektoren (Portfolios)

bn = (b1,n, ..., bm,n)T

besteht, für die bj,n ≥ 0 und m


P
j=1 bj,n = 1. In manchen Situationen wird
S weiter durch das Auftreten von Transaktionskosten eingeschränkt.

– On the investment goal: Let W_0 be the investor's initial wealth. If he uses the portfolio strategy {b_i}_{i=0}^{n−1}, after n market periods he commands the wealth W_n = ∏_{i=1}^n <b_i, X_i> · W_0 (<·,·> denotes the Euclidean scalar product). The investor's goal is to achieve as large a value of W_n as possible. It is known from the literature that there is no fundamental conflict between near (n finite) and distant (n → ∞) investment horizons. In both cases it is optimal to invest at time n according to the conditionally log-optimal portfolio

b*_n := arg max_{b∈S} E[log <b, X_n> | X_{n−1}, ..., X_1].

It outperforms every other strategy in the sense that

E[W_n / W*_n] ≤ 1  and  lim sup_{n→∞} (1/n) log(W_n / W*_n) ≤ 0 with probability 1

(Cover and Thomas, 1991, Theorem 15.5.2). Here W*_n denotes the wealth at time n that the investor achieves by a series of conditionally log-optimal investments, and W_n the wealth under an arbitrary other portfolio strategy that commands no more information than can be derived from past market observations (a so-called "causal" strategy).

This should be reason enough for the investor to use a logarithmic utility function, i.e., given the return vectors observed in the past, to maximise the expected future logarithmic return.

The conditionally log-optimal portfolio is derived from the distribution of the return process {X_n}_n. In reality the true distribution of the returns, and hence the conditionally log-optimal strategy, is unknown to the investor. At this point financial planning needs statistics as a partner. Statistics serves the investor in solving the problem of finding a method that, using only historical return data and without knowledge of the true return distribution, generates an optimal causal portfolio strategy {b̂_n}_n. Optimality is understood here in the sense that, for every stationary and ergodic return process {X_n}_n, the strategy lets the investor's wealth Ŵ_n := ∏_{i=1}^n <b̂_i, X_i> grow on average as fast as under the log-optimal strategy {b*_n}_n. Formally, {b̂_n}_n should guarantee that with probability 1

lim sup_{n→∞} (1/n) log(W*_n / Ŵ_n) ≤ 0.

It is known that such methods exist (Algoet, 1992). They have the drawback, however, of being highly complex and of requiring vast amounts of historical data to produce practically useful results. One aim of this thesis is to develop simplified but efficient portfolio selection algorithms based on nonparametric prediction and estimation techniques. The algorithms are designed to be applicable to the largest possible classes of markets.

Chapter 2
Portfolio benchmarking: rates and dimensionality
The quality of a portfolio selection method is usually judged by comparison with a reference strategy. Our reference strategy is log-optimal portfolio selection, which, as we saw in Chapter 1, constitutes an optimal rule of behaviour. The investor will not succeed in outperforming it. Naturally, he will hope that the performance gap of his own strategy vanishes in the course of the investment process, namely as his estimates of the distribution of the return process improve with the growing amount of available historical data. If the investor chooses his portfolio at time n on the basis of the observations X_1, ..., X_n, he will earn a return of R̂_n = <b̂_{n+1}, X_{n+1}> in the next market period, whereas the log-optimal strategy yields R*_n = <b*_{n+1}, X_{n+1}>. Comparing the two values allows one to assess by how much b̂_{n+1} falls short of the log-optimal strategy b*_{n+1}.

From the standpoint of a logarithmic utility function it is therefore appropriate to measure the inferiority of the strategy b̂_{n+1} by the size of the (nonnegative) expected difference of the log-returns, i.e. by E log(R*_n / R̂_n). The smaller this value, the better the strategy b̂_{n+1}.

To judge the quality of the strategy b̂_{n+1} one therefore has to analyse, in particular, the speed at which E log(R*_n / R̂_n) tends to zero. Here the returns are assumed to form a process of independent, identically distributed random variables. Using concepts from information theory, Section 2.1 derives a lower bound on this speed of convergence. It states that even in the simplest of all markets, a market with only finitely many possible return constellations:

There is no portfolio selection rule that compensates its inferiority relative to the log-optimal strategy faster than 1/n in every market; i.e., there is always a market for which E log(R*_n / R̂_n) ≥ const. · (1/n) (Theorem 2.1.1).

There are, however, portfolio selection rules that attain this rate. In particular, the empirically log-optimal portfolio

b̂_{n+1} := arg max_{b∈S} (1/n) ∑_{i=1}^n log <b, X_i>  (0.0.3)

proves favourable here:

The empirically log-optimal portfolio (0.0.3) attains the lower bound on the rate of convergence of E log(R*_n / R̂_n) (Theorem 2.1.3).

Put somewhat casually, one could say that the empirically log-optimal portfolio makes up for its deficits at an optimal speed. These results hold largely independently of the number of stocks in the market under consideration. This is untypical of nonparametric estimation procedures and therefore deserves closer discussion (Theorem 2.1.4 shows that this phenomenon also occurs in more complicated markets).
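The maximisation in (0.0.3) over the simplex S can be carried out numerically. The following sketch uses exponentiated-gradient ascent on the empirical log-return; the choice of solver, step size and iteration count are illustrative assumptions of ours, not prescriptions of the thesis.

```python
import math

def empirical_log_optimal(returns, steps=2000, eta=0.05):
    """Approximate arg max over the simplex S of
    (1/n) * sum_i log <b, X_i>  (the empirically log-optimal
    portfolio (0.0.3)) by exponentiated-gradient ascent."""
    m = len(returns[0])
    b = [1.0 / m] * m                      # start from the uniform portfolio
    for _ in range(steps):
        # gradient of the empirical log-return: average of X_j / <b, X>
        grad = [0.0] * m
        for x in returns:
            dot = sum(bj * xj for bj, xj in zip(b, x))
            for j in range(m):
                grad[j] += x[j] / dot
        grad = [g / len(returns) for g in grad]
        # multiplicative update keeps b nonnegative; renormalising
        # keeps it on the simplex
        b = [bj * math.exp(eta * gj) for bj, gj in zip(b, grad)]
        s = sum(b)
        b = [bj / s for bj in b]
    return b
```

For two assets, a riskless one with constant return 1 and a stock returning 3 or 0.3 with equal empirical frequency, the maximiser of the empirical log-return puts the fraction 13/28 of wealth into the stock, which the iteration recovers.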
For this reason, Section 2.2 adds a more detailed discussion of the effects of the dimension of the market on portfolio selection. Limited computational capacity will force the investor, when planning his investments, to restrict himself to a smaller subset of all stocks in the market. This subset must be chosen in the planning phase, i.e. before the actual investment process. Criteria for this preselection are needed. The usual approach would be to pick individual stocks whose charts promise high growth potential. It will be shown that this route suffers from substantial shortcomings:

Every selection procedure that judges the individual stocks separately, e.g. by their expected logarithmic returns, is certain to make the wrong choice in a realistic market (Theorem 2.2.1).

This negative result shows that portfolio selection procedures require information beyond the individual expected log-returns. In markets with log-normally distributed returns, the variance-covariance structure of the returns will convey sufficient information. In Section 2.3 the results are illustrated by simulations and real-data examples, demonstrating their practical relevance.

Chapter 3
Predicted stock returns and portfolio selection
Armed with the insight that successful portfolio selection requires information on the variance-covariance structure of the stocks in the market (both temporal correlations and correlations between the individual stocks play a role), Section 3.1 presents an investment strategy that enjoys great popularity among investors.
The strategy proceeds in two steps, using the historical log-returns Y_n, Y_{n−1}, ..., Y_0 (Y_i := log X_i) as input for the investment decision at time n:

1. Produce an estimate of the future of the market. It will be shown that market forecasts should be based on conditional expectations of future log-returns given the past, i.e. on

Ŷ_{n+1} := E[Y_{n+1} | Y_n, Y_{n−1}, ...].

2. Invest exclusively in those stocks whose predictions Ŷ_{n+1} promise a better return than a fixed-interest security with return r. A stock is thus invested in if and only if

exp(Ŷ_{n+1}) ≥ r.

We call this the strategy of the "greedy investor", as it is designed to pick out only the best possible investment opportunities. The simplicity of the strategy is appealing, and in markets with a small variance of the log-returns it leads to sensible results (Section 3.1).
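The two-step selection rule can be sketched in a few lines. How wealth is split among the selected stocks is left open here; the equal weighting below is purely an illustrative assumption of ours (the thesis develops the strategy in Section 3.1).

```python
import math

def greedy_portfolio(predicted_log_returns, r):
    """Greedy-investor selection: invest only in assets whose
    predicted gross return exp(Y_hat) is at least the riskless
    return r.  Equal weighting among the selected assets is an
    illustrative assumption, not part of the selection rule."""
    selected = [j for j, y in enumerate(predicted_log_returns)
                if math.exp(y) >= r]
    m = len(predicted_log_returns)
    b = [0.0] * (m + 1)            # last entry: the riskless asset
    if selected:
        for j in selected:
            b[j] = 1.0 / len(selected)
    else:
        b[m] = 1.0                 # no stock beats the bond: hold the bond
    return b
```

With predictions (0.02, −0.05, 0.01) and r = 1.005, only assets 1 and 3 satisfy exp(Ŷ) ≥ r and receive weight.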
When implementing the strategy the investor faces the difficulty that the predictions Ŷ_{n+1} can only be computed with knowledge of the true distribution of the process. One must therefore settle for computing an estimate Ê(Y_n, ..., Y_0) of the conditional expectation E[Y_{n+1} | Y_n, Y_{n−1}, ...] from the market observations Y_n, ..., Y_0. It is known from the literature that no estimate obtained in this way can be strongly consistent in the sense that

lim_{n→∞} ( Ê(Y_n, ..., Y_0) − E[Y_{n+1} | Y_n, Y_{n−1}, ...] ) = 0  (0.0.4)

with probability 1 for every stationary and ergodic process {Y_n}_n (Bailey, 1976). On the one hand this result is discouraging; on the other hand it does not rule out the existence of strongly consistent prediction schemes for log-return processes as they occur in real financial markets. Gaussian log-return processes, which provide a good approximation of real log-return processes, come to mind in particular. Until now, however, the question of whether strongly consistent prediction algorithms exist for stationary and ergodic Gaussian processes had remained open. Section 3.2 is able to give a positive answer. Under weak additional assumptions on the Wold coefficients of the process,

a prediction algorithm Ê(Y_n, ..., Y_0) for stationary and ergodic Gaussian processes is developed that is strongly consistent in the sense of (0.0.4) and remarkably easy to implement (Corollary 3.2.3).

These results provide the subroutines needed to implement the strategy of the "greedy" investor in Gaussian log-return processes. The algorithm is, however, also of interest independently of the application given here, since prediction problems for Gaussian processes arise in a multitude of fields.
The convergence properties are proved in Section 3.3. Application examples with real and simulated data follow in Section 3.4, showing promising results when the algorithm is used for pure prediction as well as a subroutine of the "greedy" strategy.
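The idea of predicting a stationary Gaussian series from its past can be illustrated by a toy finite-order linear predictor. This is not the algorithm of Section 3.2 (which works under assumptions on the Wold coefficients of the process); it is merely a minimal AR(1) sketch of estimating a one-step conditional expectation from data.

```python
def ar1_predict(history):
    """One-step predictor for a zero-mean stationary series via a
    least-squares AR(1) coefficient:  Y_hat_{n+1} = a_hat * Y_n,
    a_hat = sum Y_i Y_{i-1} / sum Y_{i-1}^2.
    A toy illustration only, not the estimator of Section 3.2."""
    num = sum(y1 * y0 for y0, y1 in zip(history, history[1:]))
    den = sum(y0 * y0 for y0 in history[:-1])
    a_hat = num / den if den > 0.0 else 0.0
    return a_hat * history[-1]
```

On the exactly geometric sample 1, 0.5, 0.25, 0.125 the fitted coefficient is 0.5, so the prediction is 0.0625.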

Chapter 4
A Markov model with transaction costs: probabilistic view
In the simplest markets, in which the returns form independent, identically distributed random variables, one should invest in a log-optimal portfolio that is constant in time. Using a portfolio that is constant in time, one allocates a fixed proportion of the current total wealth to each stock. The proportion thus remains the same, but due to the changes of the stock prices relative to one another, the actual number of stocks held changes from market period to market period. Carrying out a log-optimal strategy therefore requires a large number of transactions. In reality this is a drawback not to be underestimated. Whatever wealth accrues, a large part of the gains will flow off again to settle transaction costs such as brokerage commissions, administration and communication costs. Consequently, the investor must adapt his strategy to these circumstances: he must make as few cost-intensive transactions as possible, yet enough to achieve good growth in value. Chapters 4 and 5 are devoted to the question of how these two requirements can be reconciled in one strategy.
To this end we assume that the returns evolve according to a Markov process of order d. In Chapter 4 we work under the premise that the distribution of the return process is known, an unrealistic assumption that we shall drop in Chapter 5. First, Section 4.1 extends the market model of Chapter 1 by transaction costs proportional to the volume of stocks purchased. It is not surprising that in such a situation the investor can afford only a restricted set of portfolio compositions without going bankrupt. It will become clear

that in Markovian return processes of order d it suffices to consider strategies based on portfolio selection functions, i.e. strategies of the form b_i = φ(b_{i−1}, X_{i−d}, ..., X_{i−1}) with a suitable function φ (Definition 4.1.2).

The next portfolio to be chosen is thus a function of the portfolio chosen last and the last d return vectors observed in the market. As before, the investor strives to maximise his expected logarithmic growth of wealth, here by choosing an optimal portfolio selection function φ.
Section 4.2 sets out how an optimal selection function φ can be constructed, all under the premise that the true distribution of the returns were known. The main result will show
that an optimal portfolio selection function φ can be constructed from a solution of the Bellman equation (Theorem 4.2.1, Equation 4.2.2).

The Bellman equation is well known from the theory of dynamic programming; nevertheless, fundamental differences between classical dynamic programming and the portfolio selection problem will emerge. In preparation for Chapter 5, Section 4.3 finally derives further analytic properties of the solution of the Bellman equation.
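A strategy driven by a portfolio selection function (Definition 4.1.2) can be sketched generically: the function φ itself, e.g. one constructed from a solution of the Bellman equation, is supplied by the user; the loop below only applies the recursion b_i = φ(b_{i−1}, X_{i−d}, ..., X_{i−1}).

```python
def run_selection_function(phi, b0, returns, d):
    """Apply a portfolio selection function phi along an observed
    return sequence:  b_i = phi(b_{i-1}, X_{i-d}, ..., X_{i-1}).
    `phi` is user-supplied (illustrative interface assumption):
    it maps (previous portfolio, window of d return vectors)
    to the next portfolio."""
    bs = [b0]
    for i in range(d, len(returns)):
        window = returns[i - d:i]      # the last d observed return vectors
        bs.append(phi(bs[-1], window))
    return bs
```

With d = 1 and three observed return vectors, the loop produces the initial portfolio plus two rebalancing decisions.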

Chapter 5
A Markov model with transaction costs: statistical aspects
The Bellman equation as set up in Chapter 4 depends crucially on the distribution of the return process {X_n}_n. This dependence takes the form of a conditional expectation that has to be evaluated. The results of Chapter 4 are therefore valid only under the premise that the investor knows the true distribution of the return process, which in practice is of course illusory. At best, the investor has an estimate of the distribution of the returns at his disposal. This enables him to compute an estimate of the conditional expectation in question, which then provides him with an approximation of the Bellman equation. Using techniques from nonparametric regression estimation,

Section 5.1 shows that the Bellman equation of Chapter 4 has a natural empirical counterpart based on market observations (Equation 5.1.2).

Arguments similar to those of Chapter 4 will allow us to solve this empirical Bellman equation under realistic conditions.

This will lead to a strategy that is based exclusively on historical returns yet has the same optimality properties as the (theoretically) optimal portfolio selection strategy under transaction costs (Theorems 5.1.1 and 5.1.2).
The considerations of Chapter 5 draw on generalisations of known results on the uniform convergence of regression estimates. These generalisations are derived in Section 5.2. If, e.g., {X_n}_n is a stationary process with the geometric mixing property, and g is chosen from a class G of Lipschitz continuous functions, we estimate the conditional expectation

R(g, b, x) := E[g(X_1, b) | X_0 = x]  (b ∈ S)

by a kernel estimate R_n(g, b, x). Depending on the smoothness of a density of X_0 (we assume that one exists), the speed of convergence in the limit relation

sup_{g∈G} E sup_{x∈X, b∈S} |R_n(g, b, x) − R(g, b, x)| → 0  (n → ∞)

is determined. Here the expected maximal estimation error is considered uniformly over the class G (Corollary 5.2.2). The result obtained is of interest not only with regard to our application, but also beyond it, as an independent result in nonparametric regression estimation.
Section 5.3, finally, is devoted to the proof of the optimality properties of the algorithm, combining the results of Chapter 4 with those on uniformly consistent regression estimation.

Chapter 6
Portfolio selection functions in stationary return processes
In real financial markets the return process {X_n}_n may deviate from a Markov process of order d. In this chapter, therefore, even more general market models are considered. Transaction costs are ignored, but return processes are studied for which essentially only stationarity and ergodicity are assumed. It is natural for the investor to base his investment decisions on the last d return vectors observed in the market (d fixed). This leads to the concept of log-optimal portfolio selection functions.
This concept is introduced in Section 6.1. The investor again uses a logarithmic utility function and therefore tries to determine a log-optimal portfolio selection function, i.e. a measurable function

b*: IR^{dm}_+ −→ S,

such that (<·,·> denotes the Euclidean scalar product)

E(log <b*(X_0, ..., X_{d−1}), X_d>) ≥ E(log <f(X_0, ..., X_{d−1}), X_d>)

for all measurable functions f: IR^{dm}_+ −→ S. For the (n+1)st market period, b* suggests that the investor acquire the portfolio b*(X_{n−d+1}, ..., X_n).

In its generality, the concept of log-optimal portfolio selection functions falls short of the concept of the conditionally log-optimal portfolio (the latter chooses the parameter d such that the entire past of the process enters the portfolio selection). Although it is a simplification in this sense, log-optimal portfolio selection functions combine several advantages over the log-optimal portfolio, in particular as regards computation, estimation and interpretation.
With log-optimal portfolio selection functions the investor faces the same problem as with log-optimal portfolios. Both can only be computed if the true distribution of the return process were known. In practice this is not the case, and one again needs an estimation procedure that approximates a log-optimal portfolio selection function from return data observed in the past.

In Section 6.2 an algorithm is therefore developed that computes estimates b̂_n of a log-optimal portfolio selection function b* from historical return data.

Beyond stationarity and ergodicity, only very mild additional assumptions are made. Concretely, it is assumed that the return process {X_n}_{n=0}^∞ is an [a, b]^m-valued stationary and ergodic stochastic process (0 < a ≤ b < ∞ need not be known) and that a Lipschitz condition holds for the conditional return quotient E[X_d / <s, X_d> | X_{d−1} = x_{d−1}, ..., X_0 = x_0]. The Lipschitz constant L is assumed to be a known market constant.
By means of a stochastic gradient procedure and methods of nonparametric regression estimation it is shown

that the estimates b̂_n converge with probability 1 to the true log-optimal portfolio selection function b*, avoiding the mixing conditions typically assumed in the literature (Theorem 6.2.2).

In practical applications the following result plays an even more important role:

Portfolio selection by means of the estimated log-optimal portfolio selection functions yields optimal growth of wealth among all strategies that base their investment decisions on the last d returns observed in the market.

Let Ŝ_n be the wealth achieved after n market periods when on the (i+1)st day of trading the current estimate b̂_i is used to choose the portfolio b̂_i(X_{i−d+1}, ..., X_i). If S_n denotes the wealth earned in the same time by an arbitrary other selection strategy based on the respective last d observed returns, then

lim sup_{n→∞} (1/n) log(S_n / Ŝ_n) ≤ 0

with probability 1 (Corollary 6.2.3).
After a suitable modification, the algorithms and the results retain their validity even if, as in practice, the market constant L is unknown to the investor (Theorem 6.2.4). Section 6.3 proves the results, and the chapter is rounded off with several realistic examples in Section 6.4.
Chapters 2, 3 and 6 can be read independently of one another; they are self-contained. Chapters 4 and 5, however, are closely interlinked. Notation going beyond standard mathematical notation is explained at its first occurrence. The reader is also referred to the list of abbreviations at the beginning of this thesis. The computations and figures for the examples were produced with Matlab 4.0 and 6.0.0.88, Minitab 11.2 and R 1.1.1, using historical quotes (daily closing prices) of the New York Stock Exchange from www.wallstreetcity.com.

ACKNOWLEDGEMENTS

I am indebted to

Prof. Harro Walk, who suggested that I investigate this subject. He advised me on many points, always found the time to discuss the results, and more than once I benefitted from his extensive knowledge.

Prof. Laszlo Györfi, for his hospitality during my visits to the Technical Uni-
versity of Budapest. On several occasions he gave me the right impulse
and really useful advice.

Prof. Volker Claus, for his interest in my work and for discussing the contents
of this thesis with me.

The DFG and the College of Graduates "Parallel and Distributed Systems", for funding my research with everything that involves.

Dr. Michael Kohler, who introduced me to nonparametric curve estimation.


His expertise in this field was an invaluable source.

Dr. Jürgen Dippon, who never threw me out when I felt like discussing prob-
lems in mathematical statistics and finance.

Prof. Adam Krzyżak, for being my host during a stay at Concordia University, Montréal, and for many discussions about mathematical and other interesting subjects.

CHAPTER 1

Introduction: investment and nonparametric statistics
Investment is the strategic allocation of resources, typically of monetary re-
sources, in an environment, typically a market of assets, whose future evolution
is uncertain. This definition leaves much room for subjective interpretation. In
particular, the following points have to be made more precise:

– What market is under consideration? This involves specifying and stan-


dardizing the assets traded in the market (e.g. stocks, bonds, options,
futures, currencies, gold, oil, ...) as well as setting up a reference system
for pricing the assets (e.g. closing or opening prices at the New York
Stock Exchange, world market price for raw materials, ...).

– What actions and instruments may be applied by the investor? Possible


actions may be restricted by exogenous terms and regulations of trade
(e.g. transaction costs, brokerage fees, trading limitations) or personal
preferences (e.g. to rule out borrowing money or short positions in stocks).

– What investment goal is pursued by the investor? Traditionally, the goal


is the maximisation of a personal utility function of the returns on the
allocated resources. The market being chancy, individual risk aversion
preferences may enter the form of the utility function, or restrictions are
imposed on the set of possible investment actions.

Thus, “investment” becomes a highly subjective term, including investment as


it is understood in this thesis. In the following we set up the specific invest-
ment scenario as we shall consider it in this thesis. We believe this scenario
is broadly accepted as the typical setting for investment analysis, although we

do not deny that particular investment situations require further adaptation


and modification. It should also be pointed out that, as future asset prices are
subject to random fluctuations, “investment” is a good deal about decision tak-
ing under uncertainty, which makes mathematical statistics the natural partner
of investment (an observation that may be attributed to the groundbreaking
work of Bachelier, 1900, who used statistics to compare his theoretical model
with real market data). An economist will find the economic side of this thesis
to be lacking. There are excellent books on investment science from a more
economic point of view (e.g. Francis, 1980; Luenberger, 1998), but most of
them are lacking in statistical depth. This thesis is about investment from a
decisively statistical point of view – we can therefore only superficially touch
upon economic issues.

1.1 The market model


We consider a market in which m assets (which we will think of as stocks and
bonds) are traded. Taking a macroeconomic point of view, the prices of the
assets (stock quotes, bond values) are generated under the authority of the
market as a whole, i.e. by the large ensemble of investors. We assume that for
the individual investor there is no way to influence the prices by launching spe-
cific investment actions or distributing insider or side information of whatever
kind. In this situation, let P1,n , ..., Pm,n > 0 be the prices of the assets 1, ..., m
at the beginning of market period n (market period n lasts from “time” n − 1
to n, time being measured, e.g., in days of trading). To the “powerless” individ-
ual investor described above, the asset prices present themselves as a random
process on a common probability space (Ω, A, P).
An investment of 1 unit of money in asset i at time n − 1 yields a return

X_{i,n} := P_{i,n} / P_{i,n−1}
during the subsequent market period. We collect the returns of the single assets
in a return vector
Xn := (X1,n , ..., Xm,n)T .
We will often work with the log-returns

log Xn := (log X1,n , ..., log Xm,n )T .


The return process {X_n}_{n=1}^∞ and the log-return process {log X_n}_{n=1}^∞ are stochastic processes on (Ω, A, P).
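The return and log-return definitions above translate directly into code; the following is a minimal sketch computing X_{i,n} = P_{i,n} / P_{i,n−1} and log X_{i,n} for a single asset's price series.

```python
import math

def returns_from_prices(prices):
    """Per-period gross returns X_n = P_n / P_{n-1} from a positive
    price series, together with the corresponding log-returns."""
    returns = [p1 / p0 for p0, p1 in zip(prices, prices[1:])]
    log_returns = [math.log(x) for x in returns]
    return returns, log_returns
```

For prices 100, 110, 99 this gives the gross returns 1.1 and 0.9; the first log-return is positive, the second negative.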
In most of our investigations we will assume that the return process {Xn }n is
stationary and ergodic in the sense of the following definition (Stout, 1974, Sec.
3.5; Shiryayev, 1984, V §3):

Definition 1.1.1. Let {X_n}_{n=1}^∞ be an IR^m-valued stochastic process on a probability space (Ω, A, P).

1. {X_n}_{n=1}^∞ is called stationary, if

P_{(X_i, ..., X_j)} = P_{(X_{i+t}, ..., X_{j+t})}

for all integers i, j, t with i ≤ j.

2. A ∈ A is called an invariant event of {X_n}_{n=1}^∞, if there exists a B ∈ B((IR^m)^∞) such that

A = {X_i, X_{i+1}, ...}^{−1}(B)

for all i ∈ IN.

3. A stationary process {X_n}_{n=1}^∞ is called ergodic, if the probability of any invariant event of {X_i}_{i=1}^∞ is either 0 or 1.

Stationarity preserves the stochastic regime over time, ergodicity is the setting
where time averages along trajectories of the process converge almost surely to
expected values under the process distribution:

Theorem 1.1.2. (Birkhoff Ergodic Theorem, Stout, 1974, Sec. 3.5) Let {X_n}_{n=1}^∞ be an IR^m-valued stationary and ergodic stochastic process on a probability space (Ω, A, P) with E|X_1| < ∞. Then

(1/n) ∑_{i=1}^n X_i −→ EX_1

P-almost surely (P-a.s.), i.e. for all ω ∈ Ω from a set of probability 1.

Stationarity and ergodicity are the basic assumptions for most statistical inves-
tigations. The stationarity of stock returns is a thoroughly investigated field,

both by economists (e.g. Francis, 1980, A24-1) and statisticians (e.g. Franke et
al., 2001, Sec. 10.6). It is natural to assume that there is short term stationar-
ity in most stock returns, some authors (Francis, 1980) even claim that return
data may be treated as stationary if the time horizon comprises at least one
complete business cycle. There is no conclusive answer that proves or disproves
stationarity for the majority of stock markets, and it seems as though this has
to be decided from case to case. We accept stationarity as a working hypothesis,
accounting for the fact that it is common practice to assess and compare the
performance of statistical methods in the stationary setting.
Not much is known about the ergodic properties of stock quotes or stock re-
turns, neither from the theoretical economist’s point of view, nor from empirical
studies. There are indications that the ergodic properties of a market depend
very much upon the flow of information in the market and on the microeconomic
price generation (Donowitz and El-Gamal, 1997). These are difficult to assess,
and so the typical approach has become to derive algorithms under ergodic
hypotheses and then let the success of the algorithm justify the hypotheses.
Throughout this thesis we consider nonparametric models for {Xn }n, i.e.
models that do not require a parametrized evolution equation (in contrast to
MA, AR, ARMA, ARIMA, ARCH and GARCH models, cf. Brockwell and
Davis, 1991, Franke et al., 2001). The nonparametric approach guarantees
highest flexibility in modelling, skipping model parameters which otherwise
require extensive diagnostic model testing. To be more precise, the following
models will be investigated in this thesis:

1. {Xn }n is a sequence of independent identically distributed (i.i.d.) random


variables (e.g. with finitely many outcomes) – Chapter 2.1.

2. The conditional distribution of X_{n+1} given X_n, ..., X_1 (which we will denote by P_{X_{n+1}|X_n,...,X_1}) is log-normal (i.e. P_{log X_{n+1}|X_n,...,X_1} is a normal distribution) – Chapter 2.2.

3. {log X_n}_n is a stationary Gaussian time series (i.e. (log X_{n+k}^T, ..., log X_n^T) follows a multivariate normal distribution which depends upon k but not upon n) – Chapter 3.

4. {X_n}_n is a Markov process of order d (i.e., we assume P_{X_{n+1}|X_n,...,X_1} = P_{X_{n+1}|X_n,...,X_{n−d+1}}) – Chapters 4 and 5.

5. {Xn }n is a stationary and ergodic time series – Chapter 6.

Each of these models has been found useful for describing asset return data in
real financial markets. Model 1 is the Cox-Ross-Rubinstein model (Cox et al.,
1979; Francis, 1980, A24-1 and A24-2; Luenberger, 1998, Ch. 11; Franke et al.,
2001, Ch. 7). Models 2 and 3 are models with log-normal returns (Francis, 1980,
A24-1; Luenberger, 1998, Ch. 11) which arise, e.g., from a discretisation of the
Black-Scholes model (Luenberger, 1998, Ch. 11; Korn and Korn, 1999, Kap.
II). In contrast to the classical Black-Scholes model we allow for autocorrelated
log-returns in Chapter 3 (i.e. Cov(log Xn , log Xn+k ) 6= 0 for some k > 0). In
practice, autocorrelation of the log-returns manifests itself for small time lags
k (Franke et al., 2001, Ch. 10) as well as large k (long range dependence, Ding
et al., 1993; Peters, 1997). Many studies have indicated that the logarithms
of stock returns slightly depart from a Gaussian distribution (e.g. by heavy
tails, Mittnik and Rachev, 1993; McCulloch, 1996; Franke et al., 2001, Ch. 10
and the references there). It is therefore advisable to drop the assumption of
log-normality of the stock returns wherever possible. This is done in models 4
and 5, model 4 capturing the autocorrelation of stock returns by the Markov
property.
We will assume that the asset returns correspond to one of the models 1-5.
However, we do not assume the exact form of the true return distribution to be
known to the investor (with the exception of Chapter 4). Hence, the investor has
to apply statistical estimation and forecasting techniques for strategy planning.
Clearly, nonparametric models require nonparametric statistical methods and
arguments are usually more involved than in the parametric setting (for an
introduction to nonparametric estimation as we will use it see Györfi et al.
1989, 2002). Unfortunately, nonparametric methods are not yet common in
econometrics and financial mathematics (Pagan and Ullah, 1999, and Franke
et al., 2001, being two of the few notable exceptions). In this thesis we aim to
demonstrate what powerful impetus nonparametric statistical estimation may
give to investment strategy planning.

1.2 Portfolios and investment strategies


Having chosen a market model, we turn to the actions that may be taken by

the investor. Throughout the investment process, the investor holds varying
portfolios of the m assets. Taking a discrete time trading point of view, we
assume that the investor is only allowed to rebalance his portfolio at the be-
ginning but not in the course of each market period. The portfolio held at the
beginning of market period n (i.e. from time n − 1 to n) can be given by the
quantities q1,n−1 , ..., qm,n−1 of the single assets owned by the investor (qi,n−1 < 0
corresponds to borrowed assets, so-called short positions). The investor then
enters the nth market period with a portfolio value of
$$W_{n-1}^+ := \sum_{i=1}^m P_{i,n-1}\, q_{i,n-1}.$$

The remaining value at the end of the market period is


$$W_n^- := \sum_{i=1}^m P_{i,n}\, q_{i,n-1} = \sum_{i=1}^m X_{i,n} P_{i,n-1}\, q_{i,n-1}.$$

Hence, if $W_{n-1}^+ \ne 0$, the portfolio achieved a return of
$$\frac{W_n^-}{W_{n-1}^+} = \sum_{i=1}^m X_{i,n}\, b_{i,n} \qquad (1.2.1)$$

with
$$b_{i,n} := \frac{P_{i,n-1}\, q_{i,n-1}}{\sum_{j=1}^m P_{j,n-1}\, q_{j,n-1}}.$$

Note that $\sum_{i=1}^m b_{i,n} = 1$, and we will find it more convenient to denote a portfolio
by the portfolio vector
$$b_n := (b_{1,n}, ..., b_{m,n})^T$$

rather than listing q1,n−1 , ..., qm,n−1 . If the investor is allowed to consume an
amount cn before changing his portfolio bn for bn+1 and entering market period
n + 1, then $W_n^+$ is given by
$$W_n^+ = W_n^- - c_n. \qquad (1.2.2)$$

(1.2.1) and (1.2.2) are the equations governing general discrete time investment.
Throughout this thesis we are concerned with an investor who neither consumes
nor deposits new money into his portfolio but reinvests his current portfolio
[Figure 1.1: Setting for the return and portfolio processes. Timeline: market period n ends at time n (the end of the nth day of trading); the return $X_n$ is realised during period n, the portfolio $b_n$ is held throughout period n, and $W_n$ denotes the wealth at time n.]

value in each market period. Hence, $c_n = 0$ for all n, and (1.2.1) and (1.2.2) boil
down to
$$W_n := W_n^+ = W_n^- = W_0 \prod_{i=1}^n \langle b_i, X_i \rangle, \qquad (1.2.3)$$
where $W_n$ is the current wealth of the investor at time n. Moreover, the factor
$\langle b_n, X_n \rangle := b_n^T X_n$ can be interpreted as the portfolio return during the nth
market period.
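The wealth dynamics (1.2.1)-(1.2.3) are easy to mechanise. The following sketch (not part of the thesis; the return vectors and portfolio weights are invented example data) computes the wealth path of a constantly rebalanced two-asset portfolio:

```python
# Illustrative sketch of the wealth dynamics (1.2.3): with no consumption the
# wealth evolves as W_n = W_0 * prod_{i=1}^n <b_i, X_i>.
# All numbers below are invented example data.

def portfolio_return(b, x):
    """The portfolio return <b, x> earned during one market period."""
    return sum(bi * xi for bi, xi in zip(b, x))

def wealth_path(w0, portfolios, returns):
    """Wealth W_0, W_1, ..., W_n for a self-financing investor (c_n = 0)."""
    path = [w0]
    for b, x in zip(portfolios, returns):
        path.append(path[-1] * portfolio_return(b, x))
    return path

# two assets over three market periods, rebalanced to 50/50 every period
returns = [(1.05, 0.98), (0.97, 1.02), (1.10, 1.00)]
portfolios = [(0.5, 0.5)] * len(returns)
path = wealth_path(1.0, portfolios, returns)
# path[-1] is approximately 1.015 * 0.995 * 1.05
```

Each step multiplies the current wealth by the period's portfolio return, exactly as in (1.2.3).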
Moreover, we assume that the investor never enters short positions, i.e. $b_{i,n} \ge 0$.
Then $b_{i,n}$ is the proportion of the current wealth $W_{n-1}$ invested in asset i at time
n − 1. The portfolio vector $b_n$ chosen for market period n is a member of the simplex

$$S := \Big\{ (s_1, ..., s_m)^T \;\Big|\; \sum_{i=1}^m s_i = 1,\; s_i \ge 0 \Big\}.$$

The choice of bn depends on the information In which the investor can access at
time n and which he deems relevant. Thus bn = bn (In) where In typically com-
prises a number of past observed asset returns (a substring of X1 , ..., Xn−1), in
some cases additional side or insider information about the market and external
economic factors. For specific choices of In , this is the setting for Chapters 2, 3
and 6 (Figure 1.1).
In reality, the range of portfolio choices is further narrowed by the occurrence of
transaction costs. Each transaction in a real market (purchase, sale of assets)
generates costs (brokerage fees, commission, administrative and communication
expenses). The total amount of these fees is withdrawn from the investor’s
wealth. Thus, the range of portfolios the investor may choose is restricted to
those portfolios whose acquisition generates no more transaction costs than


the investor’s current wealth. Then, roughly speaking, the investor is caught
between making as few costly transactions as possible on the one hand and
making as many transactions as necessary to boost his wealth on the other
hand. No wonder that strategic planning under transaction costs requires much
deeper arguments and has received considerable attention in the literature for both
discrete and continuous time models (see e.g. Blum and Kalai, 1999; Bobryk
and Stettner, 1999; Cadenillas, 2000; Bielecki and Pliska, 2000). We shall return
to a typical case of transaction costs in more detail in Chapters 4 and 5.

1.3 Pleading for logarithmic utility


As can be seen from (1.2.3), invested money grows multiplicatively, as a product
of daily returns. Suppose the investor wants to maximize the expected value
of his terminal wealth. If the daily returns {Xi }i are stationary, maximization
of the single expected daily returns is not appropriate. It does not capture
autocorrelation in the returns, since, in general,
$$E \prod_{i=1}^n \langle b_i, X_i \rangle \ne \prod_{i=1}^n E\, \langle b_i, X_i \rangle.$$

The expectation is rather determined by the expectation of the logarithmic daily
returns, since by Taylor expansion (h.o.t. denoting terms of order 2 and higher)
$$E \prod_{i=1}^n \langle b_i, X_i \rangle = 1 + E \log \prod_{i=1}^n \langle b_i, X_i \rangle + \text{h.o.t.} = 1 + \sum_{i=1}^n E \log \langle b_i, X_i \rangle + \text{h.o.t.}$$
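The size of the neglected higher-order terms can be checked numerically; the daily-magnitude returns below are invented, not data from the thesis:

```python
# For portfolio returns close to 1, the product of period returns is well
# approximated by 1 plus the sum of log-returns; the gap collects the h.o.t.
from math import log, prod  # prod requires Python >= 3.8

returns = [1.004, 0.997, 1.012, 0.995]   # invented daily portfolio returns
exact = prod(returns)
approx = 1 + sum(log(r) for r in returns)
gap = abs(exact - approx)                # size of the neglected h.o.t.
```

For returns of a few tenths of a percent per day the gap is of order $10^{-5}$, which is what makes log-returns so convenient for high frequency data.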

It is widely accepted that for returns below 10% and high frequency data (e.g.
daily returns) the logarithmic approximation is convincing (Franke et al., 2001,
Sec. 10.1). This leads to the notion of log-optimal portfolios, i.e. portfolios
that maximize the expected logarithmic utility of the investor's growth
of wealth. The log-optimal portfolio of a process $\{X_n\}_n$ of independent and
identically distributed (i.i.d.) returns $X_n$ is defined as

$$b^* := \arg\max_{b \in S} E(\log \langle b, X_1 \rangle). \qquad (1.3.1)$$

Log-optimal portfolios have been suggested first by Kelly (1956), Latané (1959)
and Breiman (1961) as diversification strategy for investment in a speculative
market given by a process $\{X_n\}_{n=1}^\infty$ of i.i.d. return vectors. Since then, numerous
investigations, notably by Cover (e.g. Cover, 1980, 1984; Cover and
Thomas, 1991) and Algoet (e.g. Algoet and Cover, 1988) have explored the
theoretical aspects of this strategy, establishing that investment in log-optimal
portfolios yields optimal asymptotic growth rates for the invested wealth. An
introduction, various results and sources of reference can be found in Cover and
Thomas (1991, Chapter 15). There, for stationary and ergodic return processes
$\{X_n\}_n$, (1.3.1) is generalized by the conditional log-optimal portfolio (for
the nth investment step)
$$b_n^* := \arg\max_{b \in S} E[\log \langle b, X_n \rangle \mid X_{n-1}, ..., X_1]$$
(conditioning being void for n = 1). The
conditional log-optimal portfolio is the log-optimal portfolio under the condi-
tional distribution PXn |Xn−1 ,...,X1 and hence a random variable. The log-optimal
investment strategy b∗1 , b∗2, ... is a member of the class of non-anticipating
strategies, i.e. sequences of S-valued random variables b1 , b2, ... with the prop-
erty that each bn is measurable w.r.t. the σ-algebra generated by X1 , ..., Xn−1
(hence the strategy requires no more information than available at time n).
The technical aspects of conditional log-optimal portfolios (we will often drop
“conditional” for brevity) are well explored:
Existence and uniqueness of the log-optimal portfolio has been investigated in
Österreicher and Vajda (1993) and Vajda and Österreicher (1994), correcting a
wrong criterion used in Algoet and Cover (1988). The main result is

Theorem 1.3.1. (Vajda and Österreicher, 1994) Let $X = (X_1, ..., X_m)$ be
a stock market return vector with distribution $P_X$. Then there exists a log-optimal
portfolio $b^* \in S$ with $|E \log \langle b^*, X \rangle| < \infty$ if and only if
$$E \Big| \log \sum_{i=1}^m X_i \Big| < \infty.$$
$b^*$ is unique if $P_X$ is not confined to a hyperplane in $\mathbb{R}^m$ containing the diagonal
$\{(d, ..., d) \in \mathbb{R}^m \mid d \in \mathbb{R}\}$.

A good algorithm for the calculation of a log-optimal portfolio from the (known)
distribution PX of the return vector X was given by Cover (1984).

Theorem 1.3.2. (Cover, 1984) Assume the support of $P_X$ is of full dimension
in $[0, \infty)^m$ and choose some $b^0 \in S$ with non-zero entries. Then the recursively
generated portfolio vectors $b^k = (b_1^k, ..., b_m^k)$ with
$$b_i^{k+1} = b_i^k \cdot E\, \frac{X_i}{\langle b^k, X \rangle}$$
converge to the log-optimal portfolio $b^*$ as $k \to \infty$.
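For a distribution with finite support the expectation in Theorem 1.3.2 is a finite sum, so the recursion can be sketched in a few lines of Python (the two-point return distribution below is an invented example, not one from the thesis):

```python
# Sketch of the recursion b_i <- b_i * E[X_i / <b, X>] of Theorem 1.3.2 for a
# finitely supported return distribution (invented example data).

def cover_iteration(support, probs, b0, steps=500):
    """Iterate Cover's update; converges to the log-optimal portfolio."""
    b = list(b0)
    m = len(b)
    for _ in range(steps):
        expect = [0.0] * m
        for x, p in zip(support, probs):
            r = sum(bi * xi for bi, xi in zip(b, x))   # <b, X>
            for i in range(m):
                expect[i] += p * x[i] / r              # E[X_i / <b, X>]
        b = [bi * ei for bi, ei in zip(b, expect)]     # stays in the simplex
    return b

# one volatile stock against cash (asset 2 always returns 1)
support = [(2.0, 1.0), (0.4, 1.0)]
probs = [0.5, 0.5]
b_star = cover_iteration(support, probs, [0.5, 0.5])
# the maximiser of 0.5*log(1+b) + 0.5*log(1-0.6b) is b = 1/3
```

Note that the update leaves the simplex invariant, since the components of the new vector sum to $E[\langle b, X\rangle / \langle b, X\rangle] = 1$; at a fixed point with all components positive one has $E[X_i/\langle b, X\rangle] = 1$, in accordance with the Kuhn-Tucker conditions of Theorem 1.3.3.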

This is closely linked with the following, the Kuhn-Tucker conditions for a log-
optimal portfolio.

Theorem 1.3.3. (Cover and Thomas, 1991, Theorem 15.2.1) A portfolio
vector $b^* = (b_1^*, ..., b_m^*) \in S$ is a log-optimal portfolio for the return vector
$X = (X_1, ..., X_m)$ if and only if it satisfies the conditions
$$E\, \frac{X_i}{\langle b^*, X \rangle} \begin{cases} = 1 & \text{if } b_i^* > 0, \\ \le 1 & \text{if } b_i^* = 0. \end{cases}$$

The superiority of investment according to conditionally log-optimal portfolios


rests upon the following theorem (Algoet and Cover, 1988; Cover and Thomas,
1991, Theorem 15.5.2).

Theorem 1.3.4. (Algoet and Cover, 1988) Assume the return vectors $\{X_i\}_{i=1}^\infty$
form a stationary and ergodic process. Let $S_n^* := \prod_{i=1}^n \langle b_i^*, X_i \rangle$ be the wealth
at time n resulting from a series of conditionally log-optimal investments, $S_n$
the wealth from any other non-anticipating portfolio strategy (both starting
with 1 unit of money). Then
$$E\, \frac{S_n}{S_n^*} \le 1 \ \text{for all } n \quad \text{and} \quad \limsup_{n \to \infty} \frac{1}{n} \log \frac{S_n}{S_n^*} \le 0 \ \text{with probability 1.} \qquad (1.3.2)$$

The second part of (1.3.2) can be interpreted in various ways:

– It proves that, eventually for large n, $S_n < \exp(n\varepsilon) S_n^*$ whatever $\varepsilon > 0$,
which means that no non-anticipating strategy can infinitely often exceed
the log-optimal strategy by an amount that grows exponentially fast (i.e.
an amount that couldn't be compensated for by investment in a fixed
interest rate bank account).

– It proves that the log-optimal portfolio will do at least as well as any other
non-anticipating strategy to first order in the exponent of capital growth,
i.e. it guarantees Sn∗ = exp (nW + o(n)) with highest possible rate W .

From the first part of (1.3.2) Bell and Cover (1988) conclude that

– there is no essential conflict between good short-term and long-run per-


formance. Both are achieved by maximizing the conditional expected
log-return.

The log-optimality criterion has not been undisputed, however. In his criticism,
Samuelson (1971, also discussed in Markowitz, 1976) considers a market with
i.i.d. returns X1 , X2 , ... and compares the expected wealth ESn∗ from a series of
log-optimal investments with the expected wealth ESn∗∗ from investment in the
fixed portfolio
$$b^{**} := \arg\max_{b \in S} E\, \langle b, X_1 \rangle$$
(maximization of expected return). Using the independence and the identical
distribution of the returns he finds that
$$\frac{ES_n^{**}}{ES_n^*} = \frac{E \prod_{i=1}^n \langle b^{**}, X_i \rangle}{E \prod_{i=1}^n \langle b_i^*, X_i \rangle} = \left( \frac{\max_{b \in S} E\, \langle b, X_1 \rangle}{E\, \langle b_1^*, X_1 \rangle} \right)^n \to \infty \quad (n \to \infty).$$

Hence there are strategies that outperform log-optimal strategies in terms of


expected terminal wealth for long run investment. However, when comparing
the investor's strategy with a competing strategy, we think that the ratio of
wealths considered in (1.3.2) is more instructive than two separate expectations
for the investor's strategy and the competing strategy. This is a typical example
of criticism offered by classical economists who favour the Markowitz mean-
variance approach to portfolio optimization (Markowitz, 1959; Luenberger,
1998, Ch.6.4ff.). There, the investor seeks to maximize the portfolio perfor-
mance E < b, X > under the constraint of not exceeding a certain threshold
for the risk Var < b, X > (or for the value-at-risk, i.e., quantiles of the return
distribution, in more modern versions of the mean-variance approach).

We would like to emphasize that it is not a question of taste whether or not


to use the log-optimal approach. We strongly plead for investment under loga-
rithmic utility because of the following facts:

– In their spirited defence of the log-optimal criterion Algoet and Cover


(1988) come to the conclusion that the mean-variance approach lacks
generality (e.g. for non-log-normally distributed returns, see Samuelson,
1967, and for multiperiod investment, see Luenberger, 1998, Ch. 8.8).

– It is doubtful whether investment analysis should be founded on expectations
(typically $S_n$ deviates much more from $ES_n$ than $\log S_n$ does from
$E \log S_n$, a stabilizing effect of the log-transform). Pathwise results such as
the second part of (1.3.2) are more instructive than results on averages.

Realistically, the true distribution of market returns and hence the log-optimal
strategy is not revealed to the investor. Then the key problem is (as Algoet,
1992, put it):
Find a non-anticipating portfolio selection scheme $\{\hat{b}_n\}_n$ (a so-called universal
portfolio selection scheme) such that for any stationary ergodic market process
$\{X_n\}_n$, the compounded capital $\hat{S}_n := \prod_{i=1}^n \langle \hat{b}_i, X_i \rangle$ will grow exponentially
fast almost surely (i.e. with probability 1) with the same maximum rate as under
the log-optimum strategy $\{b_n^*\}_n$, that is, $\lim_{n \to \infty} \log \hat{S}_n / n = \lim_{n \to \infty} \log S_n^* / n$
almost surely.
To obtain a universal portfolio selection scheme, under weak conditions on the
market one may choose the log-optimal portfolio with respect to some appro-
priately consistent estimate of PXn |X1 ,...,Xn−1 in the nth investment step (more
precisely, distribution estimates that almost surely exhibit weak convergence to
the true distribution). This was demonstrated by Algoet (1992, Theorem 7). He
also provides an appropriate, yet complicated estimation scheme (Algoet, 1992,
Theorem 9). Instead, we can also use the more transparent scheme of Morvai et
al. (1996). Algoet points out that there are universal portfolio selection schemes
that do not require an explicit distribution estimation scheme as a subroutine
(Algoet, 1992, Sec. 4.3). But still, all existing algorithms seem to require an
enormous amount of past data, making their feasibility in practical situations
doubtful (as noted, e.g., in Yakowitz, Györfi et al., 1999). More practicable
results have been obtained in the case of independent, identically distributed
return vectors. For instance, Morvai (1991, 1992) and Österreicher and Vajda
[Figure 1.2: Our approach to investment strategy planning. Market model: assets i = 1, ..., m with a stationary, ergodic stochastic return process $\{X_n\}_n$. Investment actions: portfolio process $\{b_n\}_n$ in S, transaction costs (occasionally), no consumption, no short positions. Investment goal: log-utility, i.e. maximisation of expected log-returns, good for both short term and long run investment.]

(1993) propose portfolio strategies which are based on selecting the log-optimal
portfolio with respect to the empirical distribution of the data (the so-called
empirical log-optimal portfolio, more on that in Chapter 2). Those esti-
mators can be computed with reasonable effort. Repeated investment following
their strategies asymptotically yields the optimal growth rate of wealth with
probability one. However, in merely stationary and ergodic return processes
they produce suboptimal results.
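As a concrete illustration (a sketch under invented return data, not the authors' implementations), the empirical log-optimal portfolio can be computed by running the fixed-point recursion of Theorem 1.3.2 on the empirical measure of the observed returns:

```python
# Empirical log-optimal portfolio: the log-optimal portfolio with respect to
# the empirical distribution of the past return vectors, computed here with
# the recursion of Theorem 1.3.2.  The return data below are invented.

def empirical_log_optimal(data, steps=2000):
    """Approximate arg max over the simplex of (1/n) * sum_j log <b, x_j>."""
    n, m = len(data), len(data[0])
    b = [1.0 / m] * m                          # start in the interior of S
    for _ in range(steps):
        grad = [0.0] * m
        for x in data:
            r = sum(bi * xi for bi, xi in zip(b, x))
            for i in range(m):
                grad[i] += x[i] / (n * r)      # empirical E[X_i / <b, X>]
        b = [bi * gi for bi, gi in zip(b, grad)]
    return b

data = [(1.8, 1.0), (0.5, 1.0), (1.8, 1.0), (0.5, 1.0)]  # observed returns
b_hat = empirical_log_optimal(data)
# for these data the maximiser of 0.5*log(1+0.8b) + 0.5*log(1-0.5b) is b = 0.375
```

Used on a rolling window of past returns, $\hat b_{n+1}$ is exactly the kind of estimator whose underperformance is quantified in Chapter 2.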
This thesis aims to provide simplified, yet efficient portfolio selection algorithms
if the log-returns follow a Gaussian process (Chapters 2 and 3), a Markov pro-
cess (Chapters 4 and 5) or, more generally, a stationary and ergodic process
(Chapter 6). Our approach is summarized in Figures 1.1 and 1.2.
For the sake of completeness, it should be noted that in recent years, the log-
optimality criterion has been generalized in several ways. In particular, re-
searchers tried

– to make the log-optimality criterion risk sensitive, i.e. to introduce de-


vices which allow the investor to adjust the log-optimal strategy to his
individual risk aversion level. This may be achieved in two different ways:
Either, as in the Markowitz mean-variance model, the investor seeks to
maximize the expected log-return under variance constraints (Ye and Li,
1999), or the log-utility is extended by the variance, e.g. when maximizing
$-(2/\theta) \log E \exp(-(\theta/2) \log S_n) = E \log S_n - (\theta/4)\,\mathrm{Var} \log S_n + O(\theta^2)$,
where $\theta > 0$ is a risk aversion parameter (Bielecki and Pliska, 1999,
and Stettner, 1999, for continuous time models; Bielecki, Hernández and
Pliska, 1999, for a discrete time model).

– to rid the log-optimality criterion of its stochastic setting, making it appli-


cable to markets with doubtful stochastic properties (Cover, 1991; Cover
and Ordentlich, 1996, including side information; Helmbold et al., 1998,
investigating algorithmic issues and Blum and Kalai, 1999, for the trans-
action cost case).

It is left for future research to generalize the results of this thesis to these
extended models.

CHAPTER 2

Portfolio benchmarking: rates and dimensionality
Based on market observations, the investor can follow many different empirical
portfolio selection rules (“empirical” being synonymous with “based on histor-
ical return data”). Not all of these necessarily turn out to be a good choice
in view of the investor’s goal. Discriminating between “good” and “bad” port-
folios requires the investor to compare the performance of his portfolio with
the given investment goal. Naturally, “good” empirical portfolio selection rules
should approach the investment goal. It is of serious interest to determine how
fast the investor approaches his goal as more and more information about the
market is gathered. This is the primary task of what we might call “portfolio
benchmarking”. Portfolio benchmarking analyses how well a given portfo-
lio b or a given portfolio selection rule b = b(X1, ..., Xn) of the past returns
X1 , ..., Xn performs with respect to a fixed benchmark, in our case with re-
spect to the expected logarithmic portfolio return in the next market period,
E log bXn+1 (to keep notation simple we write log bXn+1 rather than log bT Xn+1
or log < b, Xn+1 > in the sequel). This, of course, requires a standardized way
to assess to what extent an empirical portfolio selection rule underperforms in
comparison with the log-optimal rule.
In this chapter we analyse how seriously a log-optimal portfolio selection rule
based on an estimate for the true return distribution may underperform. To this
end, we propose a specific measure of underperformance (cf. (2.1.2)). Establish-
ing a lower bound result on this measure, it will be seen that underperformance
cannot vanish at arbitrarily high rate as the investor gathers more and more
knowledge about the market (Theorem 2.1.1). All investors are subject to a
universally limited rate at which investment rules can succeed in exploiting
historical market data.

In fact, the empirical log-optimal portfolio of Chapter 1 turns out to be a se-


lection rule that achieves the optimal rate (Theorem 2.1.3). It is particularly
striking that this rate does not depend upon the numbers of stocks included
in the portfolio selection process (Theorem 2.1.4). One is tempted to think
that arbitrarily large portfolios can be handled successfully without extra pre-
cautions. Reasons will be given why this is fallacious and does not obviate the
necessity of trying to keep the dimension of the portfolio at a reasonably low level
by pre-selecting a “good” subset of all possible stocks.
However, the pre-selection of stocks is far from straightforward: as we
shall see, there is no way of pre-selecting the stocks on the basis of the perfor-
mance of the single stocks only (Theorem 2.2.1). To find the optimal portfolio
configuration, the investor has to evaluate the log-optimal portfolios of all pos-
sible subsets of stocks and compare the resulting expected logarithmic portfolio
returns, a huge though necessary computational effort in high dimensions.

2.1 Rates of convergence in i.i.d. models


Suppose the m-dimensional stock return vectors X1 , X2, ... constitute a sequence
of independent, identically distributed (i.i.d.) random variables with distribu-
tion Q := PX1 . Q is not disclosed to the investor, who, after n market periods,
may exploit the observations X1 , ..., Xn to obtain a distribution estimate Q̂n of
Q. Let F and F̂n = F̂n (·, X1, ..., Xn) denote the cumulative distribution func-
tions associated with Q and Q̂n, respectively. We shall restrict our analysis to
estimators $\hat{Q}_n$ whose sensitivity to outliers is such that
$$\Big| \hat{F}_n(x, X_1, ..., X_{i-1}, X_i, X_{i+1}, ..., X_n) - \hat{F}_n(x, X_1, ..., X_{i-1}, X_i', X_{i+1}, ..., X_n) \Big| \le \frac{c(x, X_i, X_i')}{n} \qquad (2.1.1)$$
for some function $c: \mathbb{R}_+^{3m} \to \mathbb{R}_+$, whatever $i \in \mathbb{N}$, $x, X_1, ..., X_n, X_1', ..., X_n' \in \mathbb{R}_+^m$
may be. Most of the standard distribution estimates share this property, such
as the empirical distribution
$$\hat{Q}_n(A) = \frac{1}{n} \sum_{i=1}^n 1_A(X_i) \qquad (A \in \mathcal{B}(\mathbb{R}^m))$$

and kernel estimates
$$\hat{Q}_n(A) = \frac{1}{n h_n^m} \sum_{i=1}^n \int_A K\Big( \frac{x - X_i}{h_n} \Big)\, dx \qquad (A \in \mathcal{B}(\mathbb{R}^m)),$$
$h_n$ being a sequence of nonnegative bandwidths and $K: \mathbb{R}^m \to [0, \infty)$ a kernel
function.
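For the empirical distribution one may even take $c \equiv 1$ in (2.1.1), since swapping a single observation moves the empirical cdf by at most $1/n$ everywhere. A small check with invented data (not from the thesis):

```python
# Swapping one observation changes the empirical cdf by at most 1/n at every
# point, i.e. the empirical distribution satisfies (2.1.1) with c = 1.
# The sample and the evaluation grid below are invented example data.

def empirical_cdf(x, sample):
    """F_n(x) = (1/n) * #{i : X_i <= x componentwise}."""
    n = len(sample)
    return sum(all(s[j] <= x[j] for j in range(len(x))) for s in sample) / n

sample = [(1.1, 0.9), (0.8, 1.2), (1.0, 1.0), (1.3, 0.7)]
perturbed = [(0.2, 3.0)] + sample[1:]        # replace the first observation

grid = [(0.5, 0.5), (0.9, 1.0), (1.2, 1.1), (2.0, 2.0)]
worst = max(abs(empirical_cdf(x, sample) - empirical_cdf(x, perturbed))
            for x in grid)
# here worst equals 0.25 = 1/n
```

The bound holds because the swapped observation can change the count of sample points below any threshold x by at most one.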
Having thus learned a “picture” of the market, Q̂n, the investor allocates his
wealth according to the corresponding log-optimal portfolio

$$\hat{b}_{n+1} = \hat{b}_{n+1}(X_1, ..., X_n) = \arg\max_{b \in S} E \log bY,$$

where the expectation is calculated for Y ∼ Q̂n . This choice yields the random
return R̂n = b̂n+1 Xn+1 during the next market period. In order to determine
how well $\hat{b}_{n+1}$ reproduces the true log-optimal portfolio $b^* = \arg\max_{b \in S} E \log bX_1$
with return $R_n^* = b^* X_{n+1}$, we first observe that
$$E \log R_n^* - E \log \hat{R}_n = E \log b^* X_{n+1} - E\big( E[\log \hat{b}_{n+1}(X_1, ..., X_n) X_{n+1} \mid X_1, ..., X_n] \big)$$
$$\ge E \log b^* X_{n+1} - E\big( E[\log b^* X_{n+1} \mid X_1, ..., X_n] \big) = E \log b^* X_{n+1} - E \log b^* X_{n+1} = 0,$$
using the independence of $X_1, ..., X_{n+1}$. Hence
$$\Delta(\hat{Q}_n, Q) := E \log \frac{R_n^*}{\hat{R}_n} \ge 0 \qquad (2.1.2)$$
with equality if Q̂n = Q. On the other hand, from Theorem 1.3.4,
$$\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \log \frac{R_i^*}{\hat{R}_i} \ge 0.$$

Taking expectations and using Fatou’s lemma we obtain


$$0 \le E \liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n \log \frac{R_i^*}{\hat{R}_i} \le \liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^n E \log \frac{R_i^*}{\hat{R}_i}.$$

Therefore, {b̂n}n is a good portfolio selection rule if


Rn∗
∆(Q̂n , Q) = E log →0
R̂n

with high rate as n tends to ∞.


∆(Q̂n , Q) measures underperformance of b̂n w.r.t. the benchmark portfolio b∗ .
In the sequel we will derive asymptotic properties of ∆(Q̂n , Q). The following
theorem shows that the limit cannot be achieved at arbitrarily high speed of
convergence:

Theorem 2.1.1. For any sequence of distribution estimates Q̂n satisfying


(2.1.1), there exists a market distribution Q and a market constant c for which
$$\Delta(\hat{Q}_n, Q) \ge \frac{c}{n} \qquad (2.1.3)$$
for infinitely many n.

As will be seen in the proof, (2.1.1) is not needed when considering unbiased
estimators $\hat{Q}_n$, i.e., estimators for which $E\hat{Q}_n(A) = Q(A)$ for all $A \in \mathcal{B}(\mathbb{R}_+^m)$.

Proof. Consider a 2-stock market with return vector $(X^{(1)}, X^{(2)}) \in \mathbb{R}_+^2$ and
portfolios $(b, 1-b)$, $b \in [0, 1]$. We can expand
$$E \log(bX^{(1)} + (1-b)X^{(2)}) = E \log((Z-1)b + 1) + E \log X^{(2)} \qquad (2.1.4)$$
with the return ratio $Z := X^{(1)}/X^{(2)}$. Thus, in a 2-stock market, the log-optimal
portfolio only depends upon the distribution of the return ratio Z. For
simplicity, let Z be of the form
$$Z = \begin{cases} A & \text{with probability } p, \\ B & \text{with probability } 1 - p \end{cases} \qquad (2.1.5)$$
with p ∈ (0, 1), A, B > 0 to be chosen later.
We first consider the classical parameter estimation problem of estimating p,
which will be linked with the portfolio selection problem at a later stage. Q̂n
allows the investor to derive an estimate of p,
$$\hat{p}_n = \hat{p}_n(z^n) := \hat{Q}_n\big( \{ (x^{(1)}, x^{(2)}) : x^{(1)}/x^{(2)} = A \} \big),$$
$z^n \in \{A, B\}^n$ being the observed realisations of the i.i.d. return ratios $Z_1, ..., Z_n$
(independent of Z). If $k(z^n)$ denotes the number of A's in $z^n$ and $B_i^n(p) = \binom{n}{i} p^i (1-p)^{n-i}$ denotes the ith Bernstein polynomial of order n, we can identify
$$f_n(p) := E \hat{p}_n(Z_1, ..., Z_n) = \sum_{i=0}^n \Big( \binom{n}{i}^{-1} \sum_{z^n: k(z^n) = i} \hat{p}_n(z^n) \Big) B_i^n(p) =: \sum_{i=0}^n b_{i,n} B_i^n(p)$$

as a Bézier curve. For reasons to become clear later it is important to study its
derivative
$$f_n'(p) = \sum_{i=0}^{n-1} n (b_{i+1,n} - b_{i,n}) B_i^{n-1}(p).$$
Combinatorial arguments given at the end of this proof and relation (2.1.1)
yield
$$n |b_{i+1,n} - b_{i,n}| \le \text{const.} \qquad (2.1.6)$$
independently of i and n. Using $\sum_{i=0}^{n-1} B_i^{n-1}(p) = 1$ we obtain
$$|f_n'(p)| \le \text{const.} \qquad (2.1.7)$$

for all n and p.


We now choose the true parameter p of the model (2.1.5), which we will denote
by p∗ :

– If $f_n(p) \not\to p$ as $n \to \infty$ for some $p \in (0, 1)$, we fix this p to be the true
parameter $p^*$ of the distribution of Z.

– If $f_n(p) \to p$ as $n \to \infty$ for all $p \in (0, 1)$, we have
$$\int_{p/2}^{p} f_n'(q)\, dq = f_n(p) - f_n(p/2) \to p/2 \qquad (2.1.8)$$
as $n \to \infty$. From this there exists a $p \in (0, 1)$ with $f_n'(p) \not\to 0$ (otherwise
(2.1.7) and the Lebesgue dominated convergence theorem lead to a contradiction
in (2.1.8)). This p is taken to be the true parameter $p^*$ of the
distribution of Z.

The mean squared error $MSE(\hat{p}_n) = E(p^* - \hat{p}_n)_+^2 + E(p^* - \hat{p}_n)_-^2$ satisfies
$$E(p^* - \hat{p}_n)_+^2 \ge \frac{1}{2} MSE(\hat{p}_n) \qquad (2.1.9)$$
or
$$E(p^* - \hat{p}_n)_-^2 \ge \frac{1}{2} MSE(\hat{p}_n) \qquad (2.1.10)$$
for infinitely many n. In either case the Cramér-Rao lower bound yields (for
infinitely many n)
$$E(p^* - \hat{p}_n)_\pm^2 \ge \frac{1}{2} \Big( \frac{f_n'(p^*)^2}{I_n(p^*)} + (f_n(p^*) - p^*)^2 \Big)$$
with Fisher information $I_n(p^*) = np^*(1 - p^*)$ (Lehmann, 1983, Theorem 6.4).


By the choice of $p^*$, the proof is finished if we can adjust A, B such that
$$\Delta(\hat{Q}_n, Q) \ge \text{const.} \cdot E(p^* - \hat{p}_n)_+^2$$
if (2.1.9) applies or
$$\Delta(\hat{Q}_n, Q) \ge \text{const.} \cdot E(p^* - \hat{p}_n)_-^2$$
if (2.1.10) applies. This is done in the following.


If Z is distributed according to the general form (2.1.5), simple calculations
yield that (2.1.4) is maximized by
$$b = b(p) = T\Big( -\frac{p(A - B) + B - 1}{(A - 1)(B - 1)} \Big)$$
where
$$T(x) = \begin{cases} 1 & \text{if } x > 1, \\ x & \text{if } 0 \le x \le 1, \\ 0 & \text{if } x < 0. \end{cases}$$

Suppose (2.1.9) holds for infinitely many n. Set $A := p^*$ and $B := 1 + p^*$. Then
$b(p^*) = 0$ and $R_n^* = X^{(2)}$. In this case
$$E\Big[ \log \frac{R_n^*}{\hat{R}_n} \,\Big|\, X_1, ..., X_n \Big] = -E[\log((Z-1)b(\hat{p}_n) + 1) \mid X_1, ..., X_n]$$
$$= -\big( p^* \log((p^* - 1)b(\hat{p}_n) + 1) + (1 - p^*) \log(p^* b(\hat{p}_n) + 1) \big).$$
More precisely,
$$b(\hat{p}_n) = \begin{cases} 0 & \text{if } \hat{p}_n > p^*, \\ \frac{p^* - \hat{p}_n}{p^*(1 - p^*)} & \text{if } p^{*2} \le \hat{p}_n \le p^*, \\ 1 & \text{if } \hat{p}_n < p^{*2} \end{cases}$$
and
$$E\Big[ \log \frac{R_n^*}{\hat{R}_n} \,\Big|\, X_1, ..., X_n \Big] = \begin{cases} 0 & \text{if } \hat{p}_n > p^*, \\ D(p^* \| \hat{p}_n) & \text{if } p^{*2} \le \hat{p}_n \le p^*, \\ D(p^* \| p^{*2}) & \text{if } \hat{p}_n < p^{*2} \end{cases}$$
with the Kullback-Leibler distances (relative entropies)
$$D(p^* \| \hat{p}_n) = p^* \log \frac{p^*}{\hat{p}_n} + (1 - p^*) \log \frac{1 - p^*}{1 - \hat{p}_n},$$
$$D(p^* \| p^{*2}) = -\big( p^* \log p^* + (1 - p^*) \log(1 + p^*) \big).$$

The $L_1$-bound on the Kullback-Leibler distance (Cover and Thomas, 1991,
Lemma 12.6.1, eq. 12.139) yields
$$D(p^* \| \hat{p}_n) \ge \frac{2}{\log 2} (p^* - \hat{p}_n)^2$$
so that
$$\Delta(\hat{Q}_n, Q) = E\Big( E\Big[ \log \frac{R_n^*}{\hat{R}_n} \,\Big|\, X_1, ..., X_n \Big] \Big) = E\big( D(p^* \| \hat{p}_n)\, 1_{[p^{*2}, p^*]}(\hat{p}_n) + D(p^* \| p^{*2})\, 1_{[0, p^{*2}]}(\hat{p}_n) \big)$$
$$\ge E\Big( \frac{2}{\log 2} (p^* - \hat{p}_n)^2\, 1_{[p^{*2}, p^*]}(\hat{p}_n) + D(p^* \| p^{*2})\, 1_{[0, p^{*2}]}(\hat{p}_n) \Big)$$
$$\ge \min\Big\{ \frac{2}{\log 2}, \frac{D(p^* \| p^{*2})}{p^{*2}} \Big\} E(p^* - \hat{p}_n)_+^2.$$

This is what we wanted to show in case (2.1.9) holds. If (2.1.10) applies, we set
$A := 2 - p^*$ and $B := 1 - p^*$ and argue similarly.
It remains to prove (2.1.6). For our specific model we can assume
$$X = \begin{cases} (A, 1) & \text{with probability } p, \\ (1, 1/B) & \text{with probability } 1 - p. \end{cases}$$

If the observed return ratios $z^n$ and $z'^n \in \{A, B\}^n$ differ in one digit only, so do
the sequences of realisations $x_1, ..., x_n$ and $x_1', ..., x_n'$ of X that generate $z^n$ and
$z'^n$, respectively. Hence, using (2.1.1),
$$|\hat{p}_n(z^n) - \hat{p}_n(z'^n)| = \lim_{\varepsilon \to 0+} \Big| \big( \hat{F}_n((A+\varepsilon, 1+\varepsilon), x_1, ..., x_n) - \hat{F}_n((A, 1), x_1, ..., x_n) \big)$$
$$- \big( \hat{F}_n((A+\varepsilon, 1+\varepsilon), x_1', ..., x_n') - \hat{F}_n((A, 1), x_1', ..., x_n') \big) \Big|$$
$$\le \lim_{\varepsilon \to 0+} \big| \hat{F}_n((A+\varepsilon, 1+\varepsilon), x_1, ..., x_n) - \hat{F}_n((A+\varepsilon, 1+\varepsilon), x_1', ..., x_n') \big| + \big| \hat{F}_n((A, 1), x_1, ..., x_n) - \hat{F}_n((A, 1), x_1', ..., x_n') \big|$$
$$\le \max_{(x,y) \in \{(A,1), (1,1/B)\}} \Big( \frac{\lim_{\varepsilon \to 0+} c((A+\varepsilon, 1+\varepsilon), x, y)}{n} + \frac{c((A,1), x, y)}{n} \Big) =: \frac{c}{n}.$$

Let $\mathcal{F}(z^n)$ consist of all elements of $\{A, B\}^n$ which can be generated by changing
exactly one of the digits B in $z^n$ to A, and let $\mathcal{G}(z^n)$ consist of all elements of
$\{A, B\}^n$ which can be generated by changing exactly one of the A's in $z^n$ to B.
Clearly $|\mathcal{F}(z^n)| = n - k(z^n)$ and $|\mathcal{G}(z^n)| = k(z^n)$. From this
$$b_{i,n} = \binom{n}{i}^{-1} \sum_{z^n: k(z^n) = i} \hat{p}_n(z^n) = \binom{n}{i}^{-1} (n-i)^{-1} \sum_{z^n: k(z^n) = i} (n-i)\, \hat{p}_n(z^n)$$
$$= \binom{n}{i}^{-1} (n-i)^{-1} \sum_{z^n: k(z^n) = i} \sum_{z'^n \in \mathcal{F}(z^n)} \hat{p}_n(z^n)$$
$$= \binom{n}{i+1}^{-1} (i+1)^{-1} \sum_{z'^n: k(z'^n) = i+1} \sum_{z^n \in \mathcal{G}(z'^n)} \hat{p}_n(z'^n) + \binom{n}{i+1}^{-1} (i+1)^{-1} \sum_{z'^n: k(z'^n) = i+1} \sum_{z^n \in \mathcal{G}(z'^n)} \big( \hat{p}_n(z^n) - \hat{p}_n(z'^n) \big)$$
$$= \binom{n}{i+1}^{-1} \sum_{z'^n: k(z'^n) = i+1} \hat{p}_n(z'^n) \cdot \Big( \frac{1}{i+1} \sum_{z^n \in \mathcal{G}(z'^n)} 1 \Big) + \binom{n}{i+1}^{-1} (i+1)^{-1} \sum_{z'^n: k(z'^n) = i+1} \sum_{z^n \in \mathcal{G}(z'^n)} \big( \hat{p}_n(z^n) - \hat{p}_n(z'^n) \big)$$
$$= b_{i+1,n} + \Big\{ \binom{n}{i+1}^{-1} (i+1)^{-1} \sum_{z'^n: k(z'^n) = i+1} \sum_{z^n \in \mathcal{G}(z'^n)} \big( \hat{p}_n(z^n) - \hat{p}_n(z'^n) \big) \Big\}.$$
The latter bracket $\{...\}$ is an average with constituents bounded from above in
absolute value by c/n. Hence $|b_{i+1,n} - b_{i,n}| \le c/n$ and the proof is finished. □
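The lower bound can be watched at work numerically. The sketch below (with invented parameters A, B, $p^*$; not part of the thesis) evaluates $\Delta(\hat Q_n, Q)$ exactly for the relative-frequency estimate $\hat p_n = k/n$ in the two-point model (2.1.5) and exhibits the $O(1/n)$ decay:

```python
# Exact evaluation of Delta(Q_hat_n, Q) in the two-point return-ratio model
# (2.1.5) when p is estimated by the relative frequency p_hat = k/n.
# The parameter values are invented example data.
from math import comb, log

A, B, p_star = 0.5, 1.5, 0.5        # invented model; here b(p*) = 0

def b_opt(p):
    """Clamped formula for the log-optimal weight on stock 1 (cf. the proof)."""
    v = -(p * (A - B) + B - 1) / ((A - 1) * (B - 1))
    return min(1.0, max(0.0, v))

def growth(b, p):
    """E log((Z - 1) b + 1) for Z = A w.p. p and Z = B w.p. 1 - p."""
    return p * log((A - 1) * b + 1) + (1 - p) * log((B - 1) * b + 1)

def delta(n):
    """Delta = E[growth(b(p*), p*) - growth(b(p_hat), p*)], p_hat = k/n."""
    best = growth(b_opt(p_star), p_star)
    return sum(comb(n, k) * p_star**k * (1 - p_star)**(n - k)
               * (best - growth(b_opt(k / n), p_star)) for k in range(n + 1))

deltas = {n: delta(n) for n in (10, 50, 200)}
```

In this configuration $n \cdot \Delta(\hat Q_n, Q)$ stays bounded, in line with the $c/n$ behaviour established above.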

Remark. If the analysis is restricted to unbiased estimators in the sense that
$E\hat{Q}_n(A) = Q(A)$ for all $A \in \mathcal{B}(\mathbb{R}_+^m)$, in particular $f_n(p) = p$, then we can
choose $p^* = 1/2$ to obtain
$$E \log \frac{R_n^*}{\hat{R}_n} \ge \frac{2}{\log 2}\, E\Big( \frac{1}{2} - \hat{p}_n \Big)_+^2 \ge \frac{2}{\log 2} \cdot \frac{1}{2} \cdot \frac{1}{I_n(1/2)} = \frac{4}{n \log 2}$$
without having to impose (2.1.1).


It is interesting to note that we can bound ∆(Q̂n , Q) in terms of the Kullback-
Leibler distance between Q̂n and Q not only from below but also from above.
This was obtained by Barron and Cover (1988, Theorem 1, see also Cover and
Thomas, 1991, Theorem 15.4.1):

Theorem 2.1.2. (Cover and Thomas, 1991) Let Q be the true return distribu-
tion and Q̂n a sequence of distribution estimates, both having densities q and
q̂n , respectively, w.r.t. some common dominating measure. Then

$$\Delta(\hat{Q}_n, Q) \le E D(Q \| \hat{Q}_n)$$
with the Kullback-Leibler distance
$$D(Q \| \hat{Q}_n) = \int \log \frac{q(x)}{\hat{q}_n(x)}\, Q(dx).$$

Remark. As a consequence of Theorem 2.1.2, consistent distribution estimates,


i.e., estimates for which ED(Q||Q̂n ) → 0, generate consistent portfolio selection
rules. The results of Györfi et al. (1994), however, show that convergence in
Kullback-Leibler distance is a considerably strong requirement. There is no
distribution estimate, for example, that is Kullback-Leibler consistent for any
return distribution on a non-finite countable set. The results of Algoet and
Cover (1988) demonstrate that almost sure weak convergence of Q̂n to Q suffices
to obtain consistent portfolio selection rules.

Proof. It suffices to show
$$E\Big[ \log \frac{R_n^*}{\hat{R}_n} \,\Big|\, X_1, ..., X_n \Big] \le D(Q \| \hat{Q}_n), \qquad (2.1.11)$$
the assertion follows by taking expectations. If $D(Q \| \hat{Q}_n) = \infty$ there is nothing to
prove. So, assume $D(Q \| \hat{Q}_n) < \infty$, which implies absolute continuity of Q w.r.t.
$\hat{Q}_n$, i.e., $Q \ll \hat{Q}_n$. Set $A := \{x : \hat{R}_n = \hat{b}_n x > 0,\ q(x) > 0,\ \hat{q}_n(x) > 0\}$. From the
Kuhn-Tucker conditions (Theorem 1.3.3) it is clear that $\hat{Q}_n(\{x : \hat{R}_n > 0\}) = 1$
and, by $Q \ll \hat{Q}_n$, also $Q(\{x : \hat{R}_n > 0\}) = 1$. From this and using $Q \ll \hat{Q}_n$
again, $Q(A) = 1$. Thus

$$E\Big[ \log \frac{R_n^*}{\hat{R}_n} \,\Big|\, X_1, ..., X_n \Big] = \int_A \log \frac{b^* x}{\hat{b}_n x}\, Q(dx) = \int_A \log\Big( \frac{b^* x}{\hat{b}_n x} \cdot \frac{\hat{q}_n(x)}{q(x)} \cdot \frac{q(x)}{\hat{q}_n(x)} \Big)\, Q(dx)$$
$$= \int_A \log\Big( \frac{b^* x}{\hat{b}_n x} \cdot \frac{\hat{q}_n(x)}{q(x)} \Big)\, Q(dx) + D(Q \| \hat{Q}_n)$$
$$\le \log \int_A \frac{b^* x}{\hat{b}_n x}\, \hat{Q}_n(dx) + D(Q \| \hat{Q}_n) \le D(Q \| \hat{Q}_n),$$
the first inequality by Jensen's inequality, the latter by the Kuhn-Tucker conditions again. □


From this theorem we can easily infer that the lower bound on ∆(Q̂n, Q) given
in (2.1.3) is sharp, and that there are estimators attaining the optimal rate of
decay, O(1/n). In particular, the log-optimal portfolio based on the empirical
distribution does so.

Theorem 2.1.3. Assume the return distribution Q is supported on a finite set
$\mathcal{M}$. Let $\hat{Q}_n$ be the empirical distribution on $\mathcal{M}$ after n observations and
$$\hat{b}_{n+1} := \arg\max_{b \in S} \frac{1}{n} \sum_{i=1}^n \log bX_i$$
the associated log-optimal portfolio, then
$$\Delta(\hat{Q}_n, Q) \le \frac{|\mathcal{M}|}{n}. \qquad (2.1.12)$$

Proof. Let $\Gamma_{n,\varepsilon} := \{\mu \mid \mu \text{ empirical distribution of } n \text{ data points on } \mathcal{M},\ D(Q \| \mu) > \varepsilon\}$.
Then according to Sanov's theorem (Cover and Thomas, 1991, Ch. 12)
$$Q\big( D(Q \| \hat{Q}_n) > \varepsilon \big) = Q\big( \hat{Q}_n \in \Gamma_{n,\varepsilon} \big) \le |\mathcal{M}| \exp\Big( -n \min_{\mu \in \Gamma_{n,\varepsilon}} D(Q \| \mu) \Big) \le |\mathcal{M}| \exp(-n\varepsilon).$$
From this we calculate
$$E D(Q \| \hat{Q}_n) = \int_0^\infty Q\big( D(Q \| \hat{Q}_n) > \varepsilon \big)\, d\varepsilon \le \int_0^\infty |\mathcal{M}| \exp(-n\varepsilon)\, d\varepsilon = \frac{|\mathcal{M}|}{n}.$$
Application of Theorem 2.1.2 proves the theorem. □

It is an interesting feature of inequality (2.1.12) that the rate itself does not
deteriorate when the number m of stocks in the market model grows. This is
rather untypical of nonparametric estimation problems. Of course, the number


of stocks in the market influences the constant in ∆(Q̂n, Q) = O(1/n): We have
proven that for the empirical log-optimal strategy
Ri∗ c(i)
E log ∼
R̂i i

with c(i) = O(1). To see how c(i) depends on m we allow ourselves some
heuristics: Móri (1982) proved that for a (slight) modification of the empiri-
cal log-optimal strategy, n(m−1)/2 Ŝn /Sn∗ converges to a non-degenerate random
variable Z,
Ŝn
n(m−1)/2 ∗ → Z in distribution.
Sn
This can be rewritten as

\[ \sum_{i=1}^n \Bigl(\frac{m-1}{2}\cdot\frac{1}{i} - \log\frac{R_i^*}{\hat R_i}\Bigr) \to \log Z \quad \text{in distribution.} \]

Taking expectations, the left-hand side becomes

\[ \sum_{i=1}^n \frac{m-1}{2}\cdot\frac{1}{i} - \sum_{i=1}^n \mathrm{E}\log\frac{R_i^*}{\hat R_i} \sim \sum_{i=1}^n \frac{m-1}{2}\cdot\frac{1}{i} - \sum_{i=1}^n \frac{c(i)}{i}, \]

both sums being of logarithmic growth. To obtain convergence we infer \(c(i) \sim \frac{m-1}{2}\), the constant possibly growing linearly with the number of stocks.

Up to a logarithmic factor, the phenomenon of the rate being insensitive to


m carries over to more sophisticated settings where the return vector is not
necessarily restricted to finitely many outcomes. In particular, we have the
following theorem for the empirical log-optimal portfolio.

Theorem 2.1.4. Assume the return distribution Q is concentrated on a cube [A, B]^m with 0 < A ≤ B < ∞. Let

\[ \hat b_{n+1} := \arg\max_{b \in S} \frac{1}{n}\sum_{i=1}^n \log b X_i \]

be the empirical log-optimal portfolio. Then

\[ \Delta(\hat Q_n, Q) = o\Bigl(\frac{\log^q n}{n^{1/2}}\Bigr) \]

for any q > max{(m − 1)/2, 1}.

Up to the logarithmic factor, the rate coincides with the classical rate n−1/2 of
stochastic parameter estimators – regardless of the portfolio dimension m.
Proof. For the proof we can assume m > 2: if m = 2 we artificially produce a market with 3 assets from the original 2 stocks and a bond returning A/2 in each market period. In this setting, we never invest in the bond, i.e., log-optimal investment is the same as in the original 2-stock market. A rate result for the 3-stock market carries over to the 2-stock market.

First we make some preliminary observations on the covering of the simplex

\[ S := \Bigl\{ b \in \mathbb{R}^m \;\Big|\; b_i \ge 0,\ \sum_{i=1}^m b_i = 1 \Bigr\}. \]

Let \(S'_m := \{ b \in \mathbb{R}^m \mid b_i \ge 0,\ \sum_{i=1}^m b_i \le 1 \}\) and define the mapping \(F : S'_{m-1} \to S,\ (x_1, \dots, x_{m-1}) \mapsto (x_1, \dots, x_{m-1}, 1 - \sum_{i=1}^{m-1} x_i)\). Fix some ε > 0. Clearly, we can cover \(S'_{m-1} \subseteq [0,1]^{m-1}\) with \(N \le \lceil 1/\delta \rceil^{m-1}\) \(\|\cdot\|_\infty\)-balls of radius δ := ε/(m − 1) centered at \(c^{(1)}, \dots, c^{(N)} \in S'_{m-1}\). For any x ∈ S,

\[ \inf_{i=1,\dots,N} \bigl\|(x_1,\dots,x_m) - F(c^{(i)})\bigr\|_\infty = \inf_{i=1,\dots,N} \max\Bigl\{ \bigl\|(x_1,\dots,x_{m-1}) - c^{(i)}\bigr\|_\infty,\ \Bigl|\sum_{j=1}^{m-1}\bigl(c_j^{(i)} - x_j\bigr)\Bigr| \Bigr\} \le \inf_{i=1,\dots,N} (m-1)\bigl\|(x_1,\dots,x_{m-1}) - c^{(i)}\bigr\|_\infty \le \varepsilon. \]

It follows that S can be covered by at most \(\lceil (m-1)/\varepsilon \rceil^{m-1}\) \(\|\cdot\|_\infty\)-balls of radius ε.
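The covering argument can be checked numerically. The sketch below is an illustration only; it simplifies the proof slightly by drawing the centres from the whole cube [0,1]^{m−1} rather than from S'_{m−1}, which does not weaken the conclusion.

```python
import numpy as np
from itertools import product

def simplex_cover_check(m=3, eps=0.3, n_samples=2000, seed=1):
    """Check that an eps/(m-1)-grid of [0,1]^(m-1), lifted to the simplex
    via F(c) = (c, 1 - sum(c)), is an eps-net of S in the sup-norm."""
    delta = eps / (m - 1)
    K = int(np.ceil(1.0 / delta))            # balls per axis: ceil(1/delta)
    axis = (np.arange(K) + 0.5) / K          # centres with spacing 1/K <= delta
    centers = np.array(list(product(axis, repeat=m - 1)))
    lifted = np.hstack([centers, 1 - centers.sum(axis=1, keepdims=True)])  # F(c)
    rng = np.random.default_rng(seed)
    Xs = rng.dirichlet(np.ones(m), size=n_samples)   # random points on S
    # sup-norm distance from each sampled x to its nearest lifted centre
    d = np.abs(Xs[:, None, :] - lifted[None, :, :]).max(axis=2).min(axis=1)
    return len(centers), float(d.max())

N, worst = simplex_cover_check()
```

For m = 3 and ε = 0.3 this produces N = 49 = ⌈(m−1)/ε⌉^{m−1} centres, and every sampled point of S lies within ε of a lifted centre, as the covering bound asserts.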
Let X_1, ..., X_n denote independent return data and augment this family by a random variable X independent of X_1, ..., X_n with the same distribution as X_1. Introduce the following abbreviations:

\[ \Phi_n := \max_{b \in S} \frac{1}{n}\sum_{i=1}^n \log b X_i = \frac{1}{n}\sum_{i=1}^n \log \hat b_n X_i, \qquad L_n := \mathrm{E}[\log \hat b_n X \mid X_1, \dots, X_n] \]

and

\[ L^* := \max_{b \in S} \mathrm{E}\log b X = \mathrm{E}\log b^* X. \]

Clearly,

\[ L_n \le \max_{b \in S} \mathrm{E}[\log b X \mid X_1, \dots, X_n] = \max_{b \in S} \mathrm{E}\log b X = L^*. \]

To bound the tail probability of L* − L_n ≥ 0 we use the following decomposition:

\[ \mathrm{P}(L^* - L_n > \varepsilon) = \mathrm{P}(L^* - \Phi_n + \Phi_n - L_n > \varepsilon) \le \mathrm{P}(L^* - \Phi_n \ge \varepsilon - \delta) + \mathrm{P}(\Phi_n - L_n \ge \delta) =: K_1 + K_2, \]

where δ > 0 is chosen later.

Bounding K_1: We start with

\[ L^* - \Phi_n = \max_b \mathrm{E}\log b X - \max_b \frac{1}{n}\sum_{i=1}^n \log b X_i \le \max_b \Bigl( \mathrm{E}\log b X - \frac{1}{n}\sum_{i=1}^n \log b X_i \Bigr). \]

For τ > 0 to be chosen later, cover S by \(N \le \lceil (m-1)/\tau \rceil^{m-1}\) \(\|\cdot\|_\infty\)-balls of radius τ centered at \(b^{(1)}, \dots, b^{(N)}\). Choosing c_1 such that \(|\log b X - \log \tilde b X| \le c_1 \|b - \tilde b\|_\infty\) for all X ∈ [A, B]^m, we obtain

\[ \max_b \Bigl( \mathrm{E}\log b X - \frac{1}{n}\sum_{i=1}^n \log b X_i \Bigr) \le \max_{j=1,\dots,N} \Bigl( \mathrm{E}\log b^{(j)} X - \frac{1}{n}\sum_{i=1}^n \log b^{(j)} X_i \Bigr) + 2 c_1 \tau. \]

Under the condition ε − δ − 2c_1τ > 0, the Hoeffding inequality (Petrov, 1995, 2.6.2; note that \(|\log b X| \le \max\{|\log A|, |\log B|\} =: c_2 < \infty\)) yields

\[ K_1 \le \sum_{j=1}^N \mathrm{P}\Bigl( \mathrm{E}\log b^{(j)} X - \frac{1}{n}\sum_{i=1}^n \log b^{(j)} X_i \ge \varepsilon - \delta - 2c_1\tau \Bigr) \le 2 N \exp\Bigl( -\frac{n(\varepsilon - \delta - 2c_1\tau)^2}{2 c_2^2} \Bigr) \le 2 \Bigl\lceil \frac{m-1}{\tau} \Bigr\rceil^{m-1} \exp\Bigl( -\frac{n(\varepsilon - \delta - 2c_1\tau)^2}{2 c_2^2} \Bigr). \tag{2.1.13} \]

Bounding K_2:

\[ \Phi_n - L_n = \frac{1}{n}\sum_{i=1}^n \log \hat b_n X_i - \mathrm{E}[\log \hat b_n X \mid X_1, \dots, X_n] \le \max_{j=1,\dots,N} \Bigl( \frac{1}{n}\sum_{i=1}^n \log b^{(j)} X_i - \mathrm{E}\log b^{(j)} X \Bigr) + 2 c_1 \tau, \]

and under the condition δ − 2c_1τ > 0 we apply Hoeffding's inequality again to obtain

\[ K_2 \le N \exp\Bigl( -\frac{n(\delta - 2c_1\tau)^2}{2 c_2^2} \Bigr) \le \Bigl\lceil \frac{m-1}{\tau} \Bigr\rceil^{m-1} \exp\Bigl( -\frac{n(\delta - 2c_1\tau)^2}{2 c_2^2} \Bigr). \tag{2.1.14} \]

Combining (2.1.13) and (2.1.14) for δ := ε/2, τ := ε/(8c_1) yields

\[ \mathrm{P}(L^* - L_n > \varepsilon) \le 3 \Bigl\lceil \frac{8 c_1 (m-1)}{\varepsilon} \Bigr\rceil^{m-1} \exp\Bigl( -\frac{n \varepsilon^2}{32 c_2^2} \Bigr). \]
This in turn implies

\[ \Delta(\hat Q_n, Q) = \mathrm{E}(L^* - L_n) = \int_0^\infty \mathrm{P}(L^* - L_n > \varepsilon)\, d\varepsilon \le a_n + \int_{a_n}^\infty 3 \Bigl\lceil \frac{8 c_1 (m-1)}{\varepsilon} \Bigr\rceil^{m-1} \exp\Bigl( -\frac{n \varepsilon^2}{32 c_2^2} \Bigr) d\varepsilon \]
\[ = a_n \Biggl( 1 + 24 c_1 (m-1) \int_{(8 c_1 (m-1))^{-1}}^\infty \Bigl\lceil \frac{1}{a_n z} \Bigr\rceil^{m-1} \exp\bigl( -c_3 n a_n^2 z^2 \bigr)\, dz \Biggr) \]

with \(c_3 := 2 c_1^2 (m-1)^2 / c_2^2\) (substituting ε = 8c_1(m − 1)a_n z) and a_n > 0 still to be adjusted.

We now have to bound integrals of the latter type: for a > 0, m > 2 and \(a_n := c\, n^{-1/2} \log^q n\) (arbitrary c > 0, q > (m − 1)/2) we find that

\[ \int_a^\infty \Bigl\lceil \frac{1}{a_n z} \Bigr\rceil^{m-1} \exp(-c_3 n a_n^2 z^2)\, dz \le 2^{m-2} \int_a^\infty \Bigl( \frac{1}{(a_n z)^{m-1}} + 1 \Bigr) \exp(-c_3 n a_n^2 z^2)\, dz, \]

where we have used the inequality ⌈x⌉ ≤ x + 1 and Jensen's inequality to obtain \((x+1)^{m-1} = 2^{m-1}((x+1)/2)^{m-1} \le 2^{m-1}(x^{m-1}+1)/2 = 2^{m-2}(x^{m-1}+1)\). Bounding the last integral (note that \(\exp(-c_3 n a_n^2 z^2) \le \exp(-a^2 c_3 n a_n^2)\) for z ≥ a) yields

\[ \int_a^\infty \Bigl\lceil \frac{1}{a_n z} \Bigr\rceil^{m-1} \exp(-c_3 n a_n^2 z^2)\, dz \le \frac{2^{m-2} \exp(-a^2 c_3 n a_n^2)}{a_n^{m-1}} \int_a^\infty \frac{dz}{z^{m-1}} + 2^{m-2} \int_0^\infty \exp(-c_3 n a_n^2 z^2)\, dz \]
\[ = \frac{2^{m-2} \exp(-a^2 c_3 n a_n^2)}{(m-2)\, a^{m-2}\, a_n^{m-1}} + 2^{m-3} \Bigl( \frac{\pi}{c_3} \Bigr)^{1/2} \frac{1}{n^{1/2} a_n}. \]

For n sufficiently large,

\[ \exp(-a^2 c_3 n a_n^2) = \exp(-a^2 c_3 c^2 \log^{2q} n) \le \exp\Bigl( -\frac{m-1}{2} \log n \Bigr) = \Bigl( \frac{1}{n} \Bigr)^{(m-1)/2}, \]

and thus

\[ \frac{\exp(-a^2 c_3 n a_n^2)}{a_n^{m-1}} \le \Bigl( \frac{1}{n^{1/2} a_n} \Bigr)^{m-1} = \Bigl( \frac{1}{c \log^q n} \Bigr)^{m-1} \le \frac{1}{c \log^q n}. \]

By the choice of a_n, \(1/(n^{1/2} a_n) = 1/(c \log^q n)\), and we end up with

\[ \Delta(\hat Q_n, Q) \le a_n \Bigl( 1 + \frac{\mathrm{const.}}{c \log^q n} \Bigr) = \frac{c \log^q n}{n^{1/2}} + \frac{\mathrm{const.}}{n^{1/2}} \]

for all n greater than some integer that depends on m, a, c_3 and c. Hence

\[ \limsup_{n\to\infty} \frac{n^{1/2}}{\log^q n}\, \Delta(\hat Q_n, Q) \le c, \]

and from c > 0 being arbitrary we infer

\[ \limsup_{n\to\infty} \frac{n^{1/2}}{\log^q n}\, \Delta(\hat Q_n, Q) = 0, \]

the assertion for the case m > 2. □

2.2 Dimensionality in portfolio selection


Let the investor operate in a market of M stocks with random one day re-
turns X (i) (i = 1, ..., M ). Typically, M is large, e.g., M = 30 for the DAX
(Frankfurt) or Dow Jones IA (New York) stocks, M = 100 for the FTSE100
(London). Common wisdom tells us “don’t put all your eggs in one basket”, the
economist’s version of this saying (as Samuelson, 1967, put it) goes “diversifi-
cation pays”. One is tempted to think the more diversified the portfolio, i.e.,
the more stocks we include in via log-optimal portfolio selection, the better.
The results of the last section, where we have seen that the number of stocks
does not affect the rate at which empirical log-optimal portfolios approach the
optimal performance, make us particularly optimistic. However, we should not
forget that there are also several reasons to avoid selection from a huge set of
stocks:

1. M very much affects the scale of finite sample underperformance via the constants in the rate results (recall what we inferred from Móri, 1982).

2. Standard optimisation methods are computationally demanding in high


dimension.

3. If the log-optimal portfolio is calculated with, e.g., Cover’s algorithm


(Cover, 1984), then at each iteration step an M -dimensional integration
has to be carried out, which requires considerable computational effort.
Also, Cover’s algorithm requires exact knowledge of the M -dimensional
return distribution. In practice, such information must be gathered by
statistical distribution estimation which faces substantial difficulties for
high dimension M (curse of dimensionality, see e.g. Scott, 1992, Chapter
7).

For these reasons, the investor should work with a medium size range of stocks
at a time only. In other words, he will have to pre-select m < M stocks from
the whole market. These pre-selected stocks are the assets he includes in a
log-optimal portfolio. For illustrative purposes, we restrict ourselves to M = 3.
In this case, the investor may compose a log-optimal portfolio out of 6 possible
combinations of 1 or 2 stocks.

\[ V_{\{n\}} := \mathrm{E}\log X^{(n)} \]

is the maximal expected log-return when the portfolio is composed of stock n only. If two different stocks n and m are considered, the maximal expected log-return is

\[ V^*_{\{n,m\}} := \max_{0 \le b \le 1} \mathrm{E}\log\bigl( (1-b) X^{(n)} + b X^{(m)} \bigr). \]

A natural (and in fact a frequently used) way for pre-selection is to start with a first "draught-horse" stock (say stock A) for our portfolio, i.e., a stock such that V_{{A}} is large. From the two remaining contenders (say stocks B and C) the investor then includes the one with good single performance, e.g. B if V_{{B}} > V_{{C}}. The hope is to attain the optimum \(V^*_{\{A,B\}} = \max_{\{n,m\} \subseteq \{A,B,C\}} V^*_{\{n,m\}}\). The following result gives conditions under which this method is doomed to failure in the realistic market model of log-normally distributed returns. More precisely, markets with log-normal returns are characterised for which V_{{1}} < V_{{2}} < V_{{3}} and V*_{{2,3}} < V*_{{1,2}} hold at the same time, the two best single stocks

forming a poorer portfolio than the two worst stocks in the market. As a consequence, in order to select the optimal 2-stock combination from the market, the investor has to evaluate all \(\binom{M}{m}\) possible choices, a huge computational effort in high dimensions – an effort, though, that cannot be avoided. This is in contrast to the Markowitz mean-variance approach, where a portfolio built of stocks 1 and 2 may be superior to a portfolio built of stocks 2 and 3 in terms of risk (i.e., variance of portfolio return), but never in terms of performance (i.e., expected portfolio return).

Theorem 2.2.1. Consider a given variance-covariance matrix

\[ \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{pmatrix}. \]

Then the condition

\[ \sigma_3^2 - 2\sigma_{23} < \sigma_1^2 - 2\sigma_{12} \tag{2.2.1} \]

is necessary and sufficient for a three-stock log-normal market

\[ X^{(i)} := \exp(Y^{(i)}) \quad (i = 1, 2, 3) \quad\text{with}\quad \bigl(Y^{(1)}, Y^{(2)}, Y^{(3)}\bigr) \sim N\bigl((\mu_1, \mu_2, \mu_3);\ \Sigma\bigr) \]

to exist such that

\[ \mu_1 < \mu_2 := 0 < \mu_3 \quad\text{and}\quad V^*_{\{2,3\}} < V^*_{\{1,2\}} \]

simultaneously.

The assertion of Theorem 2.2.1 remains valid if a common µ ∈ IR is added to


µ1 , µ2 and µ3 .
The theorem implies that single stock performance is of secondary importance
in comparison with harmonious teamwork of the stocks. The deeper reason for
this is the effect of "volatility pumping" (Luenberger, 1998, Examples 15.2 and 15.3): the specific volatility structure (i.e., covariance structure) in the market may "pump" growth from one stock to others in the portfolio. In our example, if the covariance σ₁₂ of stock 1 and stock 2 is sufficiently less than the covariance σ₂₃ of stock 2 and stock 3 – preferably sufficiently negative, such that whenever 1 plunges, 2 is likely to increase – then more substantial growth can be achieved by balancing stocks 1 and 2 rather than stocks 2 and 3, even though stock 1 might have poorer single performance than stock 3.
Corresponding results for dimension reduction in pattern recognition have been obtained by Toussaint (1971; also discussed in Devroye, Györfi and Lugosi, 1996, Theorem 32.2).
For the proof of the theorem, we need a number of preliminary observations. First consider a 2-stock market with log-normally distributed returns X^{(i)},

\[ X^{(i)} := \exp(Y^{(i)}) \quad (i = 1, 2) \]

and

\[ \begin{pmatrix} Y^{(1)} \\ Y^{(2)} \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix};\ \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \right). \]

Log-optimal investment in log-normal markets has been considered, e.g., by Ohlson (1972). As noted in Chapter 1, the log-optimal portfolio

\[ \bigl(b^{(1)*}, b^{(2)*}\bigr) := \arg\max_{b^{(1)},\, b^{(2)} \ge 0,\; b^{(1)} + b^{(2)} = 1} \mathrm{E}\log\bigl( b^{(1)} X^{(1)} + b^{(2)} X^{(2)} \bigr) \]

satisfies the necessary and sufficient Kuhn-Tucker conditions

\[ \mathrm{E}\,\frac{X^{(i)}}{b^{(1)*} X^{(1)} + b^{(2)*} X^{(2)}} \;\begin{cases} = 1 & \text{if } b^{(i)*} > 0, \\ \le 1 & \text{if } b^{(i)*} = 0 \end{cases} \]

(Theorem 1.3.3). In other words,

\[ (1, 0) \text{ log-optimal} \iff \mathrm{E}\,\frac{X^{(2)}}{X^{(1)}} \le 1, \tag{2.2.2} \]
\[ (0, 1) \text{ log-optimal} \iff \mathrm{E}\,\frac{X^{(1)}}{X^{(2)}} \le 1, \tag{2.2.3} \]
\[ (b^{(1)}, b^{(2)}) \text{ log-optimal} \iff \mathrm{E}\,\frac{X^{(1)}}{b^{(1)} X^{(1)} + b^{(2)} X^{(2)}} = \mathrm{E}\,\frac{X^{(2)}}{b^{(1)} X^{(1)} + b^{(2)} X^{(2)}} = 1, \tag{2.2.4} \]

the latter for b^{(1)}, b^{(2)} > 0, b^{(1)} + b^{(2)} = 1.


We rephrase the Kuhn-Tucker conditions in a form that will be more convenient to use. To this end we define \(Z := Y^{(2)} - Y^{(1)} \sim N(\mu_2 - \mu_1;\ \sigma_1^2 - 2\sigma_{12} + \sigma_2^2)\) and connect b^{(1)}, b^{(2)} ≠ 0 to r ∈ (0, ∞) via r := b^{(2)}/b^{(1)}, b^{(1)} = 1/(1 + r) and b^{(2)} = r/(1 + r). Then we can rewrite the right-hand sides of (2.2.2) and (2.2.3) as

\[ \mathrm{E}\exp Z \le 1, \tag{2.2.2'} \]
\[ \mathrm{E}\exp(-Z) \le 1. \tag{2.2.3'} \]

By simple calculations, the right-hand side of (2.2.4) is equivalent to the existence of r ∈ (0, ∞) such that

\[ \mathrm{E}\,\frac{\exp Z - 1}{1 + r \exp Z} = 0. \tag{2.2.4'} \]
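The "simple calculations" behind (2.2.4') can be spelled out. Dividing numerator and denominator by X^{(1)} and substituting b^{(1)} = 1/(1 + r), b^{(2)} = r/(1 + r) gives

```latex
\mathrm{E}\,\frac{X^{(2)}}{b^{(1)}X^{(1)} + b^{(2)}X^{(2)}}
 - \mathrm{E}\,\frac{X^{(1)}}{b^{(1)}X^{(1)} + b^{(2)}X^{(2)}}
 = \mathrm{E}\,\frac{\exp Z - 1}{b^{(1)} + b^{(2)}\exp Z}
 = (1+r)\,\mathrm{E}\,\frac{\exp Z - 1}{1 + r\exp Z}.
```

Since b^{(1)} times the first expectation plus b^{(2)} times the second always equals 1, the two expectations in (2.2.4) both equal 1 exactly when their difference vanishes, which is (2.2.4').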
From (2.2.2') to (2.2.4') one can observe the following:

1. The log-optimal portfolio (b^{(1)*}, b^{(2)*}) only depends upon

\[ \mu := \mu_2 - \mu_1 \quad\text{and}\quad \sigma^2 := \sigma_1^2 - 2\sigma_{12} + \sigma_2^2, \]

i.e. \((b^{(1)*}, b^{(2)*}) = (b^{(1)*}(\mu, \sigma^2),\ b^{(2)*}(\mu, \sigma^2))\).

2. Evaluating \(\mathrm{E}\exp Z = \exp(\mu + \sigma^2/2)\) and \(\mathrm{E}\exp(-Z) = \exp(-\mu + \sigma^2/2)\) yields:

\[ (1, 0) \text{ log-optimal} \iff \mu \le -\frac{\sigma^2}{2}, \tag{2.2.5} \]
\[ (0, 1) \text{ log-optimal} \iff \mu \ge \frac{\sigma^2}{2}. \tag{2.2.6} \]

3. For µ = 0, (1/2, 1/2) is the log-optimal portfolio, since by symmetry

\[ \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^\infty \frac{\exp z - 1}{\exp z + 1} \exp\Bigl( -\frac{z^2}{2\sigma^2} \Bigr)\, dz = 0. \]

The value of the log-optimal portfolio of stock 1 and stock 2 is

\[ V^* := \mathrm{E}\log\bigl( b^{(1)*} X^{(1)} + b^{(2)*} X^{(2)} \bigr). \]

However, it will be convenient to work with the portfolio improvement

\[ V_\sigma(\mu) := V^* - \mu_1 = \mathrm{E}\log X^{(1)} + \mathrm{E}\log\bigl( b^{(1)*} + (1 - b^{(1)*}) \exp Z \bigr) - \mu_1 = \mathrm{E}\log\bigl( b^{(1)*}(\mu, \sigma^2) + (1 - b^{(1)*}(\mu, \sigma^2)) \exp Z \bigr) \tag{2.2.7} \]

achieved when including stock 2 in a portfolio of stock 1. V_σ(µ) only depends upon the distribution of

\[ Z = Z_{\mu,\sigma^2} \sim N(\mu;\ \sigma^2), \]

i.e. again on µ and σ² only. The following lemma summarizes basic properties of V_σ(µ), similar to the results derived in Ohlson (1972):

Lemma 2.2.2.
1. For any fixed σ ∈ [0, ∞), V_σ(µ) is a continuous function of µ ∈ IR, strictly increasing on [−σ²/2, ∞).
2. V_σ(µ) = 0 for all µ ∈ (−∞, −σ²/2] and V_σ(µ) = µ for all µ ∈ [σ²/2, ∞).
3. V_σ(0) is a nonnegative, strictly increasing continuous function of σ ∈ [0, ∞).

Proof. 1. On the one hand the log-optimal portfolio \((b^{(1)*}(\mu, \sigma^2), b^{(2)*}(\mu, \sigma^2))\) is unique (Theorem 1.3.1); on the other hand, a continuous solution, say \((b^{(1)}(\mu, \sigma^2), b^{(2)}(\mu, \sigma^2))\), to the maximization problem

\[ \mathrm{E}\log\bigl( b^{(1)} X^{(1)} + b^{(2)} X^{(2)} \bigr) = \mu_1 + \mathrm{E}\log\bigl( b^{(1)} + b^{(2)} \exp Z_{\mu,\sigma^2} \bigr) \longrightarrow \max_{b^{(1)},\, b^{(2)} \ge 0,\; b^{(1)} + b^{(2)} = 1} \]

can be found (Aliprantis and Border, 1999, Theorem 16.31 together with Lemma 16.6). Hence both coincide and b^{(1)*}(µ, σ²) is a continuous function of µ. From the equation \(V_\sigma(\mu) = \mathrm{E}\log( b^{(1)*} + (1 - b^{(1)*}) \exp Z_{\mu,\sigma^2} )\) the continuity assertion follows.

Now let −σ²/2 < µ < ν. Then

\[ V_\sigma(\mu) = \mathrm{E}\log\bigl( b^{(1)*}(\mu, \sigma^2) + (1 - b^{(1)*}(\mu, \sigma^2)) \exp Z_{\mu,\sigma^2} \bigr) < \mathrm{E}\log\bigl( b^{(1)*}(\mu, \sigma^2) + (1 - b^{(1)*}(\mu, \sigma^2)) \exp Z_{\nu,\sigma^2} \bigr) \le \mathrm{E}\log\bigl( b^{(1)*}(\nu, \sigma^2) + (1 - b^{(1)*}(\nu, \sigma^2)) \exp Z_{\nu,\sigma^2} \bigr) = V_\sigma(\nu). \]

The first inequality follows from −σ²/2 < µ, i.e. b^{(1)*}(µ, σ²) < 1; the second inequality holds by definition of b^{(1)*} as a component of the log-optimal portfolio.
2. is a direct consequence of (2.2.5) and (2.2.6): we just check \(V_\sigma(\mu) = \mathrm{E}\log X^{(1)} - \mu_1 = \mu_1 - \mu_1 = 0\) for \((b^{(1)*}, b^{(2)*}) = (1, 0)\) and calculate \(V_\sigma(\mu) = \mathrm{E}\log X^{(2)} - \mu_1 = \mu_2 - \mu_1 = \mu\) for \((b^{(1)*}, b^{(2)*}) = (0, 1)\).




3. As noted above, µ = 0 implies \((b^{(1)*}, b^{(2)*}) = (1/2, 1/2)\). Hence we find that

\[ V_\sigma(0) = \log\frac{1}{2} + \mathrm{E}\log\bigl( 1 + \exp Z_{0,\sigma^2} \bigr) = \log\frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \log\bigl( 1 + \exp(\sigma w) \bigr) \exp\Bigl( -\frac{w^2}{2} \Bigr)\, dw. \]

From this representation we see that V_σ(0) is continuous for σ ∈ (0, ∞). Moreover, using the monotone convergence theorem (Williams, 1991, 5.3) we calculate

\[ V_0(0) := \lim_{\sigma \to 0^+} V_\sigma(0) = \log\frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \log 2\, \exp\Bigl( -\frac{w^2}{2} \Bigr)\, dw = \log\frac{1}{2} + \log 2 = 0. \]

Concerning monotonicity, we remark that in what follows the interchange of differentiation and integration is possible by the standard theorem for integrals depending on a parameter (see e.g. Williams, 1991, A.16.1). Thus for σ > 0

\[ \frac{\partial}{\partial\sigma} V_\sigma(0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \frac{w}{1 + \exp(-\sigma w)} \exp\Bigl( -\frac{w^2}{2} \Bigr)\, dw. \]

Finally, since w/(1 + exp(−σw)) > w/2 for all w ≠ 0,

\[ \frac{\partial}{\partial\sigma} V_\sigma(0) > \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty \frac{w}{2} \exp\Bigl( -\frac{w^2}{2} \Bigr)\, dw = 0, \]

and V_σ(0) is strictly increasing for σ ≥ 0. □


We are now in the position to finish the

Proof of Theorem 2.2.1. Sufficiency: Suppose we are given \(\sigma_L^2 := \sigma_3^2 - 2\sigma_{23} + \sigma_2^2 < \sigma_U^2 := \sigma_1^2 - 2\sigma_{12} + \sigma_2^2\). Then, from part 3 (combined with parts 1, 2) of Lemma 2.2.2, we can choose a W > 0 with

\[ V_{\sigma_L}(0) < W < V_{\sigma_U}(0). \tag{2.2.8} \]

By parts 1 and 2 of the lemma and the intermediate value theorem, \(\mu_1 := V_{\sigma_U}^{-1}(W)\) and \(m := V_{\sigma_L}^{-1}(W)\) are well-defined and – observing the strict monotonicity property of \(V_{\sigma_L}\) and \(V_{\sigma_U}\) as in part 1 of the lemma – we obtain µ₁ < 0 < m from (2.2.8).

This choice is illustrated in Figure 2.1, where the reduced portfolio values V_σ(µ) were calculated by means of the Cover algorithm (Theorem 1.3.2) and numerical integration (composite trapezoidal rule, Isaacson and Keller, 1994, Sec. 7.5), both with an accuracy of 10⁻⁷.

Choose some µ₃ with 0 < µ₃ < m. Then, by part 1 again,

\[ V_{\sigma_L}(\mu_3) < V_{\sigma_L}(m) = W = V_{\sigma_U}(\mu_1). \]


[Figure 2.1 here: the curves V_{σ_U}(µ) and V_{σ_L}(µ) plotted against µ, with σ_U = 1.00000, σ_L = 0.50000, W = 0.08940, µ₁ = −0.05000 and m = 0.08666 marked.]

Figure 2.1: An example of the situation as in the proof of Theorem 2.2.1.

Combining this with the definition of the reduced portfolio value \(V_\sigma(\mu_i - \mu_j) = V^*_{\{i,j\}} - \mu_j\) in (2.2.7) for a portfolio of stocks {i = 1, j = 2} (set σ² = σ₁² − 2σ₁₂ + σ₂²) and of stocks {j = 2, i = 3} (set σ² = σ₃² − 2σ₂₃ + σ₂²), we obtain \(V^*_{\{2,3\}} = V_{\sigma_L}(\mu_3) + 0 < V_{\sigma_U}(\mu_1) + 0 = V^*_{\{1,2\}}\).

Necessity: If instead of (2.2.1) we assume σ₃² − 2σ₂₃ ≥ σ₁² − 2σ₁₂, then µ₁ < µ₂ := 0 < µ₃ implies \(V^*_{\{1,2\}} = V_{\sigma_U}(\mu_1) + 0 \le V_{\sigma_U}(0) \le V_{\sigma_L}(0) < V_{\sigma_L}(\mu_3) + 0 = V^*_{\{2,3\}}\) (Lemma 2.2.2, parts 1 and 2 for the first and third inequality, part 3 for the second). □
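The construction in the proof can be reproduced numerically. The sketch below is an illustration (parameter values mirror Figure 2.1; the quadrature-plus-grid solver is an assumption, not the thesis' Cover-algorithm computation): it evaluates V_σ(µ) = max_{0≤b≤1} E log(b + (1−b)e^Z) and exhibits a market in which the two "worst" stocks form the better pair.

```python
import numpy as np

def V(mu, sigma, n_nodes=120, n_b=2001):
    """Reduced portfolio value V_sigma(mu) = max_b E log(b + (1-b) exp(Z)),
    Z ~ N(mu, sigma^2), via Gauss-Hermite quadrature plus a grid over b."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)  # weight exp(-x^2/2)
    z = mu + sigma * x
    w = w / np.sqrt(2 * np.pi)                          # normalise to an N(0,1) expectation
    b = np.linspace(0.0, 1.0, n_b)[:, None]
    vals = (np.log(b + (1 - b) * np.exp(z)[None, :]) * w[None, :]).sum(axis=1)
    return float(vals.max())

# the situation of Theorem 2.2.1 with sigma_U = 1.0, sigma_L = 0.5 (cf. Figure 2.1)
v_12 = V(-0.05, 1.0)   # V_{sigma_U}(mu_1): reduced value of the pair {1,2}
v_23 = V(0.02, 0.5)    # V_{sigma_L}(mu_3): reduced value of the pair {2,3}
```

Here v_12 > v_23 although µ₁ < 0 < µ₃: the larger relative volatility σ_U outweighs the poorer drift, exactly as the theorem predicts.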

2.3 Examples
We conclude this chapter with examples of a real market where a situation as in
Theorem 2.2.1 is set up by empirical market data. We assume that the distribu-
tion of the returns is log-normal with the parameters provided by the standard

estimates of mean and variance (thus admittedly oversimplifying things). Figure 2.2 a-c) shows some diagnostic checks run on American Express Co. (AXP)
daily log-return data from the closing prices at the New York Stock Exchange
2/1/1998 - 30/11/2000 (data from www.wallstreetcity.com).
Comparing the histogram of the data in Figure 2.2 a) and the normal density with estimated mean and variance, it should be possible to approximately explain the data by assuming a log-normal return distribution. This is supported by the normal probability plots of the log-return data in 2.2 b): lines a, b and c correspond to the normal probability plots from 245 consecutive data each, plot b being moved to the right by 0.05, c by 0.1. 95% confidence bands are shown. The lines a, b and c being roughly parallel, there is no alarming sign that the return distribution loses stationarity during the period under investigation. The sample autocovariance function in 2.2 c) (95% confidence bands around zero) suggests approximately uncorrelated (in the Gaussian case independent) day-to-day data.
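The diagnostic of Figure 2.2 c) is straightforward to reproduce. The sketch below is illustrative (the ±1.96/√n white-noise band is the standard approximation, applied here on the covariance scale; the white-noise test data are made up):

```python
import numpy as np

def sample_autocov(y, max_lag=50):
    """Sample autocovariances gamma_hat(k), k = 0..max_lag, together with the
    approximate 95% white-noise band (+/- 1.96/sqrt(n), scaled by gamma_hat(0))."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    yc = y - y.mean()
    acov = np.array([(yc[: n - k] * yc[k:]).sum() / n for k in range(max_lag + 1)])
    band = 1.96 / np.sqrt(n) * acov[0]
    return acov, band

# white-noise sanity check: almost all lags should stay inside the band
rng = np.random.default_rng(1)
acov, band = sample_autocov(rng.normal(size=10000))
```

Applied to the AXP log-returns, the same computation yields Figure 2.2 c); lags escaping the band would indicate serial correlation.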
Example 2.1: The following 3 stocks are chosen from the range of the Dow Jones Industrial Average.

  i   stock                             estim.: µi − µ2        σi² − 2σi2 + σ2²
  1   American Express Co. (AXP)        −0.4503008 · 10⁻⁴      5.1832410 · 10⁻⁴
  2   Citigroup Inc. (C)                 0.0000000             0.0000000
  3   United Technologies Corp. (UTX)    0.1215608 · 10⁻⁴      8.4458413 · 10⁻⁴

The last two columns report estimates based on the empirical mean and variance of the difference Y^{(i)} − Y^{(2)} of the log-returns of stock i and stock 2.
Suppose we want to enhance a portfolio of C by either AXP or UTX. Since σ₁² − 2σ₁₂ + σ₂² < σ₃² − 2σ₂₃ + σ₂², we conclude that there is no indication that we should prefer AXP to UTX.
Example 2.2: Next, consider the following stocks from the Dow Jones Transportation Average:

  i   stock                            estim.: µi − µ2        σi² − 2σi2 + σ2²
  1   J.B. Hunt Transp. Serv. (JBHT)   −0.4207553 · 10⁻⁴      17.5748915 · 10⁻⁴
  2   Yellow Corp. (YELL)               0.0000000              0.0000000
  3   Union Pacific Corp. (UNP)         0.3721663 · 10⁻⁴      10.7871249 · 10⁻⁴

[Figure 2.2 a) here: histogram of the log-return data with the estimated normal density.]

[Figure 2.2 b) here: normal probability plots (lines a, b, c) with 95% confidence bands.]

[Figure 2.2 c) here: sample autocovariance function up to lag 50, with 95% confidence bands around zero.]

Figure 2.2: Diagnostic plots for American Express Co. (AXP) log-returns from closing prices NYSE, 2/1/1998-30/11/2000.

If we want to include either JBHT or UNP in a portfolio of YELL shares, we observe that σ₁² − 2σ₁₂ + σ₂² > σ₃² − 2σ₂₃ + σ₂². Hence it may be advisable to choose JBHT as an additional stock in spite of its apparently poorer performance.
Indeed, calculating the log-optimal portfolio for the alternatives we obtain:

  additional stock j   weight b of YELL   V*_{2,j} − V*_{2}    residual value
  1 (JBHT)             0.523951           1.9910 · 10⁻⁴        −1.6092 · 10⁻¹⁰
  3 (UNP)              0.465490           1.5407 · 10⁻⁴         1.3154 · 10⁻¹⁰

V*_{2,j} − V*_{2} is the improvement of the portfolio value achieved under inclusion of stock j. As can be seen, our suspicion was justified: choosing JBHT yields a (slightly) greater portfolio improvement than UNP. The residual value in the fourth column is 1 − E[X^{(2)}/(bX^{(2)} + (1 − b)X^{(j)})] and indicates that the Kuhn-Tucker condition (2.2.4) for log-optimality of the stated portfolio weight b is satisfied. The values in the third and fourth column were computed with an error of at most 10⁻⁹ using the composite trapezoidal rule.
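The table can be re-derived independently of the trapezoidal rule. The sketch below is an illustrative re-computation (Gauss-Hermite quadrature plus a grid search over b, with the parameter estimates copied from Example 2.2); it recovers the tabulated weights and improvements up to numerical tolerance.

```python
import numpy as np

def log_opt_pair(mu, sig2, n_nodes=120, n_b=20001):
    """Weight b of the reference stock (YELL) and the improvement V_sigma(mu)
    for a two-stock log-normal market, Z ~ N(mu, sig2)."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)   # weight exp(-x^2/2)
    z = mu + np.sqrt(sig2) * x
    w = w / np.sqrt(2 * np.pi)
    b = np.linspace(0.0, 1.0, n_b)
    vals = (np.log(b[:, None] + (1 - b[:, None]) * np.exp(z)[None, :]) * w[None, :]).sum(axis=1)
    k = int(vals.argmax())
    return float(b[k]), float(vals[k])

b_jbht, v_jbht = log_opt_pair(-0.4207553e-4, 17.5748915e-4)  # YELL + JBHT
b_unp, v_unp = log_opt_pair(0.3721663e-4, 10.7871249e-4)     # YELL + UNP
```

JBHT again beats UNP as the partner for YELL, despite its poorer single performance, confirming the volatility-pumping effect numerically.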

CHAPTER 3

Predicted stock returns and portfolio selection
In the last chapter we have seen that portfolio selection on the basis of single-stock performances alone is problematic. Information about the variance-covariance structure of the stock returns in the market is indispensable.
there are several ways to incorporate such information. For example, in a
log-normal market, an investor might simply use an estimate of the variance-
covariance matrix of the return distribution (conditioned on the past) and run
Cover’s algorithm (Theorem 1.3.2). Another investor might first try to use
market observations to get an idea how the stock returns in the market are
correlated and what temporal correlation prevails. Given this knowledge, he
then produces forecasts of the stock returns for the next market period and re-
arranges his portfolio according to these forecasts – typically in a “greedy” way,
i.e. trying to pick out the return maximal stock for the next market period.
Clearly, the conditional log-optimal portfolio depends on much more than mere
forecasts of the returns in the next market period. Therefore, an investor follow-
ing the “greedy” strategy is bound to lose out in comparison with the investor
using conditional log-optimal portfolios. But still, the greedy strategy is pop-
ular among investors (the forecasting bit typically being much of a heuristics)
and should therefore be analysed thoroughly. This is the task of this chapter.
In Section 3.1 we formalize the greedy strategy, which gives rise to two questions:
How suboptimal can the strategy be in comparison with log-optimal portfolio
selection, and, what is a reasonable way to manage the forecasting part? As to
the first question, we will see that suboptimality can be bounded in terms of
the variance of the logarithmic returns (cf. (3.1.4)). In view of this, the greedy
strategy is appealing in markets with sufficiently small stochastic fluctuation –
“sufficiently small” depending on how much the investor is prepared to lose out

in investment performance.
We will embark on the second question, the forecasting problem, in Section 3.2.
Once we have seen that among the many ways of forecasting, so-called “strong”
forecasting is the method of choice for the greedy strategy, we will leave the stock
market behind and handle the problem in the general framework of Gaussian
time series forecasting. Clearly, the prediction of stationary Gaussian time series
is of interest very much in its own right, with applications arising in many fields.
Based on an approximation argument (Section 3.2.1), a forecasting algorithm
will be presented (Section 3.2.2) that – under weak regularity conditions –
is strongly consistent for huge classes of Gaussian processes (Theorem 3.2.2).
Explicit examples are given, highlighting how general the algorithm is (examples
after Corollary 3.2.3). The results are proved in Section 3.3. Simulations and
further examples in Section 3.4 conclude the chapter.

3.1 A strategy using predicted log-returns


To avoid unnecessary technicalities, we choose the simplest setting of a mar-
ket with one stock (with returns ..., X−1, X0 , X1 , X2, ...) and one risk-free asset
(bond with return r). It is not hard, though, to develop analogous techniques for more general markets.
The log-transform has been found to have a stabilizing effect on the return data
Xi in so far as Yi := log Xi follows a symmetric distribution (around the mean)
in many real markets. For this reason, we will use Yi rather than Xi in the
following. Under full knowledge of the process past Yn, Yn−1 , ..., the investor is
in principle advised to invest the log-optimal proportion
\[ b^* := \arg\max_{b \in [0,1]} \mathrm{E}[\log(b \exp(Y_{n+1}) + (1 - b) r) \mid Y_n, Y_{n-1}, \dots] \tag{3.1.1} \]

of his wealth in the stock. However, the non-institutional (private) investor


typically takes a different stance in two respects:

– His main interest is simply to determine whether or not he should invest a


given amount of wealth in the specific stock (rather than to determine
what proportion to invest where).

– He takes the investment decision on the basis of the predicted return of the
next market period only.

In particular, he may try to achieve the maximum possible return max{Xn+1 , r}


in the next market period with the following “greedy strategy”.

1. At each step n he produces an estimate Ŷn+1 for the next outcome Yn+1 on
the basis of the observed Yn, ..., Y1 (note that it is not possible to observe
the process to the infinite past).

2. He invests according to

\[ b^*_{\mathrm{approx}} := \begin{cases} 1 & \text{if } \exp(\hat Y_{n+1}) \ge r, \\ 0 & \text{otherwise.} \end{cases} \tag{3.1.2} \]

If the portfolio is not rearranged on a daily basis, but say, on a two month basis,
this is a typical example of a buy-and-hold strategy: Once a fixed amount of
money is invested according to the investor's belief about what the market will look like in two months' time, the portfolio remains unchanged.
buy-and-hold strategy (where only stocks from the CBS index are picked whose
predicted two month return exceeds a certain threshold) has been investigated
in a case study described in Franke et al. (2001, Sec. 16.4 and the references
there).
Comparing b∗approx = arg maxb∈[0,1] log(b exp(Ŷn+1) + (1 − b)r) and (3.1.1), we
see that the greedy strategy is very much in the spirit of approximating the
log-optimality principle. However, b∗ will be a function of Yn, Yn−1 , ... rather
than of a single statistic Ŷn+1, and the log-optimal portfolio will be diversified
(i.e., b∗ ∈ [0, 1]), not just 0 or 1. As a consequence the investor loses out
on investment performance in comparison with the log-optimal portfolio. Two
questions arise:

– How should he construct the statistic Ŷn+1 in order not to lose out “too
much” in comparison with log-optimal portfolio performance?

– What performance loss does the particular Ŷn+1 inflict on the investor in
the worst possible case?

These are the two problems analysed in this section.


If we want to approximate the log-optimal b∗ on the basis of nothing else but
Ŷn+1 , we need to find an approximation g of the target function,

E[log(b exp(Yn+1 ) + (1 − b)r)|Yn, Yn−1, ...] ≈ g(b, Ŷn+1, r).



Not every g and not every Ŷ_{n+1} will be appropriate for this kind of approximation. A Taylor expansion may give us some guideline. Put

\[ f_b(y) := \log(b \exp(y) + (1 - b) r) \]

and note that

\[ 0 \le f_b'(y) = \frac{1}{1 + (1/b - 1) r \exp(-y)} \le 1, \qquad 0 \le f_b''(y) = \frac{(1/b - 1) r \exp(-y)}{\bigl(1 + (1/b - 1) r \exp(-y)\bigr)^2} \le \frac{1}{4}. \]

Now consider the expansion

\[ f_b(Y_{n+1}) = f_b(\hat Y_{n+1}) + f_b'(\hat Y_{n+1})(Y_{n+1} - \hat Y_{n+1}) + \frac{1}{2} f_b''(\xi_b)(Y_{n+1} - \hat Y_{n+1})^2 \tag{3.1.3} \]
with some random ξ_b from the convex hull of Y_{n+1} and Ŷ_{n+1}. From (3.1.3) and the σ(Y_n, Y_{n−1}, ...)-measurability of Ŷ_{n+1} we obtain

\[ \mathrm{E}[f_b(Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] = f_b(\hat Y_{n+1}) + f_b'(\hat Y_{n+1})\bigl(\mathrm{E}[Y_{n+1} \mid Y_n, Y_{n-1}, \dots] - \hat Y_{n+1}\bigr) + \frac{1}{2}\mathrm{E}[f_b''(\xi_b)(Y_{n+1} - \hat Y_{n+1})^2 \mid Y_n, Y_{n-1}, \dots]. \]

As can be seen, the choice

\[ \hat Y_{n+1} := \mathrm{E}[Y_{n+1} \mid Y_n, Y_{n-1}, \dots] \]

not only makes the first-order term vanish but also minimizes the upper bound

\[ \frac{1}{2}\mathrm{E}[f_b''(\xi_b)(Y_{n+1} - \hat Y_{n+1})^2 \mid Y_n, Y_{n-1}, \dots] \le \frac{1}{8}\mathrm{E}[(Y_{n+1} - \hat Y_{n+1})^2 \mid Y_n, Y_{n-1}, \dots] \]

on the second-order term.
Using b*_{approx} (based on Ŷ_{n+1}), the investor loses at most

\[ \mathrm{E}[f_{b^*}(Y_{n+1}) - f_{b^*_{\mathrm{approx}}}(Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] \]
\[ = \mathrm{E}[f_{b^*}(Y_{n+1}) - f_{b^*_{\mathrm{approx}}}(\hat Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] + \mathrm{E}[f_{b^*_{\mathrm{approx}}}(\hat Y_{n+1}) - f_{b^*_{\mathrm{approx}}}(Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] \]
\[ \le \mathrm{E}[f_{b^*}(Y_{n+1}) - f_{b^*}(\hat Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] + \mathrm{E}[f_{b^*_{\mathrm{approx}}}(\hat Y_{n+1}) - f_{b^*_{\mathrm{approx}}}(Y_{n+1}) \mid Y_n, Y_{n-1}, \dots] \]
\[ = \frac{1}{2}\mathrm{E}\bigl[\bigl(f_{b^*}''(\xi_{b^*}) - f_{b^*_{\mathrm{approx}}}''(\xi_{b^*_{\mathrm{approx}}})\bigr)(Y_{n+1} - \hat Y_{n+1})^2 \mid Y_n, Y_{n-1}, \dots\bigr] \]
\[ \le \frac{1}{8}\mathrm{E}[(Y_{n+1} - \hat Y_{n+1})^2 \mid Y_n, Y_{n-1}, \dots] = \frac{1}{8}\mathrm{Var}[Y_{n+1} \mid Y_n, Y_{n-1}, \dots]. \]

Hence

\[ \mathrm{E} f_{b^*}(Y_{n+1}) - \mathrm{E} f_{b^*_{\mathrm{approx}}}(Y_{n+1}) \le \frac{1}{8}\mathrm{E}\,\mathrm{Var}[Y_{n+1} \mid Y_n, Y_{n-1}, \dots] = \frac{1}{8}\bigl(\mathrm{Var}\, Y_{n+1} - \mathrm{Var}\, \hat Y_{n+1}\bigr) \le \frac{1}{8}\mathrm{Var}\, Y_{n+1}, \tag{3.1.4} \]

and on the average the investor won't lose more than (1/8) Var Y_{n+1}. If he is prepared to sacrifice this amount, then the greedy strategy (3.1.2) is a viable option. Still, this does not obviate the necessity to estimate Ŷ_{n+1} = E[Y_{n+1}|Y_n, Y_{n−1}, ...], but as we will see in the next section, for many practically relevant markets this can be done with reasonable effort.
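The bound (3.1.4) can be verified numerically in the simplest i.i.d. Gaussian case, where Ŷ_{n+1} = E Y. The sketch below (with made-up parameter values; the quadrature-plus-grid evaluation is an assumption of the illustration) compares the greedy rule (3.1.2) with the log-optimal stock proportion.

```python
import numpy as np

def check_greedy_bound(mu=0.01, sig=0.2, r=1.005, n_nodes=120):
    """Loss of the greedy rule (3.1.2) versus the log-optimal b* for
    i.i.d. Y ~ N(mu, sig^2), together with the bound Var(Y)/8 from (3.1.4)."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)  # weight exp(-x^2/2)
    y = mu + sig * x
    w = w / np.sqrt(2 * np.pi)
    f = lambda b: float((w * np.log(b * np.exp(y) + (1 - b) * r)).sum())
    v_opt = max(f(b) for b in np.linspace(0.0, 1.0, 2001))  # log-optimal value
    b_greedy = 1.0 if np.exp(mu) >= r else 0.0              # rule (3.1.2) with Y_hat = E[Y]
    return v_opt - f(b_greedy), sig ** 2 / 8

loss, bound = check_greedy_bound()
```

Here the greedy investor puts everything into the stock (since exp(0.01) ≥ 1.005) and loses a little versus the diversified log-optimal portfolio, but never more than Var(Y)/8.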

3.2 Prediction of Gaussian log-returns


Prediction problems as in the previous section are one of the core topics of
statistical analysis of time series. Using the distinctions of prediction problems
in Györfi, Morvai and Yakowitz (1998), the problem of estimating E[Yn+1|Fn ]
for some sub-σ-algebra Fn of σ(Yn, Yn−1, ...) is a problem of dynamic forecasting,
i.e., in each market period, the target to be estimated changes (“moving target”).
Typical choices for Fn are σ(Yn, Yn−1, ...), σ(Yn, ..., Y0) or σ(Yn, ..., Yn−dn+1 ) for
some sequence dn ∈ IN with dn → ∞, depending on what length of the process
past should be included. It should be noted that although we consider a bi-infinite sequence of random variables \(\{Y_i\}_{i=-\infty}^{\infty}\) (square-integrable and defined on a common probability space (Ω, A, P)), we only observe realisations from "time" i = 1 onwards.
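As a simple baseline for this forecasting problem, one can fit the classical finite-memory least-squares predictor of E[Y_{n+1} | Y_n, ..., Y_{n−d+1}]. The sketch below is only an illustration (it is not the algorithm developed in Section 3.2.2, and the AR(1) example data are made up):

```python
import numpy as np

def ls_forecast(y, d):
    """One-step forecast of a zero-mean Gaussian series from the last d values,
    with coefficients fitted by least squares on all past length-d windows."""
    n = len(y)
    Z = np.array([y[t : t + d] for t in range(n - d)])  # rows (Y_t, ..., Y_{t+d-1})
    coef, *_ = np.linalg.lstsq(Z, y[d:], rcond=None)    # regress Y_{t+d} on the window
    return float(y[-d:] @ coef), coef

# AR(1) example: Y_t = 0.8 Y_{t-1} + eps_t, so E[Y_{n+1} | past] = 0.8 Y_n
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.8 * y[t - 1] + rng.normal()
pred, coef = ls_forecast(y, d=3)
```

For a Gaussian AR(1) the fitted coefficients recover approximately (0, 0, 0.8), so the forecast approximates the conditional expectation 0.8·Y_n.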
The majority of existing algorithms for nonparametric dynamic forecasting (for
an introduction see Bosq (1996) or Györfi, Härdle et al. (1998)) rely on mix-
ing conditions which can rarely be verified from observational data. Inter-
est has turned to dynamic forecasting under weak conditions such as mere
stationarity and ergodicity, but avoiding mixing conditions. In this context,
a forecaster Ê(Y_n, ..., Y_0) for the conditional expectation E[Y_{n+1}|F_n] is called strongly (weakly) universally consistent if

\[ \lim_{n \to \infty} \bigl| \hat E(Y_n, \dots, Y_0) - \mathrm{E}[Y_{n+1} \mid \mathcal{F}_n] \bigr| = 0 \]

with probability 1 (in the L¹-sense) for any stationary and ergodic process {Y_i}_i. By stationarity, the computation of weakly consistent estimators (F_n := σ(Y_n, Y_{n−1}, ...) for the moment) can be reduced to the so-called static forecasting problem (Györfi, Morvai and Yakowitz, 1998): to find Ê such that

\[ \lim_{n \to \infty} \mathrm{E}\bigl| \hat E(Y_0, \dots, Y_{-n}) - \mathrm{E}[Y_1 \mid Y_0, Y_{-1}, \dots] \bigr| = 0. \]

Based on conditional distribution estimates P̂(dy | Y_0, ..., Y_{−n}), Morvai, Yakowitz
and Algoet (1997) obtained weakly consistent estimators for the class of bounded
stationary ergodic processes, and Algoet (1999) for the class of finite-mean
stationary ergodic processes. Here again, to obtain convergence with probability 1
instead of L1-convergence, mixing conditions were needed. Concerning strong
universal consistency, we encounter various limitations, one of the most striking
derived by Bailey (1976), with Ryabko (1988) sketching an easier intuitive
proof later formalised by Györfi, Morvai and Yakowitz (1998). Their result
states that for any estimator Ê(Y_n, ..., Y_0) of the conditional expectation
E[Y_{n+1} | Y_n, ..., Y_0], there is a stationary ergodic, binary-valued process {Y_i}_i such
that

    P( limsup_{n→∞} | Ê(Y_n, ..., Y_0) − E[Y_{n+1} | Y_n, ..., Y_0] | ≥ 1/4 ) ≥ 1/8.
Algoet (1999) used refined techniques to show that there also exists a stationary
ergodic, binary-valued sequence {Y_i}_i with

    E[ limsup_{n→∞} | Ê(Y_n, ..., Y_0) − E[Y_{n+1} | Y_n, ..., Y_0] | ] ≥ 1/2.
This rules out the existence of strongly universally consistent forecasters for the
moving target E[Yn+1|Yn, ..., Y0].
This result is discouraging, but it does not rule out the existence of strongly
consistent forecasting rules for log-return processes as they arise in real financial
markets. In particular, Gaussian log-return processes have been proven to be a
good approximation for real log-return processes. Györfi, Morvai and Yakowitz
(1998) note that there has not yet been found an answer to the question whether
strongly consistent forecasters for E[Yn+1|Yn, ..., Y0] or even E[Yn+1|Yn, Yn−1 , ...]
exist in case the process {Yi}i is known to be Gaussian. The results of this
section show that for notably wide classes of Gaussian processes the answer is
affirmative. It should be noted, however, that strong universal consistency is
by far not the only notion of “strong” forecasting. Other methods, the so-called
“universal” predictors, are based on Cesàro convergence, i.e., the average of the
errors (e.g., squared prediction errors, or the squared differences of the estimates
and E[Y_{n+1} | Y_n, Y_{n−1}, ...]) converges with probability one to the minimal possible
value for any (bounded) stationary and ergodic process (Algoet, 1994). Such
estimators were obtained by Algoet (1992) and Morvai et al. (1996). Based on
Györfi, Lugosi and Morvai (1999), universal predictors for bounded or Gaussian
stationary ergodic processes have been constructed by Györfi and Lugosi (2001).
Throughout, we make the following two assumptions:

1. Let {Y_n}_{n=−∞}^{∞} be a real-valued, purely nondeterministic (i.e., there is no
   deterministic term in the representation (3.2.2) below), stationary and
   ergodic Gaussian process with

       EY_n = 0,   Var Y_n = σ² > 0   (3.2.1)

   and autocovariance function γ(k) := E(Y_{n+k} Y_n). The assumption EY_n = 0
   is no restriction: it is achieved by differencing the original process of log-returns.
   We denote the differenced process by {Y_i} again.

2. From 1. and Wold’s decomposition theorem (Hida and Hitsuda, 1993,
   Theorems 3.2 and 3.3; Shiryayev, 1984, VI §5 Theorem 2), a canonical
   L²-representation

       Y_n = Σ_{j=0}^{∞} ψ_j ε_{n−j}   (3.2.2)

   can be found with ψ_0 = 1, Σ_{j=0}^{∞} |ψ_j|² < ∞ and independent, identically
   N(0, σ_ε²)-distributed innovations ε_n. For z ∈ ℂ, |z| < 1, the series
   Σ_{j=0}^{∞} ψ_j z^j converges to the transfer function ψ(z) := Σ_{j=0}^{∞} ψ_j z^j, which
   never vanishes for |z| < 1 (Hida and Hitsuda, 1993, III §3 i).

   We assume that

       Σ_{j=0}^{∞} |ψ_j| < ∞,   (3.2.3)

   ensuring that the equality (3.2.2) holds with probability 1 (Brockwell and
   Davis, 1991, Prop. 3.1.1).

Statistical and theoretical aspects of second order stationary processes are
treated extensively in the literature, among many others in Brockwell and Davis
(1991), Caines (1988) and Hannan and Deistler (1988). Many results on Gaus-
sian processes can be found in Neveu (1968), Ibragimov and Rozanov (1978)
and Hida and Hitsuda (1993). Indeed, (3.2.1) and (3.2.3) are standard assump-
tions in time series analysis, and a considerable variety of sufficient conditions
for the assumptions to hold are known. We just note that if f(λ), λ ∈ [−π, π],
is the spectral density of the process {Y_n}_{n=−∞}^{∞}, then

    2π f(λ) = σ_ε² |ψ(e^{−iλ})|²   (3.2.4)

(Brockwell and Davis, 1991, eq. 4.4.3; Shiryayev, 1984, VI §6 eq. (16f.)), and

    ∫_{−π}^{π} log f(λ) dλ > −∞   (3.2.5)

is sufficient (and in fact also necessary) for the process to be purely nondeter-
ministic (Shiryayev, 1984, VI §5 Theorem 4). The simplest setting in which
(3.2.5) holds is the case when f happens to be bounded away from 0. Then
the process is strongly mixing (Ibragimov and Linnik, 1971, Theorem 17.3.3).
However, for the purpose of our analysis, we do not require this strong property.
We divide the problem of estimating E[Y_{n+1} | Y_n, Y_{n−1}, ...] into two steps:

1. Approximation of E[Y_{n+1} | Y_n, Y_{n−1}, ...] by some conditional expectation
   based on a fraction of the past only, say E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] with d ∈ ℕ.

2. Estimation of the latter quantity from the observed data.

Here and in all the following, d should be taken as d = d_n ∈ ℕ with d_n ↗ ∞ and
d_n ≤ n, rather than as a mere constant. For the sake of simplicity of notation,
however, this will be suppressed most of the time.

3.2.1 An approximation result


As to the first step, the approximation step, note that by stationarity and
Doob’s conditional expectation continuity theorem (Doob, 1984, 2.I.5)

    E[Y_{n+1} | Y_n, Y_{n−1}, ...] − E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] → 0

in L1 whenever dn → ∞. As will be seen, more stringent conditions are needed


to obtain convergence with probability 1. Similar problems arise in the context
of on-line order selection for AR(∞) models. Here, founded on Rissanen’s


(1989) stochastic complexity for model comparison, the influence of increasing
dimensionality dn on the prediction error of the estimated “best” AR(dn ) model
is discussed in depth by Gerencsér (1992). The accuracy of order selection
schemes based on least squares principles is also investigated in Davisson (1965)
and Wax (1988). In contrast to these, the approximation of E[Yn+1|Yn, Yn−1 , ...]
by E[Yn+1|Yn , ..., Yn−dn+1 ] used in this section is not data-driven but chooses
deterministic dn according to the conditions given in the following lemma.

Lemma 3.2.1. If the Taylor coefficients of

    1/ψ(z) = Σ_{k=0}^{∞} φ_k z^k   (|z| < 1)

satisfy

    Σ_{k=d_n+1}^{∞} |φ_k|² ≤ (const./log n)^r   (3.2.6)

for some r > 1 and sufficiently large n, then

    lim_{n→∞} | E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] − E[Y_{n+1} | Y_n, Y_{n−1}, ...] | = 0

with probability 1.

The proof of this result and of the next theorems is deferred to section 3.3.

3.2.2 An estimation algorithm


If we collect the autocovariances in the matrix

    Γ_d := (γ(i − j))_{i,j=1,...,d} =
        ⎛ γ(0)     γ(1)     ···  γ(d−1) ⎞
        ⎜ γ(1)     γ(0)     ···  γ(d−2) ⎟
        ⎜   ⋮        ⋮       ⋱      ⋮   ⎟
        ⎝ γ(d−1)   γ(d−2)   ···  γ(0)   ⎠

and

    γ_d := (γ(d), ..., γ(1)),
we can obtain an explicit formula for E[Y_{n+1} | Y_n, ..., Y_{n−d+1}]. In fact, assumption
(3.2.3) implies

    γ(k) → 0   (k → ∞)   (3.2.7)

(Brockwell and Davis, 1991, Probl. 3.9), and from (3.2.1) and (3.2.7) it follows
that Γ_d is non-singular (Brockwell and Davis, 1991, Prop. 5.1.1). For Gaussian
processes one has (Brockwell and Davis, 1991, §5.4; Shiryayev, 1984, II §13
Theorem 2)

    E[Y_{d+1} | Y_d, ..., Y_1] = γ_d Γ_d^{−1} (Y_1, ..., Y_d)^T,

and the autoregression function

    m_d(y_d, ..., y_1) := E[Y_{d+1} | Y_d = y_d, ..., Y_1 = y_1] = γ_d Γ_d^{−1} (y_1, ..., y_d)^T   (3.2.8)

is linear. Stationarity yields

    m_d(y_d, ..., y_1) = E[Y_{n+1} | Y_n = y_d, ..., Y_{n−d+1} = y_1]

and thus

    E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] = m_d(Y_n, ..., Y_{n−d+1}).   (3.2.9)

From (3.2.8) and (3.2.9) it is plausible to construct a simple estimator Ê_{d,n} for
the conditional expectation E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] by the following steps:

1. Estimate the autocovariances γ(0), ..., γ(d) by the sample autocovariances

       γ̂_n(k) := (1/n) Σ_{i=1}^{n−|k|} Y_i Y_{i+|k|}   (k = −d, ..., d).   (3.2.10)

2. Set Γ̃_{d,n} := (γ̂_n(i − j))_{i,j=1,...,d}. Writing Γ̃_{d,n} = (1/n) A Aᵀ with the matrix
   A ∈ ℝ^{d×2n} formed by the first d rows of the matrix

       ⎛ 0    ···  0    Y_1  Y_2  ···  Y_n ⎞
       ⎜ 0    ···  Y_1  Y_2  ···  Y_n  0   ⎟
       ⎜              ⋮                    ⎟   ∈ ℝ^{n×2n},
       ⎝ Y_1  Y_2  ···  Y_n  0    ···  0   ⎠

   it is obvious that Γ̃_{d,n} is non-negative definite. Thus

       Γ̂_{d,n} := Γ̃_{d,n} + (1/n) I_d = ( γ̂_n(i − j) + δ_{ij}/n )_{i,j=1,...,d}
   is non-singular. Hence, with (3.2.10) and γ̂_{n,d} := (γ̂_n(d), ..., γ̂_n(1)), define

       m̂_{d,n}(y_d, ..., y_1) := γ̂_{n,d} Γ̂_{d,n}^{−1} (T_{L_n} y_1, ..., T_{L_n} y_d)^T

   in analogy to (3.2.8). Here, 0 < L_n ↗ ∞ is a sequence of truncation
   heights, with T_L y := sgn(y) min{L, |y|} the truncation operator.

3. Plug in the last d observations:

       Ê_{d,n} := m̂_{d,n}(Y_n, ..., Y_{n−d+1}).
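The three steps above can be sketched in a few lines of code. The following is a minimal illustration of ours, not part of the thesis: the function name `predict_next` and the use of NumPy are our own choices, and the deterministic schedules d = d_n, L = L_n are left to the caller.

```python
import numpy as np

def truncate(y, L):
    """Truncation operator T_L y := sgn(y) * min(L, |y|)."""
    return np.sign(y) * np.minimum(L, np.abs(y))

def predict_next(Y, d, L):
    """Estimator E_hat_{d,n} of E[Y_{n+1} | Y_n, ..., Y_{n-d+1}] from Y_1, ..., Y_n."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    # Step 1: sample autocovariances gamma_hat(0), ..., gamma_hat(d) as in (3.2.10).
    gam = np.array([Y[: n - k] @ Y[k:] / n for k in range(d + 1)])
    # Step 2: Gamma_hat = Gamma_tilde + (1/n) I_d is non-singular by construction.
    Gamma = np.array([[gam[abs(i - j)] for j in range(d)] for i in range(d)])
    Gamma += np.eye(d) / n
    gvec = gam[d:0:-1]                   # (gamma_hat(d), ..., gamma_hat(1))
    coef = np.linalg.solve(Gamma, gvec)  # Gamma_hat is symmetric, so this solves for the row vector
    # Step 3: plug in the truncated last d observations (oldest first).
    return float(coef @ truncate(Y[-d:], L))
```

For an AR(1) path with coefficient Φ, the output approaches Φ·Y_n, the true one-step autoregression.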

Remark. Even if γ̂_{n,d} Γ̂_{d,n}^{−1} constitutes a strongly consistent estimate of the
coefficients γ_d Γ_d^{−1} of the autoregression function m_d, estimation errors
(γ̂_{n,d} Γ̂_{d,n}^{−1} − γ_d Γ_d^{−1}) which are per se “acceptable” may occur together
with large values of the plugged-in Y_n, ..., Y_{n−d+1}. The resulting prediction
error |E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] − Ê_{d,n}| can then become considerable. Suitable
truncation limits the size of the Y_i’s without obscuring the information they contain.
Now, denoting by ‖A‖_∞ := max_i Σ_j |a_{ij}| the maximal absolute row sum of a
real matrix A = (a_{ij})_{i,j}, we establish the following convergence result for the
proposed estimator:

Theorem 3.2.2. Assume (3.2.1) and (3.2.3), and choose d_n and L_n such that for
some r ≥ 4, some δ > 0 and sufficiently large n

    d_n ≤ n^{r/(2(r−2))},

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n / n → 0   (n → ∞),

    Σ_{n=1}^{∞} (d_n/L_n) exp(−L_n²/(2σ²)) < ∞.   (3.2.11)

Then

    lim_{n→∞} | E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] − Ê_{d_n,n} | = 0   (3.2.12)

with probability 1.

From (3.2.11), for the choice of d_n and L_n, one needs some bound on the possible
growth of ‖Γ_{d_n}^{−1}‖_∞. Based on the spectral density f and its (essential) minimum
m_f ≥ 0 we distinguish the following cases:
Case 1: f is bounded away from 0, m_f > 0.

Case 2: f has finitely many zeros λ_1, ..., λ_m ∈ (−π, π] of orders p_1, ..., p_m, that
is, there exist constants p_j^−, p_j^+ > 0, K ≥ 1, δ > 0 such that

    1/K < f(λ)/|λ − λ_j|^{p_j^+} < K   for all λ ∈ (λ_j, λ_j + δ)

and

    1/K < f(λ)/|λ − λ_j|^{p_j^−} < K   for all λ ∈ (λ_j − δ, λ_j).

In this case we define the order of the jth zero as p_j := max{p_j^−, p_j^+} and set
p* := max{p_j | j = 1, ..., m}.

Case 3: No restrictions are imposed on m_f, apart from those already implied
by (3.2.1) and (3.2.3).
For each of the cases, upper bounds for ‖Γ_{d_n}^{−1}‖_∞ can easily be derived from
classical results and from results recently obtained by Serra (1998, 1999, 2000) and
Böttcher and Grudsky (1998). These yield

Corollary 3.2.3. In cases 1–3 and under the assumptions (3.2.1) and (3.2.3),
the strong consistency relation (3.2.12) holds if, for n sufficiently large,

Case 1: d_n ≤ n^s and L_n := (log n)^t   (0 < s < 1/6, t ≥ 1).

Case 2: d_n ≤ n^s and L_n := (log n)^t   (0 < s < 1/(6 + 4p*), t ≥ 1).

Case 3: d_n ≤ ((1/q) log n)^s and L_n := (log n)^t   (q > 4, 0 < s < 1, t ≥ 1).

Before proving these results, we give some examples illustrating the application
of Lemma 3.2.1 and Corollary 3.2.3: for a suitable choice of d_n and L_n, the
consistency relation

    lim_{n→∞} | Ê_{d_n,n} − E[Y_{n+1} | Y_n, Y_{n−1}, ...] | = 0   (3.2.13)

holds with probability 1 for all processes in large classes G_i of Gaussian processes.
Example 3.1: First, let the class G_1 consist of all Gaussian processes satisfying
(3.2.1) and (3.2.3) whose spectral density is bounded away from zero. We choose

    d_n := ⌊n^s⌋   (0 < s < 1/4),   L_n := log n

and obtain (3.2.13) for any element of G_1.
Indeed, for every element of G_1, ψ(z) has no zeros for |z| = 1 by (3.2.4). Then
ψ(z) never vanishes in the closed unit disk, and 1/ψ(z) is analytic on a disk
around 0 with radius 1 + ε for some ε > 0. Thus φ_k (1 + ε/2)^k → 0 as k → ∞,
hence |φ_k| ≤ c (1 + ε/2)^{−k} with some constant c > 0. Set ρ := (1 + ε/2)^{−2} < 1;
then

    Σ_{k=d_n+1}^{∞} |φ_k|² ≤ c² Σ_{k=d_n+1}^{∞} ρ^k = c² ρ^{d_n+1}/(1 − ρ) ≤ (log n)^{−3}

for n sufficiently large if only d_n/log log n → ∞. Lemma 3.2.1 applies and
Corollary 3.2.3 yields (3.2.13).
For G1 and the choice dn = O(log n) it should be noted that (3.2.13) is also
a consequence of An, Chen and Hannan (1982, Theorem 6). From there it
follows that the Yule-Walker estimates (φ̂1 , ..., φ̂dn ) of the first dn coefficients in
the AR(∞) representation satisfy the uniform convergence property
    sup_{1≤j≤d_n} |φ̂_j − φ_j| = O( ((log log n)/n)^{1/2} )

with probability 1. Using the estimates (φ̂1 , ..., φ̂dn ) and the fact that the true
coefficients of the AR(∞) representation converge to zero at an exponential
rate, one obtains (3.2.13). However, the next example illustrates that Lemma
3.2.1 and Corollary 3.2.3 are applicable in more general situations as well.
Example 3.2: Consider the class G2 of all Gaussian processes satisfying (3.2.1)
and (3.2.3) such that for all elements of G2 the following two conditions hold:

a) The corresponding spectral density f has only finitely many zeros, each
   of which is of finite order in the sense of case 2 above, and

b) |φ_k| ≤ Φ_k, where {Φ_k}_k is an eventually decreasing sequence satisfying
   Σ_{k=1}^{∞} Φ_k^{2−ε} < ∞ for some ε > 0.
Note that the orders of the zeros as well as ε may be different for each element
of G_2. This class comprises G_1 as well as Gaussian processes with transfer
functions such as

    ψ(z) = (1 − z)^{1/3} = 1 + Σ_{k=1}^{∞} ψ_k z^k,   ψ_k = − (2·5·…·(3k−4))/(3·6·…·(3k)),

for which

    φ_k = (1·4·…·(3k−2))/(3·6·…·(3k))   and   f(λ) = (σ_ε²/2π) (4 sin²(λ/2))^{1/3},

the process being purely nondeterministic by (3.2.5).
With the choice

    d_n := ⌊(log n)^{log log n}⌋   (for n ≥ 2)   and   L_n := log n,

(3.2.13) holds for any process in G2 .


To see this, observe that

    Σ_{k=d_n+1}^{∞} φ_k² ≤ Φ_{d_n+1}^{ε} Σ_{k=d_n+1}^{∞} Φ_k^{2−ε} ≤ Φ_{d_n+1}^{ε} Σ_{k=1}^{∞} Φ_k^{2−ε} = const. · Φ_{d_n+1}^{ε}.

From Σ_{k=1}^{∞} Φ_k^{2−ε} < ∞, Olivier’s theorem (Knopp, 1956, 3.3 Theorem 1) allows us
to infer k Φ_k^{2−ε} → 0 (k → ∞), hence

    Σ_{k=d_n+1}^{∞} φ_k² ≤ const./(d_n + 1)^{ε/2}

for n sufficiently large. Thus (3.2.6) is fulfilled and Lemma 3.2.1 applies. On
the other hand, lim_{n→∞} (log n)^{log log n} n^{−s} = 0 for any s > 0, and we can use
Corollary 3.2.3 to obtain (3.2.13).
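The growth comparison used in the final step can be made explicit (a short verification of ours):

```latex
(\log n)^{\log\log n} = e^{(\log\log n)^2},
\qquad
\frac{(\log n)^{\log\log n}}{n^{s}}
  = \exp\!\big((\log\log n)^2 - s\log n\big) \xrightarrow[n\to\infty]{} 0,
```

since (log log n)² = o(log n). Hence this choice of d_n eventually satisfies every polynomial bound d_n ≤ n^s of Corollary 3.2.3, while still growing fast enough for Lemma 3.2.1.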

3.3 Proof of the approximation and estimation results

We first turn to the
Proof of Lemma 3.2.1. Note that the transfer function ψ(z) never vanishes
for |z| < 1 (Hida and Hitsuda, 1993, III §3 i). Hence its reciprocal is analytic
for |z| < 1 and can be written as

    φ(z) = 1/ψ(z) = Σ_{k=0}^{∞} φ_k z^k   (|z| < 1)

with φ_0 = 1. The innovations can be obtained by

    ε_{n+1} = Σ_{k=0}^{∞} φ_k Y_{n+1−k} = Y_{n+1} + Σ_{k=1}^{∞} φ_k Y_{n+1−k}

(Hida and Hitsuda, 1993, III §3 eq. 3.31). We observe that σ(ε_{n+1}) is
independent of σ(Y_n, Y_{n−1}, ...), and the latter σ-algebra makes Σ_{k=1}^{∞} φ_k Y_{n+1−k}
measurable. Thus,

    0 = E[ε_{n+1} | Y_n, Y_{n−1}, ...] = E[Y_{n+1} | Y_n, Y_{n−1}, ...] + Σ_{k=1}^{∞} φ_k Y_{n+1−k}

and

    E[Y_{n+1} | Y_n, Y_{n−1}, ...] = − Σ_{k=1}^{∞} φ_k Y_{n+1−k}.   (3.3.1)

Moreover,

    E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] = E[ E[Y_{n+1} | Y_n, Y_{n−1}, ...] | Y_n, ..., Y_{n−d+1} ]
        = − Σ_{k=1}^{d} φ_k Y_{n+1−k} − E[ Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} | Y_n, ..., Y_{n−d+1} ].   (3.3.2)

(3.3.1) and (3.3.2) imply

    E( E[Y_{n+1} | Y_n, Y_{n−1}, ...] − E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] )²
        = E( E[ Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} | Y_n, ..., Y_{n−d+1} ] − Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} )²
        = E( Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} )² − E( E[ Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} | Y_n, ..., Y_{n−d+1} ] )²
        ≤ E( Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} )².

Now, set

    H_d(λ) := Σ_{k=d+1}^{∞} φ_k exp(−ikλ).
Then |H_d(λ)|² f(λ) is the spectral density of the linear filter Σ_{k=d+1}^{∞} φ_k Y_{n+1−k}
(Brockwell and Davis, 1991, Theorem 4.10.1), and we obtain

    E( Σ_{k=d+1}^{∞} φ_k Y_{n+1−k} )² = ∫_{−π}^{π} |H_d(λ)|² f(λ) dλ
        ≤ sup_{λ∈[−π,π]} f(λ) ∫_{−π}^{π} |H_d(λ)|² dλ = 2π sup_{λ∈[−π,π]} f(λ) Σ_{k=d+1}^{∞} |φ_k|².

Since the difference E[Yn+1|Yn, Yn−1 , ...]−E[Yn+1|Yn , ..., Yn−d+1] is the probability
limit of the Gaussian variables E[Yn+1 |Yn, ..., Yn−k+1] − E[Yn+1 |Yn, ..., Yn−d+1] as
k → ∞, it is itself Gaussian (Shiryayev, 1984, II §13.5).
We will apply the following lemma on the convergence of Gaussian random
variables, which can be found in Buldygin and Dočenko (1977, Lemma 3).

Lemma 3.3.1. Let {W_n}_{n=0}^{∞} be a sequence of centered Gaussian random
variables W_n ∼ N(0, σ_n²) with σ_n² → 0 (n → ∞). If for every ε > 0

    Σ_{n=1}^{∞} exp(−ε/σ_n²) < ∞   (3.3.3)

then W_n → 0 with probability 1 as n → ∞.

In particular, (3.3.3) is fulfilled if

    σ_n² ≤ (const./log n)^r

for some r > 1 and sufficiently large n.


This follows immediately from Lemma 3.3.1 if we choose 1 < r′ < r and observe
that

    Σ_{n=N}^{∞} exp(−ε/σ_n²) ≤ Σ_{n=N}^{∞} exp(−ε (log n/const.)^r) ≤ Σ_{n=N}^{∞} exp(−(log n)^{r′})
        ≤ Σ_{n=N}^{∞} (exp(−log n))^{r′} = Σ_{n=N}^{∞} 1/n^{r′} < ∞

for N sufficiently large.
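Lemma 3.3.1 can be illustrated numerically along one simulated path (this simulation is ours, not part of the thesis): with the logarithmic variance decay σ_n² = (1/log n)², which satisfies the sufficient condition with r = 2 > 1, the late terms of an independent Gaussian sequence are uniformly small.

```python
import math
import random

random.seed(1)

# Independent W_n ~ N(0, sigma_n^2) with sigma_n = 1/log n, i.e. r = 2 > 1
# in the sufficient condition sigma_n^2 <= (const/log n)^r of Lemma 3.3.1.
sample = {n: random.gauss(0.0, 1.0 / math.log(n)) for n in range(1000, 100001, 1000)}

# Along this path the tail of the sequence stays uniformly small,
# in line with W_n -> 0 with probability 1.
print(max(abs(w) for w in sample.values()))
```

Every standard deviation here is below 1/log(1000) ≈ 0.145, so the printed maximum is small; by contrast, the remark below shows that a decay as slow as 1/lb n destroys almost sure convergence.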


Now, by Lemma 3.3.1 and with M_f := 2π sup_{λ∈[−π,π]} f(λ), the inequality

    Var( E[Y_{n+1} | Y_n, Y_{n−1}, ...] − E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] )
        ≤ M_f Σ_{k=d_n+1}^{∞} |φ_k|² ≤ (const./log n)^r

for some r > 1 and n sufficiently large implies

    | E[Y_{n+1} | Y_n, Y_{n−1}, ...] − E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] | → 0

with probability 1, and the proof of the lemma is finished. □


Remark. In order to obtain convergence of the W_n to zero with probability 1,
one has to impose some condition on the rate of decay of σ_n². The conditions
of Lemma 3.3.1 cannot be substantially weakened. Indeed, consider any sequence
of independent random variables W_n ∼ N(0, σ_n²) with σ_n² = (lb n)^{−1}, lb denoting
the logarithm to base 2; such a sequence cannot converge to zero with probability 1.
In fact, the lower bound on 1 − Φ in Feller (1968, VII.1. Lemma 2) gives

    P(|W_n| ≥ 1/2) > ( 4/√(lb n) − 16/(lb n)^{3/2} ) (1/√(2π)) exp(−(1/8) lb n) =: q_n.

The q_n constitute an eventually positive, decreasing sequence. Thus, Σ q_n has
the same convergence properties as Σ 2^n q_{2^n}. The latter series diverges because
of

    (2^n q_{2^n})^{1/n} = 2 ( (4/√n − 16/n^{3/2}) (2π)^{−1/2} )^{1/n} exp(−1/8) → 2 exp(−1/8) > 1.

Keeping this in mind, assume we had W_n → 0 with probability 1. With the
characteristic function f(y) := 1_{[−1/2,1/2]^C}(y), this yields independent random
variables f(W_n) → f(0) = 0 with probability 1. Since f(W_n) is {0, 1}-valued,
convergence to zero on a set of probability 1 implies (Shiryayev, 1984, II §10
Example 3)

    Σ_{n=1}^{∞} P(f(W_n) = 1) = Σ_{n=1}^{∞} P(|W_n| ≥ 1/2) < ∞,

a contradiction to the divergence of Σ q_n.
For the proof of Theorem 3.2.2 we need some preliminary observations. First
we recall some simple facts from the theory of matrix norms. Let ‖·‖ be some
vector norm on ℝ^d. By ‖·‖ we also denote the corresponding matrix norm

    ‖A‖ := sup_{y≠0} ‖Ay‖/‖y‖ = sup_{‖y‖=1} ‖Ay‖

for A ∈ ℝ^{d×d}. The spectrum of A is the collection of the moduli of the
eigenvalues of A,

    spr(A) := { |λ| : λ eigenvalue of A },

and the spectral radius of A is

    ρ(A) := max spr(A).

Recall the following inequalities (Isaacson and Keller, 1994, Corollaries in Sec.
1.1 and 1.3):

Lemma 3.3.2. Let A, B, C ∈ ℝ^{d×d}.

1. If ‖A‖ < 1, then (I − A)^{−1} exists and

       ‖(I − A)^{−1}‖ ≤ 1/(1 − ‖A‖).

2. If B and C are non-singular with ‖I − B^{−1}C‖ < 1, the following inequalities
hold:

       ‖C^{−1}‖ ≤ ‖B^{−1}‖/(1 − ‖I − B^{−1}C‖),

       ‖C^{−1} − B^{−1}‖ ≤ ‖B^{−1}‖ ‖I − B^{−1}C‖/(1 − ‖I − B^{−1}C‖).

Proof. 1. Existence of (I − A)^{−1} follows from the well-known Neumann series.
(I − A)^{−1} = I + A(I − A)^{−1} implies ‖(I − A)^{−1}‖ ≤ 1 + ‖A‖ ‖(I − A)^{−1}‖, or,
using ‖A‖ < 1,

    ‖(I − A)^{−1}‖ ≤ 1/(1 − ‖A‖).

2. Inversion of B^{−1}C = I + (B^{−1}C − I) yields C^{−1}B = (I − (I − B^{−1}C))^{−1}.
Multiplying by B^{−1} from the right, we obtain C^{−1} = (I − (I − B^{−1}C))^{−1} B^{−1}
and, using the first part of the lemma,

    ‖C^{−1}‖ ≤ ‖B^{−1}‖ ‖(I − (I − B^{−1}C))^{−1}‖ ≤ ‖B^{−1}‖/(1 − ‖I − B^{−1}C‖).

Finally, write C^{−1} − B^{−1} = (I − B^{−1}C)C^{−1} and obtain

    ‖C^{−1} − B^{−1}‖ ≤ ‖I − B^{−1}C‖ ‖C^{−1}‖ ≤ ‖B^{−1}‖ ‖I − B^{−1}C‖/(1 − ‖I − B^{−1}C‖). □
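The two perturbation bounds of Lemma 3.3.2 are easy to check numerically. The sanity check below is our own (not from the thesis), using NumPy and the ∞-norm that the proof of Theorem 3.2.2 works with; B and the small perturbation C are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
B = np.eye(d) + 0.1 * rng.standard_normal((d, d))   # non-singular
C = B + 0.01 * rng.standard_normal((d, d))          # small perturbation of B

op = lambda M: np.linalg.norm(M, np.inf)            # maximal absolute row sum
Bi, Ci = np.linalg.inv(B), np.linalg.inv(C)
r = op(np.eye(d) - Bi @ C)

assert r < 1                                        # hypothesis of part 2
assert op(Ci) <= op(Bi) / (1 - r) + 1e-10           # first inequality
assert op(Ci - Bi) <= op(Bi) * r / (1 - r) + 1e-10  # second inequality
```

This is exactly the way the lemma is used below: Γ_d plays the role of B, the estimated matrix Γ̂_{d,n} the role of C, and r = ‖I − Γ_d^{−1}Γ̂_{d,n}‖_∞ is driven to 0.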

We will use the vector norms ‖y‖_2 := (Σ_{i=1}^{d} y_i²)^{1/2} and ‖y‖_∞ := max_{i=1,...,d} |y_i|,
and the corresponding matrix norms ‖A‖_2 = ρ(AᵀA)^{1/2} and ‖A‖_∞ =
max_{i=1,...,d} Σ_{j=1}^{d} |a_{ij}|, respectively. For the latter norm we have

Lemma 3.3.3. For any symmetric and non-singular matrix A ∈ ℝ^{d×d},

    ‖A‖_∞ ≤ d^{1/2}/min spr(A^{−1}).

Proof. Let y = (y_1, ..., y_d) ∈ ℝ^d. Then ‖y‖_∞ ≤ ‖y‖_2 ≤ (d max_i y_i²)^{1/2} =
d^{1/2} ‖y‖_∞ and

    ‖A‖_∞ = sup_{y≠0} ‖Ay‖_∞/‖y‖_∞ ≤ d^{1/2} sup_{y≠0} ‖Ay‖_2/‖y‖_2 = d^{1/2} ‖A‖_2 = (d ρ(AᵀA))^{1/2}
        = (d ρ(A²))^{1/2} = (d ρ(A)²)^{1/2} = d^{1/2} ρ(A) = d^{1/2}/min spr(A^{−1}). □

The proposed estimation procedure requires convergence of γ̂_n(k) to γ(k). Doob
(1953, X §7 Theorem 7.1) gave the key result for this when proving that, for
a real-valued, centered, stationary and ergodic Gaussian process, γ̂_n(k) → γ(k)
with probability 1 as n → ∞ is equivalent to either one of the conditions (a)
(1/(n+1)) Σ_{k=0}^{n} |γ(k)|² → 0 as n → ∞ or (b) the spectral distribution function F is
continuous. In the present analysis, a more precise result featuring the rate of
convergence will be needed. This important result was given by An, Chen and
Hannan (1982, Theorem 1):

Lemma 3.3.4. Let {Y_n}_{n=1}^{∞} be a stationary ergodic process with zero mean
and finite variance, allowing for a representation

    Y_n = Σ_{j=0}^{∞} ψ_j ε_{n−j}   with   ψ_0 = 1,   Σ_{j=0}^{∞} |ψ_j| < ∞

and innovations ε_n satisfying

    E[ε_n | ε_m, m < n] = 0,   E[ε_n² | ε_m, m < n] = σ_ε²,   E(|ε_n|^r) < ∞ for some r ≥ 4.

Then for any δ > 0 and d_n ≤ n^{r/(2(r−2))}

    lim_{n→∞} r_n max_{0≤k≤d_n} |γ(k) − γ̂_n(k)| = 0

with probability 1, where

    r_n := n^{1/2} / ( (d_n log n)^{2/r} (log log n)^{2(1+δ)/r} ).

In the “Gaussian case” considered here, the conditions on the innovations are
fulfilled, since the ε_n’s are independent and identically N(0, σ_ε²)-distributed, hence
E[ε_n | ε_m, m < n] = Eε_n = 0 and E[ε_n² | ε_m, m < n] = E(ε_n²) = σ_ε².
After these introductory remarks, we turn to the core of the
Proof of Theorem 3.2.2. Ȳ_{n−d+1}^{n} := (Y_{n−d+1}, ..., Y_n)^T is a shorthand notation
for the d-past of the process, and we set T_L Ȳ_{n−d+1}^{n} := (T_L Y_{n−d+1}, ..., T_L Y_n)^T.
The prediction error can be decomposed into

    | Ê_{d,n} − E[Y_{n+1} | Y_n, ..., Y_{n−d+1}] | = | γ̂_{d,n} Γ̂_{d,n}^{−1} T_{L_n} Ȳ_{n−d+1}^{n} − γ_d Γ_d^{−1} Ȳ_{n−d+1}^{n} |
        ≤ | (γ̂_{d,n} Γ̂_{d,n}^{−1} − γ_d Γ_d^{−1}) T_{L_n} Ȳ_{n−d+1}^{n} | + | γ_d Γ_d^{−1} (T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}) |
        ≤ d L_n ‖γ_d Γ_d^{−1} − γ̂_{d,n} Γ̂_{d,n}^{−1}‖_∞ + d ‖γ_d‖_∞ ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞.   (3.3.4)

Convergence of the first term in (3.3.4): Observe that

    ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞ = ‖Γ_d^{−1}(Γ_d − Γ̂_{d,n})‖_∞ ≤ ‖Γ_d^{−1}‖_∞ ‖Γ_d − Γ̂_{d,n}‖_∞
        = ‖Γ_d^{−1}‖_∞ max_{i=1,...,d} Σ_{k=1}^{d} | γ(i − k) − γ̂_n(i − k) − δ_{ik}/n |
        ≤ ‖Γ_d^{−1}‖_∞ max_{i=1,...,d} ( Σ_{k=1}^{d} |γ(i − k) − γ̂_n(i − k)| + 1/n )
        ≤ ‖Γ_d^{−1}‖_∞ ( d max_{i=0,...,d−1} |γ(i) − γ̂_n(i)| + 1/n ).   (3.3.5)
From the An, Chen and Hannan (1982) result (Lemma 3.3.4), this tends to 0
with probability 1, if only (recall d = d_n ≤ n^{r/(2(r−2))})

    ‖Γ_{d_n}^{−1}‖_∞ d_n / r_n = O(1)   (3.3.6)

and

    ‖Γ_{d_n}^{−1}‖_∞ / n → 0   (n → ∞).   (3.3.7)
Thus, for all ω ∈ Ω from a set of probability 1 and for n sufficiently large,

    ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞(ω) < 1,

and by the second part of Lemma 3.3.2

    ‖Γ̂_{d,n}^{−1}‖_∞ ≤ ‖Γ_d^{−1}‖_∞ / (1 − ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞),

    ‖Γ̂_{d,n}^{−1} − Γ_d^{−1}‖_∞ ≤ ‖Γ_d^{−1}‖_∞ ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞ / (1 − ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞).
Using these inequalities, it follows that

    ‖γ_d Γ_d^{−1} − γ̂_{d,n} Γ̂_{d,n}^{−1}‖_∞ ≤ ‖γ_d Γ_d^{−1} − γ_d Γ̂_{d,n}^{−1}‖_∞ + ‖γ_d Γ̂_{d,n}^{−1} − γ̂_{d,n} Γ̂_{d,n}^{−1}‖_∞
        ≤ ‖γ_d‖_∞ ‖Γ̂_{d,n}^{−1} − Γ_d^{−1}‖_∞ + ‖γ_d − γ̂_{d,n}‖_∞ ‖Γ̂_{d,n}^{−1}‖_∞
        ≤ ‖γ_d‖_∞ ‖Γ_d^{−1}‖_∞ ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞ / (1 − ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞)
          + ‖γ_d − γ̂_{d,n}‖_∞ ‖Γ_d^{−1}‖_∞ / (1 − ‖I − Γ_d^{−1} Γ̂_{d,n}‖_∞).

For any second order stationary process there exists a constant c_γ with

    0 ≤ |γ(k)| ≤ c_γ < ∞   (k ∈ ℕ_0),

hence ‖γ_d‖_∞ ≤ c_γ (this can also be seen from (3.2.7)). We set M_{d,n} :=
max_{i=0,...,d} |γ(i) − γ̂_n(i)|. Now, appealing to (3.3.5) again and tidying things
up, we obtain

    ‖γ_d Γ_d^{−1} − γ̂_{d,n} Γ̂_{d,n}^{−1}‖_∞
        ≤ ( ‖Γ_d^{−1}‖_∞ (c_γ ‖Γ_d^{−1}‖_∞ d + 1) M_{d,n} + ‖Γ_d^{−1}‖_∞² c_γ / n )
          / ( 1 − ( ‖Γ_d^{−1}‖_∞ d M_{d,n} + ‖Γ_d^{−1}‖_∞ / n ) )

(for all ω ∈ Ω from a set of probability 1 and for n ≥ N(ω)). Finally, according
to Lemma 3.3.4, d L_n ‖γ_d Γ_d^{−1} − γ̂_{d,n} Γ̂_{d,n}^{−1}‖_∞ → 0 with probability 1 if

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n² / r_n = O(1),   (3.3.8)

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n / n → 0.   (3.3.9)
Convergence of the second term in (3.3.4): d ‖γ_d‖_∞ ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞
is readily bounded from above by d c_γ ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞, so it
suffices to ensure that (again d = d_n)

    d ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞ → 0

with probability 1 as n → ∞. To this end, for ε > 0,

    P( d ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞ ≥ ε )
        = P( d ‖Γ_d^{−1}‖_∞ max_{i=0,...,d−1} |T_{L_n} Y_{n−i} − Y_{n−i}| ≥ ε )
        ≤ P( ∃ i ∈ {0, ..., d−1} : |Y_{n−i}| ≥ L_n ) = P( max_{i=0,...,d−1} |Y_{n−i}| ≥ L_n ),

since |T_{L_n} Y_{n−i} − Y_{n−i}| > 0 implies |Y_{n−i}| ≥ L_n. But

    P( max_{i=0,...,d−1} |Y_{n−i}| ≥ L_n ) ≤ d P(|Y_1| ≥ L_n) = 2d (1 − Φ(L_n/σ)).

Here Φ denotes the distribution function of the standard normal distribution.
Application of the standard bound on Φ,

    1 − Φ(y) < (1/(√(2π) y)) exp(−y²/2)   (y > 0)

(Feller, 1968, VII.1. Lemma 2), yields

    P( d ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞ ≥ ε ) ≤ 2d (σ/(√(2π) L_n)) exp(−L_n²/(2σ²)).

From this,

    d ‖γ_d‖_∞ ‖Γ_d^{−1}‖_∞ ‖T_{L_n} Ȳ_{n−d+1}^{n} − Ȳ_{n−d+1}^{n}‖_∞ → 0

with probability 1, if only

    Σ_{n=1}^{∞} (d_n/L_n) exp(−L_n²/(2σ²)) < ∞.   (3.3.10)

Putting things together: Apart from d_n ≤ n^{r/(2(r−2))}, the conditions to be fulfilled
are (3.3.6)–(3.3.10). From (3.2.3) and (3.2.4), the spectral density f of the
process is continuous, which yields 0 < sup_{λ∈[−π,π]} f(λ) < ∞. With the standard
bound on the spectrum of Γ_d (Brockwell and Davis, 1991, Prop. 4.5.3), one has

    ‖Γ_d^{−1}‖_∞ ≥ ρ(Γ_d^{−1}) = 1/min spr(Γ_d) ≥ 1/(2π sup_{λ∈[−π,π]} f(λ)).
Hence (3.3.6) and (3.3.7) are implied by (3.3.8) and (3.3.9), respectively. Rewriting
(3.3.8) as

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

we end up with the following four conditions:

    d_n ≤ n^{r/(2(r−2))},

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

    L_n ‖Γ_{d_n}^{−1}‖_∞² d_n / n → 0   (n → ∞),

    Σ_{n=1}^{∞} (d_n/L_n) exp(−L_n²/(2σ²)) < ∞,

in order to obtain

    lim_{n→∞} | E[Y_{n+1} | Y_n, ..., Y_{n−d_n+1}] − Ê_{d_n,n} | = 0

with probability 1. □
Finally, we prove Corollary 3.2.3.

Proof of Corollary 3.2.3. Recently, Serra Capizzano obtained the following
result (1999, Theorem 3.2; 2000, Theorem 1.2) on the “worst” rate of decay of
the minimal eigenvalue µ_d^{min} of the Toeplitz matrices

    T_d(f) := (γ(i − j))_{i,j=1,...,d}

formed by the coefficients of the Fourier expansion

    f(λ) = (1/2π) Σ_{j=−∞}^{∞} γ(j) exp(−ijλ)   (λ ∈ [−π, π])

of some real-valued Lebesgue integrable function f. As to the situation of case
3, the following holds:

Lemma 3.3.5. Let f be a real-valued Lebesgue integrable function with
essential infimum m_f. Suppose there exists an interval (a, b) ⊆ (−π, π), a < b,
and a number δ > 0 such that f(λ) > δ for almost all λ ∈ (a, b). Then

    µ_d^{min} ≥ K exp(−cd) + m_f   (3.3.11)

with some c > 0 and some K > 0 independent of d.

The constants c and K are related to the measure of the set where f essentially
vanishes, which is not disclosed to the statistician. However, choosing some
0 < s < 1, one has

    K exp(−cd) + m_f ≥ exp(−d^{1/s})   (3.3.12)

for sufficiently large d. As already noted, from (3.2.3) and (3.2.4) the spectral
density f of the process under consideration is continuous. Thus the requirements
of Lemma 3.3.5 are met for T_d(f) = Γ_d, and Lemma 3.3.3 together with
(3.3.11) and (3.3.12) yields

    ‖Γ_d^{−1}‖_∞ ≤ d^{1/2}/µ_d^{min} ≤ d^{1/2} exp(d^{1/s}).

It remains to check the conditions (3.2.11) of Theorem 3.2.2 in the case of

    q > 4,   0 < s < 1,   t ≥ 1,   d_n ≤ ((1/q) log n)^s   and   L_n = (log n)^t.

The first inequality in (3.2.11) is obvious; the second and third follow from

    ‖Γ_{d_n}^{−1}‖_∞² d_n / n^{1/2} ≤ d_n² exp(2 d_n^{1/s}) / n^{1/2} ≤ d_n² n^{2/q − 1/2}

with 2/q − 1/2 < 0 and d_n growing only polylogarithmically. As to the fourth
inequality, observe that d_n/L_n → 0 as n → ∞ and, for n sufficiently large,
−L_n²/(2σ²) ≤ −2 log n, hence

    exp(−L_n²/(2σ²)) ≤ 1/n².

The corollary for the more restrictive cases 1 and 2 follows similarly, using

    µ_d^{min} ≥ 2π inf_{λ∈[−π,π]} f(λ) > 0

(Brockwell and Davis, 1991, Proposition 4.5.3) for case 1 and

    µ_d^{min} ≥ const./d^{p*}

(Böttcher and Grudsky, 1998, Example 3.1 and Theorem 3.4; Serra Capizzano,
2000, Remark 1.2; less general in Serra, 1998, Theorem 2.3) for case 2. □
3.4 Simulations and examples
We first continue Examples 3.1 and 3.2 from Section 3.2.
Example 3.1 (continued): Consider the AR(1) processes

    Y_t − Φ Y_{t−1} = Z_t,   {Z_t} white noise with variance σ²,

with |Φ| < 1 (Brockwell and Davis, 1991, Example 4.4.2). These processes have
the spectral densities

    f(λ) = (σ²/2π) (1 − 2Φ cos λ + Φ²)^{−1},

bounded away from zero. The autocovariance function is given by

    γ(0) = σ²/(1 − Φ²)   and   γ(k) = Φ^k γ(0)   (k ≥ 1),

from which we can calculate the conditional expectation E[Y_{n+1} | Y_n, ..., Y_1] of the
next output given the past using (3.2.9). The latter conditional expectation acts
as an approximation of the unknown true autoregression E[Y_{n+1} | Y_n, Y_{n−1}, ...].
Figure 3.1 a-b) shows two paths of the process (circles) for different values
of σ² and Φ, together with the corresponding autoregression (grey) and the
estimated autoregression (black). The convergence of the estimates towards
the true autoregression is clearly visible.
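For the AR(1) model this can be checked with a few lines of simulation (ours, not the thesis code): the ratio of the lag-one to the lag-zero sample autocovariance recovers Φ, and hence the true autoregression E[Y_{n+1} | Y_n, Y_{n−1}, ...] = Φ Y_n.

```python
import math
import random

random.seed(2)
Phi, sigma2, n = 0.3, 1.0, 10_000

# Simulate the AR(1) recursion Y_t = Phi * Y_{t-1} + Z_t, started in
# the stationary distribution N(0, sigma2 / (1 - Phi^2)).
Y = [random.gauss(0.0, math.sqrt(sigma2 / (1 - Phi**2)))]
for _ in range(n - 1):
    Y.append(Phi * Y[-1] + random.gauss(0.0, math.sqrt(sigma2)))

# Sample autocovariances as in (3.2.10); their ratio estimates Phi.
gamma0 = sum(y * y for y in Y) / n
gamma1 = sum(Y[i] * Y[i + 1] for i in range(n - 1)) / n
print(gamma1 / gamma0)  # close to Phi = 0.3
```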
Example 3.2 (continued): The process from Example 3.2 in Section 3.2 has
the spectral density

    f(λ) = (σ_ε²/2π) (4 sin²(λ/2))^{1/3}.

From this we calculate the autocovariances

    γ(k) = 2 ∫_0^π f(λ) cos(kλ) dλ

(Brockwell and Davis, 1991, eq. 4.3.10) using a compound trapezoidal integration
rule with an error of at most 10^{−7}. Figure 3.2 a) shows the autocovariance
function of the process for an innovation variance σ_ε² = 0.01. Again,
E[Y_{n+1} | Y_n, Y_{n−1}, ...] is approximated by E[Y_{n+1} | Y_n, ..., Y_1], which is calculated by
(3.2.9). The Gaussian process {Y_n}_n itself is simulated by the method described
in Brockwell and Davis (1991, Ex. 8.16, p. 271). Figure 3.2 b) shows the
realisations of Y_n (circles) together with the “true” (grey) and the predicted
(black) autoregression for 50 consecutive days. Again, the convergence result is
convincing.

Figure 3.1: “True” (grey) and predicted (black) autoregression for two Gaussian
AR(1) models. a) σ² = 1, Φ = −0.3; b) σ² = 1, Φ = 0.3.

Figure 3.2: The autocovariance function and a sample path of the process in
Example 3.2. a) The autocovariance function of the process; b) “true” (grey) and
predicted (black) autoregression.
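The autocovariance computation can be reproduced with a compound trapezoidal rule. The following re-implementation is ours (the grid size m is an arbitrary choice, not the thesis setting):

```python
import math

sigma2_eps = 0.01  # innovation variance, as in Figure 3.2

def f(lam):
    # spectral density f(lam) = (sigma_eps^2 / 2 pi) * (4 sin^2(lam/2))^(1/3)
    return (sigma2_eps / (2 * math.pi)) * (4 * math.sin(lam / 2) ** 2) ** (1 / 3)

def gamma(k, m=20000):
    # compound trapezoidal rule for gamma(k) = 2 * int_0^pi f(lam) cos(k*lam) d lam
    h = math.pi / m
    s = 0.5 * (f(0.0) + f(math.pi) * math.cos(k * math.pi))
    s += sum(f(j * h) * math.cos(k * j * h) for j in range(1, m))
    return 2 * h * s

print(gamma(0), gamma(1))  # gamma(0) > 0, gamma(1) < 0, as in Figure 3.2 a)
```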

The following two examples illustrate the performance of the greedy strategy
from Section 3.1.
Example 3.3: We run the greedy strategy from Section 3.1 in a market with
a stock whose price follows a geometrical Brownian motion (Luenberger, 1998,
Sec. 11.7; Korn and Korn, 1999, Ch. 2) with a mean return of 2% p.a. and
a volatility of σ = 10% p.a. The bond offers a riskless return of 2% p.a. (i.e.,
we set r = 2%/365). The algorithm of Section 3.2 is used to predict the
log-returns of the stock. Figure 3.3 shows the value of an investment of $1 either
solely in the stock (grey, solid) or in the bond (grey, dashed). The value of the
greedy strategy is shown by the solid black line. In times when the share price
is likely to plummet the investor takes refuge in the bond. Thus he participates

[Figure 3.3: Daily value of a $1 investment in a stock following a geometric Brownian motion with µ = 2%/365, σ = 10%/√365 (grey, solid), in a bond with short rate 2% p.a. (grey, dashed) and in a greedy strategy (black).]

[Figure 3.4 a): Yellow Corp. (YELL). Figure 3.4 b): SONY (SNE).]



[Figure 3.4 c): Boeing Co. (BA).]

Figure 3.4: Daily value of a $1 investment in some shares from Dow Jones
Indices at NYSE 24/4/1998-8/2/1999 (grey, solid), in a bond with short rate
2% p.a. (grey, dashed) or in a greedy strategy (black), respectively.

in the rise of the share price more than in its fall, increasing his annual yield beyond 2%. As in Section 2.2, this is the phenomenon of volatility pumping (Luenberger, 1998, Examples 15.2 and 15.3): the share’s volatility is used to draw an above-average return from the stock.
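The volatility-pumping effect can be isolated in a deterministic toy market; the numbers below are illustrative and are not those of Example 3.3. A stock that alternately doubles and halves leaves a buy-and-hold investor exactly where he started, while a portfolio rebalanced to 50% stock and 50% cash each period grows by the factor 1.5 · 0.75 = 1.125 per two-period cycle:

```python
def rebalanced_growth(returns, w=0.5):
    """Wealth of a portfolio rebalanced each period to a fraction w in the
    stock and 1 - w in cash (cash return 1)."""
    wealth = 1.0
    for x in returns:
        wealth *= (1 - w) * 1.0 + w * x
    return wealth

# A stock that alternately doubles and halves over 20 periods.
returns = [2.0, 0.5] * 10

hold = 1.0
for x in returns:
    hold *= x                        # buy-and-hold ends at exactly 1.0

pumped = rebalanced_growth(returns)  # grows by 1.5 * 0.75 = 1.125 per cycle
print(hold, round(pumped, 4))
```

After ten cycles the rebalanced portfolio is worth 1.125¹⁰ ≈ 3.25 dollars while buy-and-hold is back at $1; this is the mechanism the greedy strategy exploits in a milder, data-driven form.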
Example 3.4: We replace the geometric Brownian motion by various real
stock price processes on 200 days of trading, using the NYSE closing prices
24/4/1998-8/2/1999 (data from www.wallstreetcity.com). Figure 3.4 a-c) shows
the corresponding charts. Although the greedy strategy does not manage to yield a return at least as large as the bond’s return in all cases (Fig. 3.4 b), it typically outperforms the stock, considerably reducing the investor’s risk of financial loss from pure share investment.

CHAPTER 4

A Markov model with transaction costs: probabilistic view

In Chapter 1 (Theorem 1.3.4) we have seen that investment according to the
log-optimal portfolio is optimal in an asymptotic and a non-asymptotic sense.
In a market of m stocks with i.i.d. returns, for example, the log-optimal portfolio is a constant b∗ ∈ S. This, however, does not mean that once the investor has allocated his wealth according to b∗, he need not rebalance his portfolio in the following market periods. On the contrary, since the price of each stock evolves differently, the proportions of wealth held in the stocks will differ from b∗ already after the next market period. Thus, selling and purchasing stocks becomes necessary after essentially every market period. In a market where
transactions such as selling and buying generate transaction costs, one should
therefore adopt a more carefully chosen strategy, combining the task of maxi-
mizing the portfolio return with the task of having to pay as little transaction
fees as possible. In the setting of a Markov market model, this chapter gives an
optimal solution to this problem.
In Section 4.1 we will set up a market model with transaction costs and for-
mulate our investment goals. The chapter is devoted to probabilistic aspects
of the model, i.e. the analysis assumes the distribution of the return process
to be known. Statistical aspects, in particular what to do if the distribution
is not revealed to the investor, will be investigated in Chapter 5. In Section
4.2 we propose an optimal strategy (Theorem 4.2.1) whose optimality is proven
in the remainder of the section. Section 4.3 concludes the chapter with results
that will be needed in Chapter 5 when dealing with the statistical aspects of
the model.
Historically, the necessity of modelling investment problems including trans-
action fees arose at a time when most research dealt with continuous-time

models, and consequently, most of the work on transaction costs concerns the
continuous-time case. Not much emphasis was put on statistical discrete-time
strategy building. A detailed overview of literature can be found in Davis and
Norman (1990), Fleming (1999), Cadenillas (2000) or Bielecki and Pliska (2000).
However, many models and strategies involved considerable computational effort, which made it necessary to use approximation techniques (see e.g. Fitzpatrick and Fleming, 1991, and Atkinson et al., 1997). Nowadays, continuous-
time models are frequently approximated by time-discrete models (for example
in Bielecki, Hernandez and Pliska, 1999, to discretize the continuous-time model
in Bielecki and Pliska, 1999). This brought about a renaissance of discrete-time
modelling. Up to date surveys of strategy planning under transaction costs
in discrete-time markets can be found in Carassus and Jouini (2000) for a
Cox-Rubinstein type model, in Blum and Kalai (1999) for optimal constant-
rebalanced portfolios and in Bobryk and Stettner (1999) for a market with a
bond and one stock having i.i.d. returns.

4.1 Strategies in markets with transaction fees


In this section we set up a market with a bond and several stocks whose returns
form a d-stage Markov process. Markets of this kind arise when discretizing
markets driven by stochastic differential equations even beyond the famous
Black-Scholes model. Consider a market of m stocks and a risk-free bond (which
we will think of as a bank account). Again, Xi,j denotes the return of the jth
stock from time i − 1 to time i and r ≥ 0 is the interest rate of the bond.
The return process {X_i := (X_{i,1}, ..., X_{i,m})^T}_{i=−d+1}^∞ is assumed to be a d-stage Markov process with continuous autoregression, i.e., if we denote the last d observed returns by X̄_{i+1} = (X_{i−d+1}, ..., X_i), the process satisfies the following conditions:

V1: {X_i}_{i=−d+1}^∞ is a stationary [A, B]^m-valued stochastic process on a probability space (Ω, A, P) (0 < A < B < ∞ need not be known),

V2: E[h(b, X̄_{i+1}) | F_i] = E[h(b, X̄_{i+1}) | X̄_i]   P-a.s.,

V3: E[h(b, X̄_{i+1}) | X̄_i = x̄] is a continuous function of (b, x̄) ∈ S × [A, B]^{dm},

for all continuous functions h : S × [A, B]^{dm} → IR and all i. Thus, at time i − 1, the further evolution of the market depends only upon the sub-σ-algebra σ(X̄_i) of the total information field F_i := σ(X_{−d+1}, ..., X_{i−1}).
Following Blum and Kalai (1999) we assume that only purchasing shares in the
market generates transaction costs (brokerage fees, commission) proportional
to the total value of the transaction, i.e.

transaction costs = c · value of purchased shares

with a commission factor c ∈ [0, 1). Paying into and drawing money from
the risk-free account does not generate any fees. In case two commission fac-
tors apply for selling and purchasing shares, say csell and cpurchase, one may use
c := (csell + cpurchase)/(1 + cpurchase) as a compound commission factor applying to
purchases only. This approach never underestimates the capital reducing effect
of transaction costs. Indeed, with 1 unit of money in a stock one can purchase
(1 − csell )/(1 + cpurchase) = 1 − c value in another stock or pay 1 − csell ≥ 1 − c
units of money into the account. Conversely, 1 unit of money in the bond can
purchase 1/(1 + cpurchase) ≥ 1 − c value in a stock.
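The algebra behind the compound commission factor can be verified directly; the fee levels below are illustrative:

```python
def compound_commission(c_sell, c_purchase):
    """Compound factor c applied to purchases only, as in the text:
    c = (c_sell + c_purchase) / (1 + c_purchase)."""
    return (c_sell + c_purchase) / (1 + c_purchase)

c_sell, c_purchase = 0.003, 0.005        # illustrative fee levels
c = compound_commission(c_sell, c_purchase)

# 1 unit of money in a stock buys (1 - c_sell)/(1 + c_purchase) value in
# another stock, which is exactly 1 - c:
stock_to_stock = (1 - c_sell) / (1 + c_purchase)
assert abs(stock_to_stock - (1 - c)) < 1e-12

# The compound factor never underestimates the capital-reducing effect:
assert 1 - c_sell >= 1 - c               # selling into the account
assert 1 / (1 + c_purchase) >= 1 - c     # buying out of the account
print(round(c, 6))
```

The identity 1 − c = (1 − c_sell)/(1 + c_purchase) follows by clearing the denominator, which is what the assertions check numerically.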
To see how investment actions are limited by transaction fees, consider a fixed
time instant i. Then the investor’s wealth Wi is used to acquire a new portfolio,
given by an enhanced portfolio vector bi+1 := (bi+1,−1 , ..., bi+1,m)T , which now is
(m + 2)-dimensional. Here

bi+1,−1 is the proportion of Wi needed to settle the transaction costs that arise
when the portfolio is restructured,

bi+1,0 is the proportion of Wi to be held in the bond and, as usual,

bi+1,j is the proportion of Wi to be held in stock j (j = 1, ..., m).

No short selling or consumption is considered, i.e. Σ_{j=−1}^m b_{i+1,j} = 1 and b_{i+1,j} ≥ 0. Thus, the portfolio vector b_{i+1} chosen at time i becomes a member of the simplex

    S := {b = (b_{−1}, ..., b_m)^T ∈ IR^{m+2} | b_j ≥ 0 for all j, Σ_{j=−1}^m b_j = 1}.

Now, in the market period from time i − 1 to time i the investor’s wealth Wi−1
generated a value of (1 + r)bi,0 Wi−1 in the bond and of Xi,j bi,j Wi−1 in the jth
stock. An amount of bi,−1 Wi−1 was used to settle transaction fees and is no

longer available. The resulting wealth at time i becomes W_i = (1 + r) b_{i,0} W_{i−1} + Σ_{j=1}^m X_{i,j} b_{i,j} W_{i−1}, or equivalently

    W_i / W_{i−1} = (1 + r) b_{i,0} + Σ_{j=1}^m X_{i,j} b_{i,j}.    (4.1.1)

Rebalancing the portfolio b_i to b_{i+1} generates transaction costs of total amount c Σ_{j=1}^m (b_{i+1,j} W_i − X_{i,j} b_{i,j} W_{i−1})^+, which are settled using the amount b_{i+1,−1} W_i. Hence the investor has to observe the self-financing condition

    b_{i+1,−1} W_i = c Σ_{j=1}^m (b_{i+1,j} W_i − X_{i,j} b_{i,j} W_{i−1})^+.    (4.1.2)

Using (4.1.1), (4.1.2) is equivalent to

    g_c(b_i, X̄_{i+1}, b_{i+1}) := b_{i+1,−1} ((1 + r) b_{i,0} + Σ_{k=1}^m X_{i,k} b_{i,k})
        − c Σ_{j=1}^m (b_{i+1,j} ((1 + r) b_{i,0} + Σ_{k=1}^m X_{i,k} b_{i,k}) − X_{i,j} b_{i,j})^+ = 0.    (4.1.3)

If x̄ is the matrix formed from the last d observed return vectors and s denotes
the last portfolio vector, we call the collection of all portfolios satisfying the
self-financing condition,

S(s, x̄) = {b ∈ S | gc (s, x̄, b) = 0}, (4.1.4)

the admissible set corresponding to (s, x̄) ∈ S × [A, B]^{dm}. Note that for all (s, x̄) ∈ S × [A, B]^{dm}

    a∗ := (0, 1, 0, ..., 0)^T ∈ S(s, x̄),    (4.1.4a)

i.e. there is always one option open to the investor: he can pay all his wealth into the risk-free account at any time.
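The self-financing function g_c of (4.1.3) is straightforward to implement. The following sketch uses hypothetical numbers, and x̄ enters only through the most recent return vector, as in (4.1.3); it also confirms that a∗ = (0, 1, 0, ..., 0)^T always satisfies the self-financing condition:

```python
def g_c(s, x, b, r=0.0, c=0.01):
    """Self-financing function (4.1.3).

    s : previous enhanced portfolio, laid out as (s_fee, s_bond, s_1, ..., s_m),
    x : most recent return vector (x_1, ..., x_m),
    b : candidate new enhanced portfolio in the same layout.
    """
    m = len(x)
    # W_i / W_{i-1} generated over the last market period, cf. (4.1.1):
    growth = (1 + r) * s[1] + sum(x[j] * s[2 + j] for j in range(m))
    # Proportional costs on purchases only, cf. (4.1.2):
    fees = c * sum(max(b[2 + j] * growth - x[j] * s[2 + j], 0.0) for j in range(m))
    return b[0] * growth - fees

# a* = (0, 1, 0, ..., 0): everything in the bond, no purchases, no fees.
m = 3
a_star = (0.0, 1.0) + (0.0,) * m
s = (0.0, 0.25, 0.25, 0.25, 0.25)   # some previous portfolio (illustrative)
x = (1.02, 0.97, 1.05)              # last return vector (illustrative)
assert g_c(s, x, a_star) == 0.0      # a* lies in every admissible set S(s, x)
```

A candidate b belongs to S(s, x̄) exactly when g_c(s, x̄, b) = 0, so this function is all that is needed to test admissibility of a proposed rebalancing.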
The investor can only follow non-anticipating portfolio strategies which comply
with the self-financing condition:

Definition 4.1.1. A sequence {b_i}_{i=0}^∞ of random variables Ω → S is called admissible portfolio strategy if for all i ∈ IN the following holds:

1. b_i ∈ S(b_{i−1}, X̄_i) P-a.s.,

2. b_i is F_i-measurable and

3. b_0 = a∗.

Condition 1 enforces that an admissible strategy never generates more transaction fees than the investor can afford. Because of condition 2, investment decisions require no more information than is currently available. Finally, condition 3 provides for a standardized setting in so far as the investor’s wealth is accumulated in the bond at the beginning of the investment process.
As we have seen, the pair (b_{i−1}, X̄_i) carries complete information about both the stochastic regime of the next market period (from the d-stage Markov property) and the admissible set (from the self-financing condition (4.1.3)). Hence, at time i − 1, the investment decision may be taken by applying a so-called portfolio selection function φ : S × [A, B]^{dm} → S to the pair (b_{i−1}, X̄_i). This approach wastes no information.

Definition 4.1.2. An admissible portfolio strategy {b_i}_{i=0}^∞ is based on the portfolio selection function φ : S × [A, B]^{dm} → S if φ is measurable and, for all i,

    b_i = φ(b_{i−1}, X̄_i)   P-a.s.

We now move on to defining the investment goal. From the previous chapters we know that the logarithmic utility function f : S × ([A, B]^m)^d → IR,

    f(b, X̄_{i+1}) = log((0, 1 + r, X_i^T) b),

is the optimal choice for long-run and short-term investment targets alike (note that the entry 0 in the first vector of the scalar product corresponds to the amount of transaction costs, which is lost). In this chapter we therefore assume that the investor aims to choose an admissible strategy {b_i}_i such that, in the long run, the expected mean utility E((1/n) Σ_{i=0}^{n−1} f(b_i, X̄_{i+1})) is larger than for any other strategy based on some portfolio selection function. This is formalized by inequality (4.2.4) below.
It should be pointed out that the process {X_i}_i need not contain mere return information, but may contain additional factors and side information in some of its coordinates as well, provided of course that these occur in the form of a d-stage Markov process with continuous autoregression function and that the joint vector of returns and factors satisfies V1, V2 and V3. The utility function f then simply ignores the coordinates containing the factors and side information.
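For concreteness, the utility function f of this section can be sketched in code; the portfolio and return values below are illustrative:

```python
import math

def log_utility(b, x, r=0.0):
    """f(b, X̄_{i+1}) = log((0, 1+r, x^T) b) for an enhanced portfolio
    b = (b_fee, b_bond, b_1, ..., b_m).  The fee coordinate b_fee is
    multiplied by 0 and hence lost, as noted in the text."""
    inner = (1 + r) * b[1] + sum(xj * bj for xj, bj in zip(x, b[2:]))
    return math.log(inner)

# Illustrative numbers: 1% of wealth lost to fees, 39% bond, two stocks.
b = (0.01, 0.39, 0.30, 0.30)
x = (1.05, 0.98)
u = log_utility(b, x, r=0.0002)
```

Because the fee proportion earns nothing, the argument of the logarithm here is less than the full gross return of the fee-free portfolio; paying transaction costs always lowers realized utility.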

4.2 An optimal strategy


For the rest of this chapter, assume that V1, V2 and V3 hold and that a (0, 1)-valued sequence of discount factors {δ_i}_{i=1}^∞ is fixed for which

    δ_i ↘ 0 monotonically as i → ∞,
    i δ_i → ∞ as i → ∞,
    δ_i piecewise constant: δ_1 = δ_2, δ_3 = δ_4 = δ_5, δ_6 = ... = δ_9, ...,
    (δ_i − δ_{i+1}) / δ_{i+1}² ≤ 1 for all i ≥ 1.    (4.2.1)

Then the following theorem gives an algorithm for optimal investment based on the so-called Bellman equation. Using stationarity, let

    m(b, x̄) := E[f(b, X̄_{i+1}) | X̄_i = x̄] = E[f(b, X̄_{d+1}) | X̄_d = x̄].

Theorem 4.2.1. Let h_i ∈ C(S × [A, B]^{dm}) be a solution of the Bellman equation

    h_i(s, x̄) = max_{b ∈ S(s, x̄)} { m(b, x̄) + (1 − δ_i) E[h_i(b, X̄_{d+1}) | X̄_d = x̄] }.    (4.2.2)

With V_i(b, x̄) := m(b, x̄) + (1 − δ_i) E[h_i(b, X̄_{d+1}) | X̄_d = x̄] we obtain an admissible portfolio strategy by

    b∗_0 := a∗,
    b∗_i := arg max_{b ∈ S(b∗_{i−1}, X̄_i)} V_i(b, X̄_i).    (4.2.3)

This strategy is optimal in the sense that for any portfolio strategy {b_i}_i based on a portfolio selection function

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} E f(b∗_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1}) ) ≥ 0.    (4.2.4)

Before proving the theorem we make some remarks:



1. (4.2.4) compares the mean utility of the investment schemes {b∗_i}_i and {b_i}_i in the worst case (lim inf) that may occur for a remote time horizon. Hence, the result of Theorem 4.2.1 is a worst-case analysis. The Bellman equation (4.2.2) penalizes the “target” function m to obtain a target that is adjusted for losses through transaction costs. It thus balances the need to make as many transactions as necessary but, at the same time, as few as possible.

2. The strategy b∗_i is a generalization of the log-optimal strategy in Chapter 1. If no transaction costs occur, then c = 0 and S(s, x̄) = {(0, b_0, ..., b_m) | b_j ≥ 0, Σ_j b_j = 1} independently of s. Hence, h_i(s, x̄) = h_i(x̄), and b∗_i = arg max_{b ∈ S(b∗_{i−1}, X̄_i)} m(b, X̄_i) coincides with the classical log-optimal strategy.

3. In dynamic programming, a solution h_i of the Bellman equation (4.2.2) is frequently referred to as a value function. The existence of a value function for δ_i = 0 can be obtained under more restrictive extra conditions such as a finite state space of the Markov chain and certain recurrence properties of the transition matrix (Ross, 1970, Sec. 6.7 and Bertsekas, 1976, Sec. 8.1). To avoid these extra conditions we use a variant of the so-called vanishing discount approach (Hernández-Lerma and Lasserre, 1996, Sec. 5.3 - 5.5) where solutions of the Bellman equation (4.2.2) are produced for a sequence of discount factors δ_i → 0.

4. A sequence {δ_i}_i satisfying the above conditions can be obtained recursively by

    d_1 ∈ (0, 1) arbitrary,   d_{k+1} := (1/2)(√(1 + 4 d_k) − 1)

and

    δ_1 = δ_2 := d_1,
    δ_3 = δ_4 = δ_5 := d_2,
    δ_6 = δ_7 = δ_8 = δ_9 := d_3,
    δ_10 = ... .

Note that d_k converges monotonically decreasing to 0, that k(k + 1) d_k → ∞ (k → ∞) and that (d_k − d_{k+1}) / d_{k+1}² = 1.
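The recursion of remark 4 is easy to check numerically: d_{k+1} solves x² + x = d_k, whence (d_k − d_{k+1})/d_{k+1}² = 1 exactly. A short sketch (the starting value d_1 = 0.5 is arbitrary, as the remark allows):

```python
import math

def d_sequence(d1, K):
    """d_1 in (0, 1), d_{k+1} = (sqrt(1 + 4 d_k) - 1) / 2."""
    d = [d1]
    for _ in range(K - 1):
        d.append((math.sqrt(1 + 4 * d[-1]) - 1) / 2)
    return d

def delta_sequence(d):
    """Block k of the discount sequence has length k + 1 and repeats d_k:
    delta_1 = delta_2 = d_1, delta_3 = ... = delta_5 = d_2, and so on."""
    delta = []
    for k, dk in enumerate(d, start=1):
        delta.extend([dk] * (k + 1))
    return delta

d = d_sequence(0.5, 50)
delta = delta_sequence(d)
# d_k decreases to 0, and (d_k - d_{k+1}) / d_{k+1}^2 = 1 by construction,
# since d_{k+1} is the positive root of x^2 + x = d_k.
print(round(d[-1], 4), len(delta))
```

The piecewise-constant blocks of growing length are what make i·δ_i → ∞ possible even though δ_i ↘ 0.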

The Bellman equation is closely linked with the theory of Markov control pro-
cesses and stochastic dynamic programming (SDP). These have been applied to
financial mathematics since the 1960s, e.g. by Samuelson (1969), Merton (1969)
and Bertsekas (1976, Sec. 3.3). A good introduction to SDP and Markov control
can be found in Bertsekas (1976), Bertsekas and Shreve (1978), in Hernández-
Lerma and Lasserre (1996 and 1999) or in Bather (2000). Among recent appli-
cations to discrete-time finance we only mention Hernández-Lerma and Lasserre
(1996, Example 1.3.2) and Duffie (1988, Sec. III.19) – both contain more refer-
ences. It should be pointed out, however, that none of the classical models can
properly deal with the transaction cost problem we are dealing with. Before
giving details of the proof of Theorem 4.2.1, we should briefly comment on that.

4.2.1 Some comments on Markov control


In the terminology of Hernández-Lerma and Lasserre (1996 and 1999), a discrete-
time Markov-control process is a five-tuple (X, A, {A(x)|x ∈ X}, Q, c), with
state space X and a set A of control actions. {A(x)|x ∈ X} is a class of
nonempty sets A(x) ⊆ A, where A(x) contains the admissible control actions
in state x. xt denotes the state of the system at time t. Q is the transition
probability distribution Q(dy|x, a) = Pxt+1 |xt =x,at =a (dy), i.e. the distribution of
xt+1 given the system is in state xt = x at time t and control action at = a is
taken. Finally, c is the one step cost function, c(x, a) being the cost incurred
when choosing control action a in state x.
One then seeks a sequential choice of control actions a_i ∈ A(x_i) so as to minimize

    lim sup_{n→∞} (1/n) Σ_{i=0}^{n−1} E c(x_i, a_i)

or maximize

    lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} E (−c)(x_i, a_i)

(−c thus becomes the utility function). Optimal strategies can be generated from solutions (ρ∗, h) of the Bellman equation

    ρ∗ + h(x) = min_{a ∈ A(x)} { c(x, a) + ∫_X h(y) Q(dy|x, a) }

with ρ∗ ∈ IR and continuous function h. In order to solve this equation, either appropriate boundedness and continuity properties of the solutions of the discounted Bellman equation

    ρ + h(x) = min_{a ∈ A(x)} { c(x, a) + δ ∫_X h(y) Q(dy|x, a) }

(0 < δ < 1) are needed (Hernández-Lerma and Lasserre, 1996, Theorems 5.4.3
and 5.5.4) or recurrence and irreducibility conditions for the Markov chain with
the transition probability distribution Q (Ross, 1970, Cor. 6.20 or Bertsekas,
1976, Prop. 3). Typically, most research assumes the state space X to be count-
able and the additional existence of a state x∗ with Q(x∗|x, a) ≥ const. > 0 for
all x ∈ X, a ∈ A(x). Such conditions are hard to verify from market data and
– what is even worse – they are not satisfied for the control problem of portfolio
selection under transaction costs. Indeed, we have seen that the collection of
admissible actions (portfolio choices bi) at time i − 1 depends on the last d
observed return vectors as well as on the last chosen control bi−1 . Therefore,
we have no other choice than to describe the state of the system by the joint
vector (bi−1 , X̄i ). Then the transition dynamics under control bi is given by
(bi−1 , X̄i ) 7−→ (bi , X̄i+1 ). We thus end up with a transition probability distribu-
tion Q that clearly does not satisfy the mentioned recurrence and irreducibility
conditions. This drawback was also noted by Bielecki, Hernández and Pliska
(2000). They observe that the typical conditions imposed to ensure the existence of a solution of the Bellman equation are too rigid to be applicable in transaction cost problems. On the other hand, they found it still possible to char-
acterize optimal strategies in terms of optimality equations which correspond
to the classical Bellman equation. This is supported by the earlier findings of
Stettner (1999) and our results.

4.2.2 Proof of Theorem 4.2.1


In this section we prove the main result, Theorem 4.2.1. The proof requires
several steps. We first have to show the existence of a solution of the Bellman
equation and investigate certain properties of the solution. We then have to
show that the strategy b∗i calculated by (4.2.3) consists of portfolio choices that
are admissible. In a third step we will derive a technical tool to approximate
admissible strategies based on a portfolio selection function by simpler periodic

strategies (Lemma 4.2.6). It is only in the fourth step that, using the technical
tools derived before, we will completely prove Theorem 4.2.1.
1st step in the proof of Theorem 4.2.1: Solving the Bellman equation.
In dynamic programming, the Bellman equation usually takes the form

    λ + h(s, x̄) = max_{b ∈ S(s, x̄)} { m(b, x̄) + (1 − δ) E[h(b, X̄_{d+1}) | X̄_d = x̄] },    (4.2.5)

which is to be solved for h ∈ C(S × [A, B]dm ) and λ ∈ IR (see e.g. Bertsekas,
1976, Sec. 8.1 or Hernández-Lerma and Lasserre, 1996, Sec. 5.2). We first de-
rive some basic facts about the existence of solutions and their properties. The
following proposition is well known from the theory of dynamic programming; we nonetheless outline a proof.

Proposition 4.2.2. For all δ ∈ (0, 1) there exists a solution (h, λ) ∈ C(S ×
[A, B]dm ) × IR and a solution (h, 0) ∈ C(S × [A, B]dm ) × {0} of the Bellman
equation (4.2.5).

Proof. In the following let g ∈ C(S × [A, B]^{dm}), λ ∈ IR. The seminorm

    ‖g‖ := max_{(s,x̄) ∈ S×[A,B]^{dm}} g(s, x̄) − min_{(s,x̄) ∈ S×[A,B]^{dm}} g(s, x̄)

on C(S × [A, B]^{dm}) makes the factor space

    C∗ := C(S × [A, B]^{dm}) / {constant functions}

a Banach space with norm ‖[g]‖ := ‖g‖, where [g] := {g(·, ·) + r | r ∈ IR} denotes the equivalence class of g ∈ C(S × [A, B]^{dm}).
Indeed, K := {constant functions} is a closed subspace of the space (C(S × [A, B]^{dm}), ‖ · ‖_∞). Then (C(S × [A, B]^{dm}) / K, ‖ · ‖∗) with

    ‖ · ‖∗ : C(S × [A, B]^{dm}) / K → IR_0^+ : [f] ↦ ‖[f]‖∗ := inf_{k∈K} ‖f + k‖_∞ = inf_{c∈IR} ‖f + c‖_∞

is a Banach space (Hirzebruch and Scharlau, 1996, Lemma 5.10). The norms ‖ · ‖∗ and ‖ · ‖ are equivalent on C(S × [A, B]^{dm}) / K because of inf_{c∈IR} ‖f + c‖_∞ = (1/2)‖f‖.

Note that V3 implies that for any g ∈ C(S × [A, B]^{dm}), the conditional expectation E[g(b, X̄_{d+1}) | X̄_d = x̄] is continuous in (b, x̄) ∈ S × [A, B]^{dm}, and S(·, ·) is continuous in the sense of Aliprantis and Border (1999, Definition 16.2 and Theorems 16.20, 16.21). Now, by Berge’s Maximum Theorem (Aliprantis and Border, 1999, Theorem 16.31), max_{b∈S(s,x̄)} {m(b, x̄) + (1 − δ) E[g(b, X̄_{d+1}) | X̄_d = x̄]} is continuous on S × [A, B]^{dm}. Hence we can define the operator

    M : C(S × [A, B]^{dm}) → C(S × [A, B]^{dm}) :
        g(s, x̄) ↦ max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[g(b, X̄_{d+1}) | X̄_d = x̄] }.

On C∗, M operates according to M[g] := [M g]; observe that the right hand side is independent of the chosen representative of [g]. Solving (4.2.5) thus becomes equivalent to solving the functional equation

    M[h] = [h]    (4.2.6)

in C∗. This can be accomplished by an application of Banach’s Fixed Point Theorem (Aliprantis and Border, 1999, Theorem 3.36): (4.2.6) can be solved using the iteration [h]_{n+1} := M[h]_n (value iteration), provided M is a contraction mapping. This will be shown in the following, using standard techniques.
For functions g, h ∈ C(S × [A, B]^{dm}) we have

    (M h)(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[h(b, X̄_{d+1}) | X̄_d = x̄] }
                = m(b∗, x̄) + (1 − δ) E[h(b∗, X̄_{d+1}) | X̄_d = x̄],

    (M g)(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[g(b, X̄_{d+1}) | X̄_d = x̄] }
                ≥ m(b∗, x̄) + (1 − δ) E[g(b∗, X̄_{d+1}) | X̄_d = x̄]

for some b∗ ∈ S(s, x̄). Hence, writing max_{(s,x̄)} instead of max_{(s,x̄)∈S×[A,B]^{dm}},

    (M h)(s, x̄) − (M g)(s, x̄) ≤ (1 − δ) E[h(b∗, X̄_{d+1}) − g(b∗, X̄_{d+1}) | X̄_d = x̄]
                              ≤ (1 − δ) max_{(s,x̄)} (h(s, x̄) − g(s, x̄))

for all (s, x̄) ∈ S × [A, B]^{dm}, which yields

    max_{(s,x̄)} ((M h)(s, x̄) − (M g)(s, x̄)) ≤ (1 − δ) max_{(s,x̄)} (h(s, x̄) − g(s, x̄)).

From this we find that

    ‖M h − M g‖ = max_{(s,x̄)} ((M h)(s, x̄) − (M g)(s, x̄)) − min_{(s,x̄)} ((M h)(s, x̄) − (M g)(s, x̄))
                = max_{(s,x̄)} ((M h)(s, x̄) − (M g)(s, x̄)) + max_{(s,x̄)} ((M g)(s, x̄) − (M h)(s, x̄))
                ≤ (1 − δ) max_{(s,x̄)} (h(s, x̄) − g(s, x̄)) + (1 − δ) max_{(s,x̄)} (g(s, x̄) − h(s, x̄))
                = (1 − δ) ‖g − h‖

for 0 < 1 − δ < 1, a contraction property that implies the existence of a solution of (4.2.5).
Finally, let (h, λ) ∈ C(S × [A, B]^{dm}) × IR be an arbitrary solution of (4.2.5),

    λ + h(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[h(b, X̄_{d+1}) | X̄_d = x̄] }.

This is equivalent to (c ∈ IR arbitrary)

    (λ − δc) + (h + c)(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[(h + c)(b, X̄_{d+1}) | X̄_d = x̄] }.

In particular, choosing c := λ/δ, we obtain for h̃ := h + c the relation

    h̃(s, x̄) = (λ − δc) + h̃(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[h̃(b, X̄_{d+1}) | X̄_d = x̄] },

and (h̃, 0) also solves the Bellman equation. □
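The span-seminorm contraction just established can be observed numerically. The following toy finite-state analogue of the operator M iterates the value iteration [h]_{n+1} := M[h]_n; the states, actions, rewards and transition probabilities are invented for illustration and do not model the market of this chapter:

```python
def span(v):
    """The seminorm ||v|| = max(v) - min(v) used in the proof."""
    return max(v) - min(v)

def bellman_op(h, m, P, adm, delta):
    """Toy analogue of M:
    (M h)(s) = max_{b in adm[s]} ( m[b][s] + (1 - delta) * sum_t P[b][s][t] h[t] )."""
    n = len(h)
    return [max(m[b][s] + (1 - delta) * sum(P[b][s][t] * h[t] for t in range(n))
                for b in adm[s])
            for s in range(n)]

# Two states, two actions; all numbers are illustrative.
m_ = [[1.0, 0.2], [0.4, 0.9]]                       # m[action][state]
P = [[[0.7, 0.3], [0.5, 0.5]], [[0.2, 0.8], [0.6, 0.4]]]
adm = [[0, 1], [0, 1]]                              # both actions admissible
delta = 0.1

h = [0.0, 0.0]
spans = []
for _ in range(30):
    h_new = bellman_op(h, m_, P, adm, delta)
    spans.append(span([a - b for a, b in zip(h_new, h)]))
    h = h_new
# Successive differences contract in the span seminorm at rate <= 1 - delta.
```

As the proof predicts, span(h_{n+2} − h_{n+1}) ≤ (1 − δ) span(h_{n+1} − h_n) in every iteration, so the equivalence classes [h]_n converge geometrically.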


The next lemma gives some technical properties of arbitrary solutions (h, 0) of
the Bellman equation.

Lemma 4.2.3. Let {δ_i}_i be a monotonically decreasing sequence with δ_i ∈ (0, 1) and (δ_i − δ_{i+1})/δ_{i+1}² ≤ 1. If (h_i, 0) is a solution of the Bellman equation (4.2.5, δ = δ_i), then

1. δ_i ‖h_i‖_∞ ≤ ‖f‖_∞.

2. ‖h_{i+1} − h_i‖_∞ ≤ ‖f‖_∞.

3. Along any admissible portfolio sequence {b_i}_i we have

    E h_i(b_{i−1}, X̄_i) − E h_i(b_i, X̄_{i+1}) ≥ −2‖f‖_∞.    (4.2.7)

In particular, for all j,

    E h_i(b_{i−1}, X̄_i) − E h_i(a∗, X̄_j) = E h_i(b_{i−1}, X̄_i) − E h_i(a∗, X̄_{i+1}) ≥ −2‖f‖_∞.    (4.2.8)

Proof. 1. The Bellman equation (4.2.5) with λ = 0 implies that ‖f‖_∞ + (1 − δ_i)‖h_i‖_∞ ≥ ‖h_i‖_∞, which can be rewritten as δ_i ‖h_i‖_∞ ≤ ‖f‖_∞.
2. Similarly as in the proof of Proposition 4.2.2,

    h_i(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ_i) E[h_i(b, X̄_{d+1}) | X̄_d = x̄] }
              = m(b∗, x̄) + (1 − δ_i) E[h_i(b∗, X̄_{d+1}) | X̄_d = x̄],

    h_j(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ_j) E[h_j(b, X̄_{d+1}) | X̄_d = x̄] }
              ≥ m(b∗, x̄) + (1 − δ_j) E[h_j(b∗, X̄_{d+1}) | X̄_d = x̄]

for some b∗ ∈ S(s, x̄). Taking differences yields

    h_i(s, x̄) − h_j(s, x̄)
        ≤ (1 − δ_i) E[h_i(b∗, X̄_{d+1}) | X̄_d = x̄] − (1 − δ_j) E[h_j(b∗, X̄_{d+1}) | X̄_d = x̄]
        ≤ max_{(s,x̄)∈S×[A,B]^{dm}} {(1 − δ_i) h_i(s, x̄) − (1 − δ_j) h_j(s, x̄)}
        ≤ (1 − δ_i) ‖h_i − h_j‖_∞ + |δ_j − δ_i| ‖h_j‖_∞
        ≤ max{1 − δ_i, 1 − δ_j} ‖h_i − h_j‖_∞ + |δ_j − δ_i| max{‖h_i‖_∞, ‖h_j‖_∞}.

The right hand side of this chain of inequalities remains the same when swapping i and j, therefore

    ‖h_i − h_j‖_∞ ≤ max{1 − δ_i, 1 − δ_j} ‖h_i − h_j‖_∞ + |δ_j − δ_i| max{‖h_i‖_∞, ‖h_j‖_∞}.

Using part 1 of the lemma, we conclude that

    ‖h_i − h_j‖_∞ ≤ |δ_j − δ_i| · max{‖h_i‖_∞, ‖h_j‖_∞} / (1 − max{1 − δ_i, 1 − δ_j}) ≤ (|δ_j − δ_i| / min{δ_i, δ_j}²) ‖f‖_∞.

Monotonicity of δ_i and the assumption (δ_i − δ_{i+1})/δ_{i+1}² ≤ 1 allow us to infer

    ‖h_i − h_{i+1}‖_∞ ≤ ((δ_i − δ_{i+1}) / δ_{i+1}²) ‖f‖_∞ ≤ ‖f‖_∞.

3. The relation (4.2.8) is a direct consequence of (4.2.7) because of the stationarity of {X_i}_i and a∗ ∈ S(b_{i−1}, X̄_i) being deterministic. To prove (4.2.7) observe that the Bellman equation implies

    h_i(b_{i−1}, X̄_i) ≥ m(b_i, X̄_i) + (1 − δ_i) E[h_i(b_i, X̄_{i+1}) | X̄_i],

and

    E h_i(b_{i−1}, X̄_i) − E h_i(b_i, X̄_{i+1}) ≥ E m(b_i, X̄_i) − δ_i E h_i(b_i, X̄_{i+1}) ≥ −‖f‖_∞ − δ_i ‖h_i‖_∞.

Plugging in the result from part 1 of the lemma yields

    E h_i(b_{i−1}, X̄_i) − E h_i(b_i, X̄_{i+1}) ≥ −2‖f‖_∞,

and the proof is finished. □

2nd step in the proof of Theorem 4.2.1: Admissibility of {b∗_i}_i.
We will show that the maximization problem (4.2.3) is solved by a measurable solution procedure b∗_i = φ_i(b∗_{i−1}, X̄_i) with suitable portfolio selection functions φ_i. Thus {b∗_i}_i becomes an admissible strategy. The argument involves some notions from set-valued analysis.
The admissible set S(s, x̄) for (s, x̄) ∈ S × [A, B]^{dm} is a member of the family C = C(S) of closed subsets of S. C is equipped with the σ-algebra generated by families of the form F_K := {A ∈ C(S) | A ∩ K ≠ ∅} (K ranging over all compact K ⊆ S) (Matheron, 1975, p. 27, or Molchanov, 1993, Chapter 1). At time i − 1, an admissible strategy {b_i}_i picks some element of the random set S_i := S(b_{i−1}, X̄_i) = {b ∈ S | g_c(b_{i−1}, X̄_i, b) = 0} ∈ C. S_i is a so-called random closed set (RACS), a measurable mapping C : Ω → C. b_i itself is a selector, a random variable Ω → S such that b_i ∈ S_i with probability 1. A short introduction to RACS and selectors can be found in Hernández-Lerma and Lasserre (1996, Appendix D) and Bertsekas and Shreve (1978, Sec. 7.5).
S_i is compact. Thus the solution of the maximization problem (4.2.3) is the random non-empty set

    arg max_{b ∈ S(b_{i−1}, X̄_i)} V_i(b, X̄_i) := { b ∈ S(b_{i−1}, X̄_i) | V_i(b, X̄_i) = sup_{c ∈ S(b_{i−1}, X̄_i)} V_i(c, X̄_i) }.

We need only show that this is a RACS for which a suitable selector exists:

Lemma 4.2.4. The mapping

    Ω → C : ω ↦ arg max_{b ∈ S(b_{i−1}(ω), X̄_i(ω))} V_i(b, X̄_i(ω))

is a RACS for which a selector of the form φ_i(b_{i−1}, X̄_i) exists, with a measurable function

    φ_i : S × [A, B]^{dm} → S.

From Lemma 4.2.4 it follows that the recurrence relation (4.2.3) is solved by

    b∗_0 := a∗,   b∗_i := φ_i(b∗_{i−1}, X̄_i).

In particular, b∗_i is a selector for arg max_{b ∈ S(b_{i−1}, X̄_i)} V_i(b, X̄_i), hence F_i-measurable, so that {b∗_i}_i constitutes an admissible strategy.
Proof of Lemma 4.2.4. The proof will be given in three steps. Fix some
i ∈ IN.
First note that S(·, ·) : S × [A, B]^{dm} → C : (s, x̄) ↦ S(s, x̄) is a measurable mapping, so that S(b_i, X̄_{i+1}) is a RACS: the continuity of g_c implies that {b ∈ S | g_c(s, x̄, b) = 0} is a closed subset of S. For the measurability of S(·, ·) we need only show that for any compact K ⊆ S

    S^{−1}(F_K) ∈ B(S × [A, B]^{dm}),

with the σ-algebra B(S × [A, B]^{dm}) of Borel sets on S × [A, B]^{dm}. To this end, choose a countable dense subset K′ of K. Using the continuity of g_c again, it is easily verified that

    S^{−1}(F_K) = ∩_{n∈IN} ∪_{k∈K′} { (s, x̄) | |g_c(s, x̄, k)| < 1/n },

which implies the measurability of S^{−1}(F_K).


Secondly, we will see that α : C × [A, B]dm → C : (C, x̄) 7→ arg maxa∈C Vi(a, x̄)
is measurable. This combined with S(bi , X̄i+1 ) being a RACS yields that

arg max Vi (b, X̄i)


b∈S(bi−1 ,X̄i )
118 Chapter 4. A Markov model with transaction costs: probabilistic view

is itself a RACS. As to the measurability of α we consider C ∈ C and a compact


subset K of S. With

\ [  1

−1

α (FK ) = (C, x̄) sup Vi (y, x̄) ≤ Vi (k, x̄) +
,
0 y∈C n
n∈IN k∈K

it suffices to verify that each of the sets

    { (C, x̄) | sup_{y∈C} V_i(y, x̄) ≤ c }   (c ∈ IR)

is measurable. Indeed, if S′ is a countable dense subset of S, then

    { (C, x̄) | sup_{y∈C} V_i(y, x̄) > c } = { (C, x̄) | ∃ y ∈ C ∩ S′ : V_i(y, x̄) > c }
        = ∪_{y∈S′} { (C, x̄) | y ∈ C, V_i(y, x̄) > c } = ∪_{y∈S′} ( {C | y ∈ C} × {x̄ | V_i(y, x̄) > c} )
        = ∪_{y∈S′} ( F_{{y}} × V_i(y, ·)^{−1}((c, ∞)) ),

and the desired measurability follows.


Thirdly, we apply Theorem 7.33 in Bertsekas and Shreve (1978) (or Alipran-
tis and Border, 1999, Theorem 17.18). From there, for the closed set D :=
{(s, x̄, b)|b ∈ S(s, x̄)}, we can find a measurable function φi : {(s, x̄)|∃b :
(s, x̄, b) ∈ D} → S with

Vi(φi (s, x̄), x̄) = max Vi(b, x̄).


b∈{b|(s,x̄,b)∈D}

Now, the proof is finished observing that {b|(s, x̄, b) ∈ D} = S(s, x̄), {(s, x̄) | ∃b :
(s, x̄, b) ∈ D} = S × [A, B]dm (because of a∗ ∈ S(s, x̄)). Thus

φi (bi−1 , X̄i ) ∈ arg max Vi (b, X̄i)


b∈S(bi−1 ,X̄i )

with probability 1. 2

3rd step in the proof of Theorem 4.2.1: Approximation of strategies that are based on portfolio selection functions.
For the analysis of the strategy b∗_i it will be convenient to approximate strategies based on portfolio selection functions by members of a smaller class of strategies which we call “periodic” strategies:

Definition 4.2.5. An admissible strategy {b_i}_i is called N-periodic (N ∈ IN) if for all k ∈ IN_0

    b_{kN} = a∗   P-a.s.

For any admissible strategy {b_i}_i and any ε > 0 we define

    N({b_i}_i, ε) := min{ n : | (1/n) Σ_{i=1}^n E f(b_i, X̄_{i+1}) − lim sup_{n→∞} (1/n) Σ_{i=1}^n E f(b_i, X̄_{i+1}) | ≤ ε }.

N({b_i}_i, ε) measures how long it takes the strategy {b_i}_i to approach the long-term optimum for the first time up to an error of ε. Note that there is some N ∈ IN such that

    | (1/N) Σ_{i=1}^N E f(b_i, X̄_{i+1}) − lim sup_{n→∞} (1/n) Σ_{i=1}^n E f(b_i, X̄_{i+1}) | ≤ ε,

so that N({b_i}_i, ε) < ∞. Using N({b_i}_i, ε) one can approximate any strategy based on a portfolio selection function arbitrarily closely by a periodic strategy in the following sense:

Lemma 4.2.6. Let {b_i}_i be an admissible strategy based on a portfolio selection function c. Then for any ε > 0 there exists an admissible N-periodic strategy {b̃_i}_i (N ∈ IN) with

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} E f(b̃_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1}) ) ≥ −ε.

Proof. Let N := N({b_i}_i, ε), μ_n := (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1}) and s := lim sup_{n→∞} μ_n, i.e.

    |μ_N − s| ≤ ε.    (4.2.9)
Considering the fact that, at any stage, the investor may choose the portfolio a∗, we define an N-periodic admissible strategy b̃_i by

    b̃_0 := a∗,  b̃_1 := c(a∗, X̄_1),  ...,  b̃_{N−1} := c(b̃_{N−2}, X̄_{N−1}),
    b̃_N := a∗,  b̃_{N+1} := c(a∗, X̄_{N+1}),  ...,  b̃_{2N−1} := c(b̃_{2N−2}, X̄_{2N−1}),
    b̃_{2N} := a∗,  etc.

In particular, b̃_0 = b_0, ..., b̃_{N−1} = b_{N−1}. Hence, for all k ∈ IN_0 (with the convention (1/0) Σ_{i=0}^{−1} ... = 0):

    μ̃_{kN} := (1/(kN)) Σ_{i=0}^{kN−1} E f(b̃_i, X̄_{i+1}) = (1/k) Σ_{j=0}^{k−1} (1/N) Σ_{i=jN}^{(j+1)N−1} E f(b̃_i, X̄_{i+1}) = (1/k) Σ_{j=0}^{k−1} μ_N = μ_N.    (4.2.10)

This follows from the construction of b̃_i with the selection function c. Indeed, the matrix (b̃_{jN}, ..., b̃_{(j+1)N−1}, X̄_{jN+1}, ..., X̄_{(j+1)N}) is a function of (X̄_{jN+1}, ..., X̄_{(j+1)N}) and as such is distributed as (b_0, ..., b_{N−1}, X̄_1, ..., X̄_N), which is used to calculate μ_N (by stationarity).
(4.2.9) and (4.2.10) imply that for all k ∈ IN0: |µ̃kN − s| ≤ ε. Now, let ⌊n⌋N be the largest multiple kN of N (k ∈ IN) with kN ≤ n. Then

    |µ̃n − s| = | (⌊n⌋N/n)(µ̃⌊n⌋N − s) + (1/n) Σ_{i=⌊n⌋N}^{n−1} Ef(b̃i, X̄i+1) + (⌊n⌋N/n − 1)s |
             ≤ (⌊n⌋N/n)|µ̃⌊n⌋N − s| + ((N−1)/n)‖f‖∞ + ((n − ⌊n⌋N)/n)|s|
             ≤ 1·ε + ((N−1)/n)‖f‖∞ + ((N−1)/n)|s|,

where we used n − ⌊n⌋N ≤ N − 1. It follows that lim sup_{n→∞} |µ̃n − s| ≤ ε and finally that

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) )
    ≥ lim inf_{n→∞} µ̃n − lim sup_{n→∞} µn = lim inf_{n→∞} µ̃n − s
    ≥ lim inf_{n→∞} ( −|µ̃n − s| ) = −lim sup_{n→∞} |µ̃n − s| ≥ −ε,

proving Lemma 4.2.6. □


In the proof of Theorem 4.2.1, Lemma 4.2.6 will enable us to restrict ourselves to the class of periodic strategies competing with {b∗i}i; these will turn out to be much more tractable. We are now in a position to turn to the
4th step in the proof of Theorem 4.2.1: Finishing the proof of Theo-
rem 4.2.1.
Consider a given admissible strategy {bi}i based on a portfolio selection function. Let ε > 0 be arbitrary but fixed. According to Lemma 4.2.6 there exists an N := N({bi}i, ε)-periodic admissible strategy {b̃i}i with

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) ) ≥ −ε.

We will show that

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b∗i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) ) ≥ 0.        (4.2.11)

This yields

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b∗i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) )
    ≥ lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b∗i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) )
      + lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) )
    ≥ −ε,

and the assertion follows from ε being arbitrary.


It remains to show (4.2.11). To this end, observe that by V2

    Ef(bi, X̄i+1) = E( E[f(bi, X̄i+1)|Fi] ) = E( E[f(b, X̄i+1)|Fi] |b=bi )
                 = E( E[f(b, X̄i+1)|X̄i] |b=bi ) = Em(bi, X̄i).

Replacing f by m does not alter the value of the target function E( (1/n) Σ_{i=0}^{n−1} f(bi, X̄i+1) ), hence we may prove (4.2.11) in the form

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Em(b∗i, X̄i) − (1/n) Σ_{i=0}^{n−1} Em(b̃i, X̄i) ) ≥ 0.

First note that the Bellman equation implies that for all admissible strategies {bi}i

    (1 − δi+1) E[hi+1(bi+1, X̄i+2)|Fi+1]
    = m(bi+1, X̄i+1) + (1 − δi+1) E[hi+1(bi+1, X̄i+2)|Fi+1] − m(bi+1, X̄i+1)
    ≤ max_{b ∈ S(bi, X̄i+1)} { m(b, X̄i+1) + (1 − δi+1) E[hi+1(b, X̄i+2)|Fi+1] } − m(bi+1, X̄i+1)
    = hi+1(bi, X̄i+1) − m(bi+1, X̄i+1).

Taking expectations,

    (1 − δi+1) Ehi+1(bi+1, X̄i+2) = (1 − δi+1) E( E[hi+1(bi+1, X̄i+2)|Fi+1] )
                                  ≤ Ehi+1(bi, X̄i+1) − Em(bi+1, X̄i+1),

and summing up we are left with

    E( (1/n) Σ_{i=−1}^{n−2} m(bi+1, X̄i+1) )
    ≤ −E( (1/n) Σ_{i=−1}^{n−2} [ (1 − δi+1) hi+1(bi+1, X̄i+2) − hi+1(bi, X̄i+1) ] ).

Equality holds for the strategy {b∗i}i, i.e.


    E( (1/n) Σ_{i=0}^{n−1} m(b∗i, X̄i) ) − E( (1/n) Σ_{i=0}^{n−1} m(b̃i, X̄i) )
    ≥ E( (1/n) Σ_{i=−1}^{n−2} [ hi+1(b̃i+1, X̄i+2) − hi+1(b̃i, X̄i+1) ] )
      − E( (1/n) Σ_{i=−1}^{n−2} [ hi+1(b∗i+1, X̄i+2) − hi+1(b∗i, X̄i+1) ] )
      + E( (1/n) Σ_{i=0}^{n−1} δi [ hi(b∗i, X̄i+1) − hi(b̃i, X̄i+1) ] )
    =: A − B + C.        (4.2.12)

We now investigate the asymptotic behaviour of the terms on the right hand side.
The first and the second term A and B of (4.2.12) are of the same form and tend to 0 as n → ∞ since

    lim_{n→∞} E( (1/n) Σ_{i=−1}^{n−2} [ hi+1(bi+1, X̄i+2) − hi+1(bi, X̄i+1) ] ) = 0

for any admissible strategy {bi}i. To prove this, we set h−1 := 0 and consider the decomposition

    (1/n) Σ_{i=−1}^{n−2} [ hi+1(bi+1, X̄i+2) − hi+1(bi, X̄i+1) ] = D + E
into

    D := (1/n) Σ_{i=−1}^{n−2} [ hi+1(bi+1, X̄i+2) − hi(bi, X̄i+1) ]

and

    E := (1/n) Σ_{i=−1}^{n−2} [ hi(bi, X̄i+1) − hi+1(bi, X̄i+1) ].

D is a telescopic sum, and using part 1 of Lemma 4.2.3 and the assumptions about {δi}i, we find that

    |D| = |hn−1(bn−1, X̄n) − h−1(b−1, X̄0)| / n ≤ ‖hn−1‖∞ / n ≤ ‖f‖∞ / (δn−1 n) → 0.

As to E we note that δ1 = δ2, δ3 = δ4 = δ5, δ6 = ... = δ9, etc. implies that h1 = h2, h3 = h4 = h5, h6 = ... = h9, ... . Since the blocks on which the δi are constant have increasing lengths 2, 3, 4, ..., there are at most 2√n + 1 non-zero differences in E. By virtue of Lemma 4.2.3 these are bounded in absolute value by max_i ‖hi+1 − hi‖∞ ≤ ‖f‖∞, which yields

    |E| ≤ ((2√n + 1)/n) ‖f‖∞ −→ 0.

The third term C of (4.2.12) is decomposed into

    E( (1/n) Σ_{i=0}^{n−1} δi [ hi(b∗i, X̄i+1) − hi(b̃i, X̄i+1) ] )
    = E( (1/n) Σ_{i=0}^{n−1} δi [ hi(b∗i, X̄i+1) − hi+1(b∗i, X̄i+1) ] )
      + E( (1/n) Σ_{i=0}^{n−1} δi [ hi+1(b∗i, X̄i+1) − hi+1(a∗, X̄⌊i⌋N+1) ] )
      + E( (1/n) Σ_{i=0}^{n−1} δi [ hi+1(a∗, X̄⌊i⌋N+1) − h⌊i⌋N+1(a∗, X̄⌊i⌋N+1) ] )
      + E( (1/n) Σ_{i=0}^{n−1} δi [ h⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − hi(b̃i, X̄i+1) ] ).

The absolute value of the first expectation in the decomposition is bounded from above by

    (1/n) Σ_{i=0}^{n−1} δi ‖f‖∞ −→ 0    (n → ∞)

(Lemma 4.2.3, part 2). Using Lemma 4.2.3 and (4.2.8), we can bound the second expectation from below by

    −2‖f‖∞ (1/n) Σ_{i=0}^{n−1} δi −→ 0    (n → ∞).

The third expectation has the lower bound

    −‖f‖∞ (1/n) Σ_{i=0}^{n−1} δi (i − ⌊i⌋N) ≥ −‖f‖∞ (N − 1) (1/n) Σ_{i=0}^{n−1} δi −→ 0    (n → ∞)

(Lemma 4.2.3, part 2). Therefore, we need only show that the fourth expectation satisfies

    lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) ) ≥ 0.        (4.2.13)

This will be done by exploiting the periodicity of {b̃i}i. In order to prove (4.2.13) we first assume that n = kN with k ∈ IN0:

    (1/(kN)) Σ_{i=0}^{kN−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) )
    = (1/k) Σ_{j=0}^{k−1} (1/N) Σ_{i=jN}^{(j+1)N−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) ).

Here, for i ∈ {jN, ..., (j+1)N − 1},

    Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1)
    = EhjN+1(b̃jN, X̄jN+1) − EhjN+1(b̃jN+1, X̄jN+2)
      + EhjN+1(b̃jN+1, X̄jN+2) − EhjN+2(b̃jN+1, X̄jN+2)
      + EhjN+2(b̃jN+1, X̄jN+2) − EhjN+2(b̃jN+2, X̄jN+3)
      + EhjN+2(b̃jN+2, X̄jN+3) − EhjN+3(b̃jN+2, X̄jN+3)
      + EhjN+3(b̃jN+2, X̄jN+3) − EhjN+3(b̃jN+3, X̄jN+4)
      + ... − ...
      + Ehi−1(b̃i−1, X̄i) − Ehi(b̃i−1, X̄i)
      + Ehi(b̃i−1, X̄i) − Ehi(b̃i, X̄i+1)
    ≥ −2‖f‖∞ (i − jN) − ‖f‖∞ (i − jN − 1),        (4.2.14)
where the 1st, 3rd, etc. differences after the equality (the non-indented pairs) can be bounded by (4.2.7), and the 2nd, 4th, etc. differences (the indented pairs) by Lemma 4.2.3, part 2. Consequently,

    Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) ≥ −3‖f‖∞ (i − jN)

and

    (1/(kN)) Σ_{i=0}^{kN−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) )
    ≥ −(3‖f‖∞/k) Σ_{j=0}^{k−1} (1/N) Σ_{i=jN}^{(j+1)N−1} δi (i − jN)
    ≥ −(3‖f‖∞/k) Σ_{j=0}^{k−1} (δjN/N) Σ_{i=jN}^{(j+1)N−1} (i − jN)
    = −3‖f‖∞ ( (1/k) Σ_{j=0}^{k−1} δjN ) ( (1/N) Σ_{i=0}^{N−1} i )
    = −3‖f‖∞ ( (1/k) Σ_{j=0}^{k−1} δjN ) (N − 1)/2.        (4.2.15)

For arbitrary n ∈ IN (not necessarily n = kN), we find that

    (1/n) Σ_{i=0}^{n−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) )
    = (⌊n⌋N/n) · { (1/⌊n⌋N) Σ_{i=0}^{⌊n⌋N−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) ) }
      + { (1/n) Σ_{i=⌊n⌋N}^{n−1} δi ( Eh⌊i⌋N+1(a∗, X̄⌊i⌋N+1) − Ehi(b̃i, X̄i+1) ) }.        (4.2.16)

Set k := ⌊n/N⌋ and bound the first bracket in (4.2.16) from below by

    −(⌊n⌋N/n) · (3‖f‖∞ (N − 1)/2) · ( (1/k) Σ_{j=0}^{k−1} δjN ) −→ 0    (n → ∞)

(use (4.2.15) and note that n → ∞ implies k → ∞). The absolute value of the second bracket is bounded from above by

    (1/n) Σ_{i=⌊n⌋N}^{n−1} δi ( ‖h⌊i⌋N+1‖∞ + ‖hi‖∞ )
    ≤ (1/n) Σ_{i=⌊n⌋N}^{n−1} ( δ⌊i⌋N ‖h⌊i⌋N‖∞ + δ⌊i⌋N ‖h⌊i⌋N+1 − h⌊i⌋N‖∞ + δi ‖hi‖∞ )
    ≤ (1/n) Σ_{i=⌊n⌋N}^{n−1} 3‖f‖∞ = 3‖f‖∞ (n − ⌊n⌋N)/n ≤ 3‖f‖∞ (N − 1)/n −→ 0    (n → ∞),

which concludes the proof. □

4.3 Further properties of the value function


As a preparation for the next chapter, where we will be dealing with the case
when the distribution of the return process is unknown, we prove the following
result concerning the Lipschitz continuity of the value function hi :

Proposition 4.3.1. Let (hi, 0) be a solution of the Bellman equation (4.2.5) for δ = δi (i = 1, 2, ...). Then for a sufficiently small commission factor c ≥ 0 there exists a constant K > 0 such that

    |δi · hi(b1, x̄) − δi · hi(b2, x̄)| ≤ K · ‖b1 − b2‖∞

for all b1, b2 ∈ S, all x̄ ∈ [A, B]dm and all i ∈ IN.

Proof. The argument requires some less known notions from analysis, especially Clarke's generalized derivative for Lipschitz continuous functions (Clarke, 1981). Let W ⊆ IR^{d′} and Z ⊆ IR^{d″} be Banach spaces (whose supremum norms are both denoted by ‖·‖∞). Given a Lipschitz continuous mapping Φ : W × Z → IR, Clarke's generalized derivative is defined as the convex hull of a limit set,

    ∂wΦ(w, z) := conv{ lim_{i→∞} ∇wΦ(wi, z) : wi → w },

where only those sequences wi are considered for which all gradients ∇wΦ(wi, z) and the limit lim_{i→∞} ∇wΦ(wi, z) exist (note that the gradients exist almost everywhere due to Rademacher's Theorem). H(M1, M2) denotes the Hausdorff distance between two subsets M1 and M2 of IR^{d′}, defined by

    H(M1, M2) := max{ sup_{w2∈M2} ρ(w2, M1), sup_{w1∈M1} ρ(w1, M2) }

with ρ(w2, M1) := inf_{w1∈M1} ‖w1 − w2‖∞.
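For finite point sets the Hausdorff distance can be computed directly from this definition; a small sketch, with finite sets standing in for the subsets M1, M2:

```python
def rho(w, M):
    """rho(w, M) = inf over m in M of ||w - m||_inf, M a finite point set."""
    return min(max(abs(wi - mi) for wi, mi in zip(w, m)) for m in M)

def hausdorff(M1, M2):
    """Hausdorff distance H(M1, M2) under the sup-norm, following the
    definition above, for finite point sets."""
    return max(max(rho(w2, M1) for w2 in M2),
               max(rho(w1, M2) for w1 in M1))

M1 = [(0.0, 0.0), (1.0, 0.0)]
M2 = [(0.0, 0.5)]
assert hausdorff(M1, M2) == hausdorff(M2, M1) == 1.0  # (1,0) is sup-distance 1 from M2
```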



We will also need the following proposition by Ledyaev (1984, Theorem 1) concerning Lipschitz continuity of implicitly defined set-valued functions:

Proposition 4.3.2. (Ledyaev, 1984, Theorem 1) Let W ⊆ IR^{d′} and Z ⊆ IR^{d″} be Banach spaces and Φ : W × Z → IR a mapping such that:

1. For all z ∈ Z the function Φ(·, z) is Lipschitz continuous.

2. There exists a constant L > 0 with

       |Φ(w, z1) − Φ(w, z2)| ≤ L‖z1 − z2‖∞

   for all w ∈ W, z1, z2 ∈ Z.

3. There exists a constant ∆ such that for all (w, z) ∈ W × Z with Φ(w, z) > 0

       inf_{v∈∂wΦ(w,z)} ‖v‖∞ > ∆.

Then the set-valued mapping M(z) := {w ∈ W | Φ(w, z) ≤ 0} satisfies the Lipschitz property

    H(M(z1), M(z2)) ≤ (L/∆) ‖z1 − z2‖∞.

Finally, we need the modulus of continuity, defined for any continuous function g : S × [A, B]dm → IR as (x̄ ∈ [A, B]dm fixed)

    ω( g(·, x̄), ε ) := sup_{s,t∈S, ‖s−t‖∞≤ε} |g(s, x̄) − g(t, x̄)|.

Combining the Hausdorff distance and the modulus of continuity it is easily seen that

    | max_{b∈S(s,x̄)} g(b, x̄) − max_{b∈S(t,x̄)} g(b, x̄) | ≤ ω( g(·, x̄), H(S(s, x̄), S(t, x̄)) ).        (4.3.1)

Thus, having all the tools we need at hand, we can embark on the proof of Proposition 4.3.1.
Let b = (b−1, ..., bm) ∈ S, x̄ = (x1, ..., xm) ∈ [A, B]dm. For fixed x̄ we set Φ : S × S → IR : (b, s) ↦ Φ(b, s) := |gc(s, x̄, b)|; recall gc from (4.1.3). Clearly, Φ is Lipschitz continuous in the first argument. Moreover,

    |Φ(b, s) − Φ(b, t)| ≤ |gc(s, x̄, b) − gc(t, x̄, b)| ≤ c · const(r, B, m) · ‖s − t‖∞

(to see this, note that the self-financing condition forces b−1 ≤ c). Taking the gradient for gc (under Φ(b, s) > 0) yields

    ∂bΦ(b, s) ⊆ ( {1} × {0}^m × [0, c] ) · ( (1 + r)b0 + Σ_{k=1}^{m} xk bk ),

so that

    inf_{v∈∂bΦ(b,s)} ‖v‖∞ ≥ (1 + r)b0 + Σ_{k=1}^{m} xk bk ≥ min{1 + r, A}(1 − b−1) ≥ const(r, A)(1 − c).

As a consequence, the conditions of Proposition 4.3.2 are fulfilled. We can choose ∆ := const(r, A) · (1 − c) and L := c · const(r, B, m) to obtain

    H(S(s, x̄), S(t, x̄)) ≤ (L/∆) ‖s − t‖∞.        (4.3.2)

For c sufficiently small, L/∆ ≤ 1. Recall the function

    Vi(b, x̄) = m(b, x̄) + (1 − δi) E[hi(b, X̄d+1)|X̄d = x̄]

that defined hi in (4.2.2). The Bellman equation yields

    |hi(s, x̄) − hi(t, x̄)| = | max_{b∈S(s,x̄)} Vi(b, x̄) − max_{b∈S(t,x̄)} Vi(b, x̄) | ≤ ω( Vi(·, x̄), ‖s − t‖∞ )

by (4.3.1) and (4.3.2). Hence,

    ω( hi(·, x̄), ε ) ≤ ω( Vi(·, x̄), ε ) ≤ ω( f(·, x̄), ε ) + (1 − δi) ω( hi(·, x̄), ε ).

The Lipschitz property of f yields ω( f(·, x̄), ε ) ≤ const. · ε, and from the latter chain of inequalities we obtain

    ω( hi(·, x̄), ε ) ≤ (const./δi) · ε.

Thus δi · hi(·, x̄) is Lipschitz continuous with the same Lipschitz constant as f (independent of x̄), and the proof is finished. □
CHAPTER 5

A Markov model with transaction costs: statistical view
Chapter 4 remained entirely within the probabilistic framework, i.e. the point of view of an investor with full knowledge of the underlying return distribution. This, of course, is highly unrealistic. In practice, the investor's view
is one of a statistician rather than a probabilist. From observations of stock
returns, factors and side information, he assembles a market “picture”, an idea
of the stochastic laws of the market. He then decides on an investment strat-
egy. In the following it will be shown how he can balance the need to avoid
transaction costs and the need to boost his wealth in his investment decisions
– without knowing the underlying return distribution. The model is the same
as in Chapter 4.
In particular, Section 5.1 sets up an empirical counterpart of the Bellman equa-
tion which we used in Theorem 4.2.1 to construct an optimal strategy. Based
on this empirical version of the Bellman equation we construct a portfolio se-
lection algorithm in Section 5.1.1 (cf. (5.1.5)). This algorithm has virtually
the same optimality properties as the algorithm in Chapter 4 (Theorems 5.1.1
and 5.1.2). To verify this, we need results on uniformly consistent regression
estimation which will be given in Section 5.2, Theorem 5.2.1 and Corollary
5.2.2 being the central results featuring the speed of uniform convergence of
kernel regression estimates. Section 5.3 finally gives the proof of the optimality
properties of the algorithm.

5.1 The empirical Bellman equation


The whole procedure described in the last chapter was founded on the Bellman
equation. In particular, in Theorem 4.2.1, we used the Bellman equation

    λi + hi(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δi) E[hi(b, X̄d+1)|X̄d = x̄] },        (5.1.1)

λi ∈ IR, hi ∈ C(S × [A, B]dm), to construct an optimal investment strategy in a stationary d-stage Markov process {Xi}∞_{i=−d+1} of return vectors in a financial market. Here,

    m(b, x̄) := E[f(b, X̄i+1)|X̄i = x̄] = E[f(b, X̄d+1)|X̄d = x̄],

f being the (logarithmic) utility function and S(s, x̄) being the set of admissible
portfolio vectors when x̄ denotes the last d observed return vectors and s the
last chosen portfolio. The Bellman equation was solved by a value iteration
type of algorithm in Chapter 4, which crucially relies on the distribution of
the stationary process {X̄i }i. This distribution, in general, is unknown to the
investor. Nonetheless he may try to estimate a solution of the Bellman equation.
A natural way to obtain an estimated solution is to replace the conditional expectations E[hi(b, X̄d+1)|X̄d = x̄] and m(b, x̄) in (5.1.1) by the kernel estimates

    Σ_{j=−d+1}^{i−1} hi(b, X̄j+1) · K((x̄ − X̄j)/wi) / Σ_{k=−d+1}^{i−1} K((x̄ − X̄k)/wi)

and

    Σ_{j=−d+1}^{i−1} f(b, X̄j+1) · K((x̄ − X̄j)/wi) / Σ_{k=−d+1}^{i−1} K((x̄ − X̄k)/wi),

respectively. K is a bounded, Lipschitz continuous kernel function [A, B]dm → IR+0 with ∫_{IR^dm} K(x) dx = 1 and ∫_{IR^dm} K(x)‖x‖∞ dx < ∞. As we shall see, the bandwidths can be chosen as wi ∼ i^{−1/(dm+2)}. For simplicity, we use the shorthand notation

    Ki(X̄j, x̄) := K((x̄ − X̄j)/wi) / Σ_{k=−d+1}^{i−1} K((x̄ − X̄k)/wi)    with wi ∼ i^{−1/(dm+2)}.
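Such Nadaraya–Watson kernel weights are straightforward to compute. A minimal sketch, using a product of triangular kernels as one admissible choice of K (bounded, Lipschitz continuous, integral 1); the sample path is a purely illustrative stand-in:

```python
def kernel(u):
    """One admissible kernel on R^q: a product of triangular kernels
    max(1 - |t|, 0) -- bounded, Lipschitz continuous, integral 1."""
    out = 1.0
    for t in u:
        out *= max(1.0 - abs(t), 0.0)
    return out

def kernel_weights(history, x_bar, w):
    """Weights K_i(X_j, x_bar) = K((x_bar - X_j)/w) / sum_k K((x_bar - X_k)/w).
    `history` lists the observed vectors X_j; by construction the weights
    sum to 1 whenever some kernel value is positive."""
    vals = [kernel([(a - b) / w for a, b in zip(x_bar, xj)]) for xj in history]
    s = sum(vals)
    return [v / s for v in vals] if s > 0 else [1.0 / len(vals)] * len(vals)

dm = 2                                   # dimension of x_bar in this toy example
i = 100
w_i = i ** (-1.0 / (dm + 2))             # bandwidth w_i ~ i^(-1/(dm+2))
hist = [((0.1 * j) % 1.0, (0.07 * j) % 1.0) for j in range(i)]
wts = kernel_weights(hist, (0.5, 0.5), w_i)
assert abs(sum(wts) - 1.0) < 1e-12 and all(v >= 0.0 for v in wts)
```

The weights form a probability vector over the observed history, which is exactly the normalization property used below.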

We thus obtain what we call the empirical Bellman equation,

    λ̂i + ĥi(s, x̄) = max_{b∈S(s,x̄)} { Σ_{j=−d+1}^{i−1} [ f(b, X̄j+1) + (1 − δi) ĥi(b, X̄j+1) ] Ki(X̄j, x̄) },        (5.1.2)

which is to be solved for λ̂i ∈ IR, ĥi ∈ C(S × [A, B]dm).

5.1.1 An optimal strategy


If we set up an investment strategy as in Theorem 4.2.1, however, using the
empirical Bellman equation instead of the original Bellman equation (4.2.2),
two questions arise: Is there a solution to the empirical Bellman equation and
if so, is the corresponding strategy optimal?
We first tackle the existence of solutions of (5.1.2). Using the fact that Ki(X̄j, x̄) is a continuous function of x̄ and appealing to Berge's Maximum Theorem (Aliprantis and Border, 1999, Theorem 16.31) again, we can define an operator M̂i by

    M̂i : C(S × [A, B]dm) → C(S × [A, B]dm),
    (M̂i h)(s, x̄) := max_{b∈S(s,x̄)} { Σ_{j=−d+1}^{i−1} [ f(b, X̄j+1) + (1 − δi) h(b, X̄j+1) ] Ki(X̄j, x̄) }.

M̂i is the empirical counterpart of the operator M in the proof of Proposition 4.2.2. Because of

    Σ_{j=−d+1}^{i−1} Ki(X̄j, x̄) = 1,        (5.1.3)

M̂i operates on C(S × [A, B]dm)/{constant functions} according to M̂i[h] := [M̂i h]. Now, arguing similarly as in the proof of Proposition 4.2.2, we find that M̂i is a contraction mapping in the norm ‖g‖ := max g − min g. Indeed, for functions g, h ∈ C(S × [A, B]dm) there exists a b∗ ∈ S(s, x̄) with

    (M̂i h)(s, x̄) = Σ_{j=−d+1}^{i−1} [ f(b∗, X̄j+1) + (1 − δi) h(b∗, X̄j+1) ] Ki(X̄j, x̄),
    (M̂i g)(s, x̄) ≥ Σ_{j=−d+1}^{i−1} [ f(b∗, X̄j+1) + (1 − δi) g(b∗, X̄j+1) ] Ki(X̄j, x̄).
Using (5.1.3), this implies

    (M̂i h)(s, x̄) − (M̂i g)(s, x̄)
    ≤ (1 − δi) Σ_{j=−d+1}^{i−1} [ h(b∗, X̄j+1) − g(b∗, X̄j+1) ] Ki(X̄j, x̄) ≤ (1 − δi) ‖h − g‖∞.

Starting from here, one can argue exactly as in the proof of Proposition 4.2.2 to obtain ‖M̂i h − M̂i g‖ ≤ (1 − δi)‖h − g‖. Hence there exists a solution ĥi, λ̂i of the empirical Bellman equation. Due to (5.1.3) this can be normalized to

    ĥi(s, x̄) = max_{b∈S(s,x̄)} { Σ_{j=−d+1}^{i−1} [ f(b, X̄j+1) + (1 − δi) ĥi(b, X̄j+1) ] Ki(X̄j, x̄) }.        (5.1.4)

In the sequel, ĥi denotes a solution of (5.1.4).
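Since the update is a (1 − δi)-contraction, the normalized solution can be computed by plain fixed-point iteration. The following is a toy sketch on finite grids; the neighbourhood rule standing in for the admissible set S(s, x̄) and the numerical values are purely illustrative:

```python
def solve_empirical_bellman(f_vals, K, delta, n_iter=300):
    """Fixed-point iteration for a finite-grid analogue of the normalized
    empirical Bellman equation (5.1.4).  Grid index b stands in for a
    portfolio and index j for a sample point: f_vals[b][j] plays the role
    of f(b, X_{j+1}), h[b][j] that of h_hat(b, X_{j+1}), K[j][x] that of
    K_i(X_j, X_x) (each column sums to 1).  As an illustrative stand-in
    for S(s, x_bar), b is admissible from s iff |b - s| <= 1.  The update
    is a (1 - delta)-contraction in the sup-norm, so iteration converges."""
    nb, nx = len(f_vals), len(K)
    h = [[0.0] * nx for _ in range(nb)]
    for _ in range(n_iter):
        h = [[max(sum((f_vals[b][j] + (1.0 - delta) * h[b][j]) * K[j][x]
                      for j in range(nx))
                  for b in range(nb) if abs(b - s) <= 1)
              for x in range(nx)]
             for s in range(nb)]
    return h

# toy instance: three grid portfolios, two sample points
f_vals = [[0.02, -0.01], [0.00, 0.03], [0.01, 0.01]]
K = [[0.7, 0.4], [0.3, 0.6]]
h = solve_empirical_bellman(f_vals, K, 0.1)
# h satisfies the fixed-point equation: one more update changes nothing
update = lambda s, x: max(sum((f_vals[b][j] + 0.9 * h[b][j]) * K[j][x] for j in range(2))
                          for b in range(3) if abs(b - s) <= 1)
assert all(abs(h[s][x] - update(s, x)) < 1e-9 for s in range(3) for x in range(2))
```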

We are now in a position to define the strategy which will turn out to have the same optimality properties as the strategy in Chapter 4. In analogy to (4.2.3) the investor follows the strategy

    b̂0 := a∗   and   b̂i := arg max_{b∈S(b̂i−1, X̄i)} V̂i(b, X̄i)        (5.1.5)

with

    V̂i(b, x̄) := Σ_{j=−d+1}^{i−1} [ f(b, X̄j+1) + (1 − δi) ĥi(b, X̄j+1) ] Ki(X̄j, x̄).

Observe that, in contrast to b∗i from Chapter 4, this strategy can be constructed using observed data only; we need not know the underlying distribution of the return process.

The important feature is: This strategy is still optimal if the kernel estimates
work sufficiently well. A sufficient condition is given in the following theorem.

Theorem 5.1.1. Under the assumptions of Theorem 4.2.1, let hi be the solutions of the Bellman equation (5.1.1, λi = 0) and define the class G := {δi · hi, δi · f | i = 1, 2, ...}. Then

    lim_{i→∞} (1/δ_{i+1}²) sup_{g∈G} E sup_{b∈S, x̄∈[A,B]dm} | Σ_{j=−d+1}^{i} g(b, X̄j+1) Ki+1(X̄j, x̄) − E[g(b, X̄i+2)|X̄i+1 = x̄] | = 0        (5.1.6)

implies that

    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̂i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) ) ≥ 0        (5.1.7)

for any admissible strategy {bi}i based on a portfolio selection function.

This theorem translates the investment problem into a regression estimation problem. Of course, it is desirable to have practical sufficient conditions for the assumptions of Theorem 5.1.1 to hold. As we shall see, choosing δi ≥ 1/log i suffices; for example, we may choose δ1 = δ2 := d1, δ3 = δ4 = δ5 := d2, etc. with di ∼ 1/log i (cf. Theorem 5.1.2 below).
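One concrete way to generate such a block-constant sequence is sketched below; the specific value d_k := min(1, 1/log(k + 1)) for the k-th block is only one illustrative choice with d_k ∼ 1/log k:

```python
import math

def delta_sequence(n):
    """Block choice delta_1 = delta_2 := d_1, delta_3 = delta_4 = delta_5 := d_2,
    delta_6 = ... = delta_9 := d_3, ...: the k-th block has length k + 1 and
    carries the constant value d_k := min(1, 1/log(k + 1))."""
    deltas, k = [], 1
    while len(deltas) < n:
        deltas.extend([min(1.0, 1.0 / math.log(k + 1))] * (k + 1))
        k += 1
    return deltas[:n]

d = delta_sequence(20)
assert d[0] == d[1] and d[2] == d[3] == d[4] and d[5] == d[8]    # block structure
assert all(d[i] >= d[i + 1] for i in range(19))                  # nonincreasing
assert all(d[i] >= 1.0 / math.log(i + 1) for i in range(2, 20))  # delta_i >= 1/log i
```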
In the following, a stochastic process {Xi}∞_{i=−∞} is called a stationary GSM-process (geometrically strongly mixing) if, beyond stationarity, the following holds: there exist constants c > 0 and ρ ∈ [0, 1) such that the α-mixing coefficients

    α(k) := α( σ(Xi, i ≤ 0), σ(Xi, i ≥ k) ) := sup_{B∈σ(Xi, i≤0), C∈σ(Xi, i≥k)} |P(B ∩ C) − P(B)P(C)|

satisfy

    α(k) ≤ c · ρ^k    (k ≥ 1).
The behaviour of the α-mixing coefficients α(k) mirrors how fast dependency
in the process variables decays for large time lags k. Under mild assumptions,
the class of GSM-processes comprises linear processes (Bosq, 1996, Sec. 1.3,
2.3), polynomial AR-processes (Doukhan, 1994, Sec. 2.4.1, Th. 5) such as
ARMA-processes (Doukhan, 1994, Sec. 2.4.1.2, Th. 6, Cor. 3), and Doeblin-
or Harris-recurrent Markov chains (Doukhan, 1994, Sec. 2.4, Th. 1 and 3).
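For a stationary Markov chain the dependence between past and future reduces, by the Markov property, to the dependence between X0 and Xk, so for a two-state chain α(k) can be computed by brute force over the finitely many events. A sketch (the chain and its parameters are purely illustrative) showing the geometric decay α(k) ≤ c·ρ^k with ρ = |1 − a − b|:

```python
from itertools import product

def alpha_2state(a, b, k):
    """Strong-mixing coefficient between sigma(X_0) and sigma(X_k) for the
    stationary two-state Markov chain with transition matrix
    [[1-a, a], [b, 1-b]]; it decays geometrically like |1 - a - b|^k."""
    pi = (b / (a + b), a / (a + b))                      # stationary distribution
    P = [[1 - a, a], [b, 1 - b]]
    Pk = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):                                   # k-step transition matrix
        Pk = [[sum(Pk[i][l] * P[l][j] for l in range(2)) for j in range(2)]
              for i in range(2)]
    joint = [[pi[i] * Pk[i][j] for j in range(2)] for i in range(2)]
    events = [set(), {0}, {1}, {0, 1}]
    return max(abs(sum(joint[i][j] for i in B for j in C)
                   - sum(pi[i] for i in B) * sum(pi[j] for j in C))
               for B, C in product(events, events))

a1 = alpha_2state(0.3, 0.2, 1)
a3 = alpha_2state(0.3, 0.2, 3)
rho = abs(1 - 0.3 - 0.2)                                 # second eigenvalue, here 0.5
assert abs(a3 - a1 * rho ** 2) < 1e-12                   # geometric decay of alpha(k)
```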

Theorem 5.1.2. The assumptions of Theorem 5.1.1 are fulfilled if the following hold:

1. {Xi}∞_{i=−d+1} is a stationary [A, B]^m-valued d-stage Markov process (cf. V1–V3 in Section 4.1) and geometrically strongly mixing.

2. There exist densities fX̄0 and fX0|X̄0 of the distributions PX̄0 and PX0|X̄0, respectively, such that

   – fX̄0 is Lipschitz continuous, i.e., for some C > 0,

         |fX̄0(x̄) − fX̄0(ȳ)| ≤ C‖x̄ − ȳ‖∞    for all x̄, ȳ ∈ [A, B]dm,

   – the level sets {x̄ : fX̄0(x̄) ≥ 1/n} of fX̄0 satisfy

         H( supp fX̄0, {x̄ : fX̄0(x̄) ≥ 1/n} ) ≤ n^{−k}

     for some k > 0, H denoting the Hausdorff distance (cf. Section 4.3),

   – fX0|X̄0 is Lipschitz continuous such that for some C > 0,

         |fX0|X̄0(x, x̄) − fX0|X̄0(x, ȳ)| ≤ C‖x̄ − ȳ‖∞

     for all x ∈ [A, B]^m, x̄, ȳ ∈ [A, B]dm.

3. The commission factor c is sufficiently small.

4. δi ≥ 1/log i satisfies (4.2.1).

Hence, under the (not too restrictive) conditions of Theorem 5.1.2 we are able
to construct an admissible strategy {b̂i }i that is superior to any other admissible
strategy {bi}i based on a portfolio selection function in the sense of
    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̂i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) ) ≥ 0.

It should be stressed again that this is a conservative, i.e. worst case analysis,
the lim inf giving the worst possible performance of our strategy {b̂i}i .
Remark. There are a number of sufficient conditions for {Xi}i to be a stationary GSM-process (see, e.g., Doukhan, 1994). For example, the GSM property holds if there exists a continuous function r : [A, B]^m → IR+0 with

    fX0|X̄0(x0|x̄0) ≥ r(x0)

for all x̄0 ∈ [A, B]dm, x0 ∈ [A, B]^m and

    ∫_{[A,B]^m} r(x0) dx0 > 0.

To verify this, one may use the Doeblin condition in Theorem 1 of Doukhan (1994, Sec. 2.4): The transition probabilities of the d-stage Markov process {Xi}i are

    P(x̄0, C) = ∫_C fX0|X̄0(x0|x̄0) dx0

for x̄0 ∈ [A, B]dm, C ∈ B([A, B]^m). In particular, with the measure µ := r · λ (λ denoting the Lebesgue–Borel measure on [A, B]^m),

    P(x̄0, C) ≥ ∫_C r(x0) dx0 = µ(C)

and

    µ([A, B]^m) = ∫_{[A,B]^m} r(x0) dx0 ∈ (0, 1].

Under these circumstances, the cited theorem of Doukhan (1994) allows us to


conclude that {Xi }i is a GSM-process.

5.1.2 How to prove optimality

Before we actually prove the main results, Theorems 5.1.1 and 5.1.2, we sketch the way in which we are going to proceed. First, we need to establish some results on uniformly consistent regression estimation. This will be done in the next section. Once we have results on the speed of uniform almost sure convergence, we can embark on the core of the proof of the theorems in Section 5.3. There, we argue along the lines of the corresponding results in Chapter 4, using uniform consistency results to pass from the solution of the empirical Bellman equation to the solution of the original Bellman equation.

5.2 Uniformly consistent regression estimation


We start with the following curve estimation problem. Given data X0, X1, ..., Xn from a stationary IR^d-valued stochastic process {Xi}∞_{i=−∞} and a function g : S × IR^d → IR (S ⊆ IR^{d′} compact), the objective is to estimate both the function

    φ(g, b, x) := E[g(b, X1)|X0 = x] · fX0(x)

and the regression function

    R(g, b, x) := E[g(b, X1)|X0 = x].

The first will be estimated uniformly in b ∈ S and x ∈ IR^d by the kernel estimate

    Zn(g, b, x) := (1/(n wn^d)) Σ_{i=0}^{n−1} g(b, Xi+1) K((x − Xi)/wn),        (5.2.1)

the latter by

    Rn(g, b, x) := Zn(g, b, x) / Zn(1, b, x).

K : IR^d → IR+0 is assumed to be a fixed bounded, Lipschitz continuous kernel function with ∫_{IR^d} K(x) dx = 1 and ∫_{IR^d} K(x)‖x‖∞ dx < ∞. wn ∈ IR+ is a sequence of bandwidths to be chosen later (such that lim_{n→∞} wn = 0). 1 denotes the constant function (b, x) ↦ 1. Note that Zn(1, b, x) is an estimate of the density fX0(x).
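A minimal scalar (d = 1) implementation of (5.2.1) and the ratio estimate, with the triangular kernel as one admissible choice of K and a purely illustrative deterministic "sample":

```python
def Z_n(g, b, x, data, w):
    """Kernel estimate (5.2.1) of phi(g, b, x) for scalar data (d = 1),
    with the triangular kernel K(u) = max(1 - |u|, 0) as one admissible K."""
    n = len(data) - 1
    return sum(g(b, data[i + 1]) * max(1.0 - abs((x - data[i]) / w), 0.0)
               for i in range(n)) / (n * w)

def R_n(g, b, x, data, w):
    """Ratio estimate R_n = Z_n(g, b, x) / Z_n(1, b, x) of the regression
    function; Z_n(1, b, x) is the kernel density estimate of f_X0(x)."""
    return Z_n(g, b, x, data, w) / Z_n(lambda _b, _x: 1.0, b, x, data, w)

# sanity check: for constant g the ratio estimate reproduces the constant
data = [0.1 * i for i in range(21)]
w = (len(data) - 1) ** (-1.0 / 3.0)     # w_n = n^(-1/(d+2)) with d = 1
assert abs(R_n(lambda b, x: 3.0, None, 1.0, data, w) - 3.0) < 1e-12
```

Dividing by the density estimate makes Rn self-normalizing, which is why its rate in Corollary 5.2.2 deteriorates where fX0 gets small.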
Bosq (1996, Theorems 2.2, 3.3, 3.3) derives rates for the uniform almost sure convergence of Zn(1, b, x) and Zn(g, b, x) for a fixed g in stationary GSM-processes. The following theorem generalizes the results of Bosq so as to give the rate of convergence of the expected ‖·‖∞-error of the estimate (5.2.1) in GSM-processes uniformly over a huge class of functions g. For this, we agree on the following notation: F is the class of sequences {c n^s log^t n}_{n∈IN} (c > 0, s > 0, t ∈ IR, or c > 0, s = 0, t > 1). For any set X ⊆ IR^d we denote the ‖·‖∞-diameter of X by diam(X) := sup_{x,y∈X} ‖x − y‖∞. fX0 is a density of X0, fX1|X0 a density of the conditional distribution PX1|X0. For the estimation of φ, we will prove the

Theorem 5.2.1. Let {Xi}∞_{i=0} be a stationary GSM-process of IR^d-valued random variables with a Lipschitz continuous density fX0. Moreover, let G be a class of functions g : S × IR^d → IR (S ⊆ IR^{d′} compact) with the following property: There exists a constant C > 0 such that for all g ∈ G and all a, b ∈ S, x, y ∈ IR^d

    ‖g‖∞ ≤ C,        (5.2.2)
    ∫_{IR^d} |g(a, x)| dx ≤ C,        (5.2.3)
    |g(a, x) − g(b, x)| ≤ C · ‖a − b‖∞,        (5.2.4)
    |E[g(b, X1)|X0 = x] − E[g(b, X1)|X0 = y]| ≤ C · ‖x − y‖∞.        (5.2.5)

Furthermore, choose wn := n^{−1/(d+2)}. Then

    sup_{g∈G} E sup_{x∈Xn, b∈S} |Zn(g, b, x) − φ(g, b, x)| = o( log^β n / n^{1/(d+2)} )

for any β > 1 and any sequence Xn ⊆ IR^d with diam(Xn) ∈ F.

Now, if the support of fX0, supp fX0 = {x : fX0(x) > 0}, is a compact subset of IR^d, the theorem may be used to derive the following result concerning the estimation of the regression function R. Recall that H denotes the Hausdorff distance (cf. Section 4.3).

Corollary 5.2.2. Under the assumptions of Theorem 5.2.1, if fX0 has compact support and the level sets

    Xn := { x : fX0(x) ≥ 1/n }        (5.2.6)

satisfy

    H( supp fX0, Xn ) ≤ const./n^k    (0 < k ≤ ∞),        (5.2.7)

then

    sup_{g∈G} E sup_{x∈supp fX0, b∈S} |Rn(g, b, x) − R(g, b, x)| = O( log^β n / n^{k/((d+2)(k+1))} )

for any β > 1.

Remark. The additional assumption (5.2.7) is not too restrictive. In particular, if inf_{x∈supp fX0} fX0(x) ≥ const. > 0, we can choose k = ∞ to obtain the "optimal" rate

    sup_{g∈G} E sup_{x∈supp fX0, b∈S} |Rn(g, b, x) − R(g, b, x)| = O( log^β n / n^{1/(d+2)} )

for any β > 1 (for the optimality of rates see, e.g., Györfi et al., 2002).
The proofs of Theorem 5.2.1 and Corollary 5.2.2 refine arguments used in the proofs of Theorems 2.2 and 3.2 in Bosq (1996). We first give the

Proof of Theorem 5.2.1. We write sup_{x,b} instead of sup_{x∈Xn, b∈S}. The estimation error can be decomposed into a stochastic and a deterministic error,

    sup_{x,b} |Zn(g, b, x) − φ(g, b, x)|
    ≤ sup_{x,b} |Zn(g, b, x) − EZn(g, b, x)| + sup_{x,b} |EZn(g, b, x) − φ(g, b, x)|.

1st step: Analysis of the deterministic error sup |EZn − φ|.
Write EZn(g, b, x) as

    EZn(g, b, x) = (1/(n wn^d)) Σ_{i=0}^{n−1} ∫_{IR^d} E[ g(b, Xi+1) K((x − Xi)/wn) | Xi = y ] · fXi(y) dy
                 = ∫_{IR^d} φ(g, b, x − z wn) K(z) dz,

so that, with ∫ K(z) dz = 1,

    sup_{x,b} |EZn(g, b, x) − φ(g, b, x)| ≤ sup_{x,b} ∫_{IR^d} |φ(g, b, x − z wn) − φ(g, b, x)| K(z) dz.

The integrand is bounded by

    |φ(g, b, x − z wn) − φ(g, b, x)|
    ≤ | E[g(b, X1)|X0 = x − z wn] − E[g(b, X1)|X0 = x] | · fX0(x − z wn)
      + | fX0(x − z wn) − fX0(x) | · E[ |g(b, X1)| | X0 = x ]
    ≤ const. · ‖z‖∞ wn,        (5.2.8)

where we have used (5.2.2)–(5.2.5). By assumption, the integral ∫ K(z)‖z‖∞ dz is finite, which yields

    sup_{x,b} |EZn(g, b, x) − φ(g, b, x)| ≤ const. · wn.        (5.2.9)

Note that the constant only depends on K, fX0 and on C in (5.2.2) - (5.2.5),
not however on the specific g ∈ G under consideration.
2nd step: Analysis of the stochastic error sup |Zn − EZn|.
We first cover Xn by ⌈νn⌉^d (νn ≥ 1) cubes

    C(j, n) := { x : ‖x − xj,n‖∞ ≤ diam(Xn)/(2⌈νn⌉) }

with side lengths diam(Xn)/⌈νn⌉ and centres xj,n (j = 1, ..., ⌈νn⌉^d). Analogously, S is covered by ⌈νn⌉^{d′} cubes

    S(k, n) := { b ∈ S : ‖b − bk,n‖∞ ≤ diam(S)/(2⌈νn⌉) }

(k = 1, ..., ⌈νn⌉^{d′}). From this,

    sup_{x,b} |Zn(g, b, x) − EZn(g, b, x)| = sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |Zn(g, b, x) − EZn(g, b, x)|
    ≤ sup_{j,k} |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)|
      + sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |Zn(g, b, x) − Zn(g, bk,n, xj,n)|
      + sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |EZn(g, b, x) − EZn(g, bk,n, xj,n)|.

On the other hand, using (5.2.2)–(5.2.4),

    |Zn(g, b, x) − Zn(g, bk,n, xj,n)|
    ≤ (1/(n wn^d)) Σ_{i=0}^{n−1} |g(b, Xi+1) − g(bk,n, Xi+1)| · K((x − Xi)/wn)
      + (1/(n wn^d)) Σ_{i=0}^{n−1} |g(bk,n, Xi+1)| · | K((x − Xi)/wn) − K((xj,n − Xi)/wn) |
    ≤ const. · (1/wn^d) · ‖b − bk,n‖∞ + const. · (1/wn^d) · ‖x − xj,n‖∞/wn
    ≤ const. · max(diam(Xn), diam(S)) / (wn^{d+1} νn),

where without loss of generality we have assumed that wn ≤ 1. Again, the constant does not depend on b ∈ S, x ∈ IR^d and g ∈ G.
Using the same argument, we find that

    |EZn(g, b, x) − EZn(g, bk,n, xj,n)| ≤ const. · max(diam(Xn), diam(S)) / (wn^{d+1} νn),
and hence

    sup_{x,b} |Zn(g, b, x) − EZn(g, b, x)|
    ≤ sup_{j,k} |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)| + 2 · const. · max(diam(Xn), diam(S)) / (wn^{d+1} νn).
This result yields (0 < rn → ∞ will represent the desired rate of convergence at a later stage)

    rn · E sup_{x,b} |Zn(g, b, x) − EZn(g, b, x)|
    ≤ E( rn · sup_{j,k} |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)| )
      + 2 · const. · rn · max(diam(Xn), diam(S)) / (wn^{d+1} νn)
    = ∫_0^{2‖g‖∞‖K‖∞ rn/wn^d} P( sup_{j,k} |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)| > ε/rn ) dε
      + 2 · const. · rn · max(diam(Xn), diam(S)) / (wn^{d+1} νn)
    ≤ µ + Σ_{j,k} ∫_µ^{2‖g‖∞‖K‖∞ rn/wn^d} P( |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)| > ε/rn ) dε
      + 2 · const. · rn · max(diam(Xn), diam(S)) / (wn^{d+1} νn),        (5.2.10)

where µ > 0 is arbitrary.
3rd step: Combining the results of step 1 and step 2.
Assume for the moment that

    P( |Zn(g, bk,n, xj,n) − EZn(g, bk,n, xj,n)| > ε/rn )

has an upper bound pn^{(1)}(ε) + pn^{(2)}(ε) independent of g, b and x with the following four properties:

1. We have

       ∫_µ^∞ pn^{(1)}(ε) dε < ∞.        (5.2.11)

2. For all ε > 0:

       lim_{n→∞} (νn + 1)^{d+d′} pn^{(1)}(ε) = 0.        (5.2.12)

3. For all µ > 0 there exists an N(µ) ∈ IN such that for all ε ≥ µ we have that

       (νn + 1)^{d+d′} pn^{(1)}(ε) is monotonically decreasing for n ≥ N(µ).        (5.2.13)

4. We have

       lim_{n→∞} ∫_µ^{2‖g‖∞‖K‖∞ rn/wn^d} (νn + 1)^{d+d′} pn^{(2)}(ε) dε = 0.        (5.2.14)

We then infer from (5.2.9) and (5.2.10) that

    lim sup_{n→∞} sup_{g∈G} rn · E sup_{x∈Xn, b∈S} |Zn(g, b, x) − φ(g, b, x)|
    ≤ µ + lim sup_{n→∞} (νn + 1)^{d+d′} ∫_µ^∞ pn^{(1)}(ε) dε
      + lim sup_{n→∞} ∫_µ^{2‖g‖∞‖K‖∞ rn/wn^d} (νn + 1)^{d+d′} pn^{(2)}(ε) dε
      + lim sup_{n→∞} const. · rn · ( wn + max(diam(Xn), diam(S)) / (wn^{d+1} νn) ).

The second term is zero using (5.2.11)–(5.2.13) and the monotone convergence theorem, the third term is zero because of (5.2.14). µ being arbitrary we obtain

    lim sup_{n→∞} sup_{g∈G} rn · E sup_{x,b} |Zn(g, b, x) − φ(g, b, x)|
    ≤ lim sup_{n→∞} const. · rn · ( wn + max(diam(Xn), diam(S)) / (wn^{d+1} νn) ),        (5.2.15)

from which we shall determine rn.
4th step: Finding a bound pn^{(1)}(ε) + pn^{(2)}(ε) for the 3rd step.
To this end let

    Wi,n := Wi,n(b, x) := (1/wn^d) [ g(b, Xi+1) K((x − Xi)/wn) − E g(b, Xi+1) K((x − Xi)/wn) ].

Simple calculations yield

    Var Wi,n ≤ ‖K‖∞ ‖fX0‖∞ ‖g‖∞² / wn^d,
    |Cov(Wi,n, Wj,n)| ≤ ‖fX0‖∞ ‖g‖∞² ( ‖K‖∞/wn^d + ‖fX0‖∞ ),

and we have

    Zn(g, b, x) − EZn(g, b, x) = (1/n) Σ_{i=0}^{n−1} Wi,n.        (5.2.16)
The property of {Xi }i being a stationary GSM-process is inherited by {Wi,n }i
whose kth α-mixing coefficient is less than α(k − 1), the (k − 1)st α-mixing
coefficient of {Xi}i (independent of b ∈ S, x ∈ IRd ). To bound the tails of
Zn − EZn , we exploit the expansion (5.2.16). We also use Theorem 1.3 in Bosq
(1996) which states a tail inequality for empirical means of centered random
variables in terms of their α-mixing coefficients:

Proposition 5.2.3. (Bosq, 1996, Theorem 1.3) Let $\{Y_i\}_{i=-\infty}^{\infty}$ be a centered real-valued stochastic process with $\sup_{1\le i\le n}\|Y_i\|_\infty \le D$. Then for any $q \in [1, n/2]$ and any $\epsilon > 0$
\[
P\left( \left| \frac{1}{n}\sum_{i=0}^{n-1} Y_i \right| > \epsilon \right)
\le 4 \exp\left( - \frac{\epsilon^2 q}{8 v^2} \right)
+ 22\left( 1 + \frac{4D}{\epsilon} \right)^{1/2} \lceil q \rceil\, \alpha\!\left( \left\lceil \frac{n}{2q} \right\rceil \right),
\]
where
\[
p := \frac{n}{2q}, \qquad
v^2 := \frac{2}{p^2}\,\sigma(q)^2 + \frac{D\epsilon}{2},
\]
\[
\sigma^2(q) := \max_{0 \le j \le 2\lceil q\rceil - 1} E\Big( \big(\lfloor jp\rfloor + 1 - jp\big) Y_{\lfloor jp\rfloor+1} + Y_{\lfloor jp\rfloor+2} + \ldots + Y_{\lfloor (j+1)p\rfloor} + \big( (j+1)p - \lfloor (j+1)p\rfloor \big) Y_{\lfloor (j+1)p\rfloor+1} \Big)^2.
\]

The proposition will be applied to the centered GSM-process $\{W_{i,n}\}_i$. Multiplying out the sum defining $\sigma^2(q)$ we obtain at most $(p+2)^2$ terms. The above variance and covariance bounds for $W_{i,n}$ then yield that for any $q = q_n \in [1, n/2]$
\[
\frac{2}{p^2}\,\sigma^2(q) \le \frac{\mathrm{const.}}{w_n^d},
\]
where the constant depends on nothing but $K$, $f_{X_0}$ and $C$ from (5.2.2)-(5.2.3). Hence (set $D := C\,\|K\|_\infty w_n^{-d}$)
\[
P\left( |Z_n(g,b,x) - E Z_n(g,b,x)| > \frac{\epsilon}{r_n} \right) \le p_n^{(1)}(\epsilon) + p_n^{(2)}(\epsilon)
\]
with (const. being another suitable constant)
\[
p_n^{(1)}(\epsilon) := 4 \exp\left( - \frac{\epsilon^2 q_n w_n^d}{\mathrm{const.}\cdot(1+\epsilon)\, r_n^2} \right),
\qquad
p_n^{(2)}(\epsilon) := 22\left( 1 + \frac{\mathrm{const.}\cdot r_n}{\epsilon\, w_n^d} \right)^{1/2} \lceil q_n\rceil\, \alpha\!\left( \left\lceil \frac{n}{2q_n} \right\rceil - 1 \right).
\]

5th step: We can now move on to finding appropriate $w_n$, $r_n$, $\nu_n$ in (5.2.15). The crux is to satisfy (5.2.11)-(5.2.14) with the above $p_n^{(1)}$ and $p_n^{(2)}$. (5.2.11) is fulfilled because of $\int_\mu^\infty \exp(-\epsilon)\,d\epsilon < \infty$. Elementary calculations show that (5.2.12) and (5.2.13) are satisfied if only
\[
\frac{q_n w_n^d}{r_n^2} \in F \quad\text{and}\quad \nu_n^{d+d'} \in F \tag{5.2.17}
\]
(observe that $\{a_n\}, \{b_n\} \in F$ and $0 \le \rho < 1$ implies $a_n \rho^{b_n} \searrow 0$). As to (5.2.14) we use
\[
\begin{aligned}
\int_\mu^{2\|g\|_\infty\|K\|_\infty r_n/w_n^d} (\nu_n+1)^{d+d'}\, p_n^{(2)}(\epsilon)\,d\epsilon
&\le 22 \cdot (\nu_n+1)^{d+d'} \cdot \lceil q_n\rceil\, \alpha\!\left( \left\lceil \frac{n}{2q_n}\right\rceil - 1 \right) \int_0^{2\|g\|_\infty\|K\|_\infty r_n/w_n^d} \left( 1 + \frac{\mathrm{const.}\cdot r_n}{\epsilon\, w_n^d} \right)^{1/2} d\epsilon \\
&\le \mathrm{const.}\cdot (\nu_n+1)^{d+d'} \cdot \lceil q_n\rceil\, \alpha\!\left( \left\lceil \frac{n}{2q_n}\right\rceil - 1 \right) \cdot \frac{r_n}{w_n^d}.
\end{aligned}
\]
Thus (5.2.14) is satisfied if only
\[
\lim_{n\to\infty} \frac{r_n\, \nu_n^{d+d'}}{w_n^d}\, \lceil q_n\rceil\, \alpha\!\left( \left\lceil \frac{n}{2q_n}\right\rceil - 1 \right) = 0. \tag{5.2.18}
\]
So it suffices to satisfy (5.2.17) and (5.2.18). This is done by the choice
\[
r_n := \frac{1}{w_n \log^\beta n} \quad (\beta > 1), \qquad
\nu_n := \frac{\mathrm{diam}(S) + \mathrm{diam}(\mathcal{X}_n)}{w_n^{d+2}}, \qquad
w_n := \frac{1}{n^{1/(d+2)}}, \qquad
q_n := \frac{n}{\log^a n} \quad (2\beta - 1 > a > 1).
\]
Indeed, from $q_n w_n^d / r_n^2 = \log^{2\beta - a} n \in F$ and $\mathrm{diam}(\mathcal{X}_n) \in F$ we have (5.2.17). Moreover,
\[
\frac{r_n\, \nu_n^{d+d'}}{w_n^d}\, \lceil q_n\rceil\, \alpha\!\left( \left\lceil \frac{n}{2q_n}\right\rceil - 1 \right)
\le \mathrm{const.}\cdot \left( \mathrm{diam}(S) + \mathrm{diam}(\mathcal{X}_n) \right)^{d+d'} \cdot \frac{n^{1+d+d'+(1+d)/(2+d)}}{\log^{a+\beta} n} \cdot \alpha\!\left( \left\lceil \frac{\log^a n}{2}\right\rceil - 1 \right),
\]
and the GSM-property yields (5.2.18) (observe again that $\{a_n\}, \{b_n\} \in F$ and $0 \le \rho < 1$ implies $a_n \rho^{b_n} \searrow 0$).
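The identity $q_n w_n^d / r_n^2 = \log^{2\beta-a} n$ used above is quick to spot-check numerically. A minimal sketch (the parameter values $d = 3$, $\beta = 2$, $a = 1.5$ are illustrative choices satisfying $\beta > 1$ and $2\beta - 1 > a > 1$):

```python
import math

def rate_identity(n, d=3, beta=2.0, a=1.5):
    # the sequence choices from the 5th step of the proof
    w = n ** (-1.0 / (d + 2))             # w_n = n^{-1/(d+2)}
    r = 1.0 / (w * math.log(n) ** beta)   # r_n = 1/(w_n log^beta n)
    q = n / math.log(n) ** a              # q_n = n / log^a n
    lhs = q * w ** d / r ** 2             # q_n w_n^d / r_n^2
    rhs = math.log(n) ** (2 * beta - a)   # claimed value log^{2beta-a} n
    return lhs, rhs
```

For, e.g., $n = 10^6$ the two sides agree up to floating-point rounding.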
Finally, (5.2.15) now reads
\[
\limsup_{n\to\infty}\, \sup_{g\in G}\, r_n \cdot E \sup_{x,b} |Z_n(g,b,x) - \phi(g,b,x)| \le \limsup_{n\to\infty} \mathrm{const.}\cdot \frac{1}{\log^\beta n} = 0. \qquad \Box
\]

The proof of Corollary 5.2.2 is more straightforward.

Proof of Corollary 5.2.2. Set $\mathcal{X} = \mathrm{supp}\, f_{X_0}$ and $\mathcal{X}'_n := \mathcal{X}_{n^\gamma}$, where $\gamma > 0$ is adjusted later. We write $\sup_{x,b}$ instead of $\sup_{x\in\mathcal{X},\, b\in S}$. Clearly,
\[
\begin{aligned}
\sup_{g\in G} E \sup_{x,b} |R_n(g,b,x) - R(g,b,x)|
&\le \sup_{g\in G} E \sup_{x\in\mathcal{X}'_n,\, b\in S} |R_n(g,b,x) - R(g,b,x)| \\
&\quad + \sup_{g\in G} E \sup_{x,b}\, \inf_{x^*\in\mathcal{X}'_n} |R_n(g,b,x) - R_n(g,b,x^*)| \\
&\quad + \sup_{g\in G} E \sup_{x,b}\, \inf_{x^*\in\mathcal{X}'_n} |R(g,b,x) - R(g,b,x^*)|.
\end{aligned}
\]
Condition (5.2.7) implies
\[
\sup_{x\in\mathcal{X}}\, \inf_{x^*\in\mathcal{X}'_n} \|x - x^*\|_\infty \le \mathrm{const.}\cdot n^{-k\gamma}.
\]
Using (5.2.2)-(5.2.5), we can bound the second and the third term from above by
\[
\mathrm{const.}\cdot \sup_{x\in\mathcal{X}}\, \inf_{x^*\in\mathcal{X}'_n} \|x - x^*\|_\infty \le \mathrm{const.}\cdot n^{-k\gamma}.
\]

Using Theorem 5.2.1, the first term satisfies
\[
\begin{aligned}
&\sup_{g\in G} E \sup_{x\in\mathcal{X}'_n,\, b\in S} |R_n(g,b,x) - R(g,b,x)| \\
&= \sup_{g\in G} E \sup_{x\in\mathcal{X}'_n,\, b\in S} \left| R_n(g,b,x) - \frac{Z_n(g,b,x)}{f_{X_0}(x)} + \frac{Z_n(g,b,x)}{f_{X_0}(x)} - \frac{\phi(g,b,x)}{f_{X_0}(x)} \right| \\
&\le \sup_{g\in G} E\, \frac{\sup_{x\in\mathcal{X}'_n,\, b\in S} |R_n(g,b,x)|}{\inf_{x\in\mathcal{X}'_n} f_{X_0}(x)} \cdot \sup_{x\in\mathcal{X}'_n,\, b\in S} |f_{X_0}(x) - Z_n(1,b,x)| \\
&\quad + \frac{1}{\inf_{x\in\mathcal{X}'_n} f_{X_0}(x)}\, \sup_{g\in G} E \sup_{x\in\mathcal{X}'_n,\, b\in S} |Z_n(g,b,x) - \phi(g,b,x)| \\
&\le \mathrm{const.}\cdot n^\gamma\, \frac{\log^\beta n}{n^{1/(d+2)}}.
\end{aligned}
\]
Consequently,
\[
\sup_{g\in G} E \sup_{x,b} |R_n(g,b,x) - R(g,b,x)|
= O\!\left( \frac{\log^\beta n}{n^{1/(d+2)-\gamma}} + \frac{1}{n^{k\gamma}} \right)
= O\!\left( \frac{\log^\beta n}{n^{k/((d+2)(k+1))}} \right),
\]
the latter equality holding for the balanced choice $\gamma = 1/((d+2)(k+1))$. $\Box$

5.3 Proving the optimality of the strategy


With the results of the previous section, we are in a position to give the
Proof of Theorem 5.1.1. Analyzing the strategy $\{\hat b_i\}_i$, our first task is to derive an inequality analogous to (4.2.12), with $\hat h_i$ in place of $h_i$ and $\hat b_i$ in place of $b_i^*$.
From the empirical Bellman equation, we find that
\[
\begin{aligned}
\hat h_{i+1}(b_i, \bar X_{i+1})
&= \max_{b\in S(b_i, \bar X_{i+1})} \sum_{j=-d+1}^{i} \left( f(b, \bar X_{j+1}) + (1-\delta_{i+1})\,\hat h_{i+1}(b, \bar X_{j+1}) \right) K_{i+1}(\bar X_j, \bar X_{i+1}) \\
&\ge \sum_{j=-d+1}^{i} \left( f(b_{i+1}, \bar X_{j+1}) + (1-\delta_{i+1})\,\hat h_{i+1}(b_{i+1}, \bar X_{j+1}) \right) K_{i+1}(\bar X_j, \bar X_{i+1})
\end{aligned}
\]
for any admissible portfolio strategy $\{b_i\}_i$. Equality holds for $\{\hat b_i\}_i$, which yields
\[
\begin{aligned}
&m(\hat b_{i+1}, \bar X_{i+1}) - m(b_{i+1}, \bar X_{i+1}) \\
&\ge H_i^{(1)}(b_{i+1}) + H_i^{(2)}(b_{i+1}) - \hat h_{i+1}(b_i, \bar X_{i+1}) + (1-\delta_{i+1})\, E[\hat h_{i+1}(b_{i+1}, \bar X_{i+2}) \mid \mathcal{F}_{i+1}] \\
&\quad - H_i^{(1)}(\hat b_{i+1}) - H_i^{(2)}(\hat b_{i+1}) + \hat h_{i+1}(\hat b_i, \bar X_{i+1}) - (1-\delta_{i+1})\, E[\hat h_{i+1}(\hat b_{i+1}, \bar X_{i+2}) \mid \mathcal{F}_{i+1}]
\end{aligned}
\tag{5.3.1}
\]

with
\[
H_i^{(1)}(b) := \sum_{j=-d+1}^{i} f(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar X_{i+1}) - m(b, \bar X_{i+1}),
\]
and
\[
H_i^{(2)}(b) := (1-\delta_{i+1}) \left( \sum_{j=-d+1}^{i} \hat h_{i+1}(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar X_{i+1}) - E[\hat h_{i+1}(b, \bar X_{i+2}) \mid \mathcal{F}_{i+1}] \right).
\]

Now, we investigate the asymptotics of the terms $H_i^{(1)}$ and $H_i^{(2)}$ in (5.3.1). Clearly,
\[
\sup_{b\in S} |H_i^{(1)}(b)| \le \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} f(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - m(b, \bar x) \right|. \tag{5.3.2}
\]
To analyse $H_i^{(2)}$ we define an $\mathcal{F}_i$-measurable random variable $c_i$,
\[
c_i := \arg\min_{c\in\mathbb{R}} \|\hat h_i - h_i + c\|_\infty,
\]
and obtain
\[
\begin{aligned}
\sup_{b\in S} |H_i^{(2)}(b)|
&\le \sup_{b\in S} \left| \sum_{j=-d+1}^{i} h_{i+1}(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar X_{i+1}) - E[h_{i+1}(b, \bar X_{i+2}) \mid \mathcal{F}_{i+1}] \right| \\
&\quad + \left| \sum_{j=-d+1}^{i} \big( \hat h_{i+1}(...) - h_{i+1}(...) + c_{i+1} \big) K_{i+1}(...) - E[\hat h_{i+1}(...) - h_{i+1}(...) + c_{i+1} \mid \mathcal{F}_{i+1}] \right| \\
&\le \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} h_{i+1}(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - E[h_{i+1}(b, \bar X_{i+2}) \mid \bar X_{i+1} = \bar x] \right| + 2\,\|\hat h_{i+1} - h_{i+1} + c_{i+1}\|_\infty \\
&\le \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} h_{i+1}(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - E[h_{i+1}(b, \bar X_{i+2}) \mid \bar X_{i+1} = \bar x] \right| + \|\hat h_{i+1} - h_{i+1}\|.
\end{aligned}
\tag{5.3.3}
\]

For this, recall the norm $\|\cdot\| = 2\inf_{c\in\mathbb{R}} \|\cdot + c\|_\infty$ on $C(S\times[A,B]^{dm})$. By the contraction property of $\hat M_i$,
\[
\begin{aligned}
\|\hat h_i - h_i\| &= \|\hat M_i \hat h_i - M_i h_i\| \\
&\le \|\hat M_i \hat h_i - \hat M_i h_i\| + \|\hat M_i h_i - M_i h_i\| \\
&\le (1-\delta_i)\,\|\hat h_i - h_i\| + 2\,\|\hat M_i h_i - M_i h_i\|_\infty,
\end{aligned}
\]
and hence
\[
\|\hat h_i - h_i\| \le \frac{2}{\delta_i}\, \|\hat M_i h_i - M_i h_i\|_\infty. \tag{5.3.4}
\]
It is easily established that
\[
\begin{aligned}
\|\hat M_i h_i - M_i h_i\|_\infty
&\le \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i-1} f(b, \bar X_{j+1})\, K_i(\bar X_j, \bar x) - m(b, \bar x) \right| \\
&\quad + (1-\delta_i) \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i-1} h_i(b, \bar X_{j+1})\, K_i(\bar X_j, \bar x) - E[h_i(b, \bar X_{d+1}) \mid \bar X_d = \bar x] \right|.
\end{aligned}
\tag{5.3.5}
\]

As a consequence of (5.3.2)-(5.3.5) we obtain
\[
\begin{aligned}
E \sup_{b\in S} \left| H_i^{(1)}(b) + H_i^{(2)}(b) \right|
&\le \left( 1 + \frac{2}{\delta_{i+1}} \right) E \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} f(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - m(b, \bar x) \right| \\
&\quad + \left( 1 + \frac{2}{\delta_{i+1}} \right) (1-\delta_{i+1})\, E \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} h_{i+1}(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - E[h_{i+1}(b, \bar X_{i+2}) \mid \bar X_{i+1} = \bar x] \right| \\
&\le \frac{\mathrm{const.}}{\delta_{i+1}^2}\, \sup_{g\in G} E \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} g(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - E[g(b, \bar X_{i+2}) \mid \bar X_{i+1} = \bar x] \right|,
\end{aligned}
\]
and under the assumption (5.1.6) of the theorem we find that
\[
E \sup_{b\in S} \left| H_i^{(1)}(b) + H_i^{(2)}(b) \right| \to 0 \quad (i\to\infty).
\]

Calculating expectations, summing $\frac{1}{n}\sum_{i=-1}^{n-2}\ldots$ and taking $\liminf_{n\to\infty}$ in (5.3.1), we end up with
\[
\begin{aligned}
&\liminf_{n\to\infty} \left( E\left( \frac{1}{n}\sum_{i=0}^{n-1} m(\hat b_i, \bar X_i) \right) - E\left( \frac{1}{n}\sum_{i=0}^{n-1} m(\tilde b_i, \bar X_i) \right) \right) \\
&\ge \liminf_{n\to\infty} \bigg\{ E\, \frac{1}{n}\sum_{i=-1}^{n-2} \left( \hat h_{i+1}(\tilde b_{i+1}, \bar X_{i+2}) - \hat h_{i+1}(\tilde b_i, \bar X_{i+1}) \right) \\
&\qquad\qquad - E\, \frac{1}{n}\sum_{i=-1}^{n-2} \left( \hat h_{i+1}(\hat b_{i+1}, \bar X_{i+2}) - \hat h_{i+1}(\hat b_i, \bar X_{i+1}) \right) \\
&\qquad\qquad + E\, \frac{1}{n}\sum_{i=0}^{n-1} \delta_i \left( \hat h_i(\hat b_i, \bar X_{i+1}) - \hat h_i(\tilde b_i, \bar X_{i+1}) \right) \bigg\} \tag{5.3.6} \\
&=: \liminf_{n\to\infty} \{ D_i \},
\end{aligned}
\]

{b̃i }i being the periodic strategy from Lemma 4.2.6. This is the analogue of
(4.2.12) we were looking for.
Finally, we observe that
\[
\begin{aligned}
\hat h_{i+1}(b_{i+1}, \bar X_{i+2}) - \hat h_{i+1}(b_i, \bar X_{i+1})
&= h_{i+1}(b_{i+1}, \bar X_{i+2}) - h_{i+1}(b_i, \bar X_{i+1}) \\
&\quad + \left( \hat h_{i+1}(b_{i+1}, \bar X_{i+2}) - h_{i+1}(b_{i+1}, \bar X_{i+2}) + c_{i+1} \right) \\
&\quad + \left( h_{i+1}(b_i, \bar X_{i+1}) - \hat h_{i+1}(b_i, \bar X_{i+1}) - c_{i+1} \right),
\end{aligned}
\tag{5.3.7}
\]
where the expectation of the absolute value of the sum of the last two brackets is bounded from above by
\[
2\, E\|\hat h_{i+1} - h_{i+1} + c_{i+1}\|_\infty = E\|\hat h_{i+1} - h_{i+1}\| \to 0 \quad (i\to\infty), \tag{5.3.8}
\]

using (5.3.4), (5.3.5) and the assumption (5.1.6) of the theorem. (5.3.7) and (5.3.8) show that, for the purpose of the asymptotic inference in (5.3.6), we can replace $\hat h_{i+1}$ by $h_{i+1}$ in the definition of $D_i$. We can then argue exactly as in the proof of Theorem 4.2.1 (starting from (4.2.12)) to obtain the optimality relation
\[
\liminf_{n\to\infty} \left( E\, \frac{1}{n}\sum_{i=0}^{n-1} m(\hat b_i, \bar X_i) - E\, \frac{1}{n}\sum_{i=0}^{n-1} m(b_i, \bar X_i) \right) \ge 0. \qquad \Box
\]

It remains to prove Theorem 5.1.2:


Proof of Theorem 5.1.2. This will be done by application of Corollary 5.2.2
for the process {X̄i }i . Clearly, the GSM-property of {Xi }i makes {X̄i }i a GSM-
process, too.
Set $G := \{\delta_i \cdot h_i,\ \delta_i \cdot f \mid i = 1, 2, ...\}$. Lemma 4.2.3 implies that $\|g\|_\infty \le \|f\|_\infty$ for all $g \in G$. By assumption 3 of the theorem the requirements of Proposition 4.3.1 are met, so that we can find a constant $C > 0$ with
\[
|g(s, \bar x) - g(t, \bar x)| \le C\, \|s - t\|_\infty \tag{5.3.9}
\]
for all $g = \delta_i \cdot h_i \in G$, $\bar x \in [A,B]^{dm}$ and $s, t \in S$. Increasing $C$ to at least the Lipschitz constant of $f$, (5.3.9) holds for $\delta_i \cdot f$ as well. Thus, the conditions (5.2.2)-(5.2.4) of Theorem 5.2.1 are fulfilled for the class $G$. (5.2.5) holds because $f_{X_0|\bar X_0}$ is Lipschitz continuous, and we conclude
\[
\frac{1}{\delta_{i+1}^2}\, \sup_{g\in G} E \sup_{b\in S,\, \bar x\in[A,B]^{dm}} \left| \sum_{j=-d+1}^{i} g(b, \bar X_{j+1})\, K_{i+1}(\bar X_j, \bar x) - E[g(b, \bar X_{i+2}) \mid \bar X_{i+1} = \bar x] \right| \le \mathrm{const.}\cdot \frac{1}{\delta_{i+1}^2} \cdot \frac{\log i}{i^\alpha}
\]
for some $\alpha > 0$. By assumption 4 on $\{\delta_i\}_i$ in Theorem 5.1.2, we find that
\[
\frac{1}{\delta_{i+1}^2} \cdot \frac{\log i}{i^\alpha} \longrightarrow 0 \quad (i\to\infty),
\]
which finally yields the assertion. $\Box$



CHAPTER 6

Portfolio selection functions in stationary return processes
In Chapter 5 we considered d-stage Markov processes in which portfolio selec-
tion could be done on the basis of the returns on the last d market days. By
the Markov property, the d-past completely characterized the stochastic regime
of the next market day. However, more general return processes {Xi }i , such as
merely stationary and ergodic processes will fail to have the Markov property.
Then, in principle, the investor is forced to evaluate the conditional log-optimal
portfolio b∗ (Xn , ..., X0) given the past returns X0 , ..., Xn starting from day zero.
As we will explain in Section 6.1, this is not always feasible. Choosing a port-
folio as a function of the d-past works well in Markov return processes. Hence
it is a natural modification of the conditional log-optimal strategy to consider log-optimal portfolio selection functions $f_{opt}(\cdot) : \mathbb{R}^{dm}_+ \to S$ in stationary ergodic return processes as well. These choose a portfolio from the portfolio simplex $S$
on the basis of the asset returns during the last d market periods (Section 6.1).
Log-optimal portfolio selection functions can only be calculated if one happens
to know the underlying return distribution. Otherwise the investor has to rely
on estimates.

Section 6.2 describes an estimator ZnL(·) for a log-optimal portfolio selection


function, with strong consistency results given in Lemma 6.2.1 and Theorem
6.2.2. The estimator works sequentially: the return data of the stocks is in-
cluded in the estimation process as it emerges. The central question is how
an investor using the estimated log-optimal portfolio selection function ZnL(·)
competes with other investors using different portfolio selection functions. As
we will see, repeated investment according to the estimate is optimal among all
investment strategies based on portfolio selection functions of the last d mar-
ket periods. In particular, it performs no worse than the unknown log-optimal

portfolio selection function fopt itself (Corollary 6.2.3).


In all this, L > 0 is a parameter of the underlying stock market characterizing
its regularity properties beyond stationarity and ergodicity. L is unknown to
the investor. To avoid this drawback in the cases relevant for practical applica-
tion, an adaptive estimator Zn (·) is constructed that does not require explicit
knowledge of L. This estimator features the same convergence properties as
ZnL (·) (Theorem 6.2.4), making it most appealing for application.
In Section 6.3 we prove the results of the preceding sections, and the chapter is
concluded by several simulations and examples (Section 6.4).

6.1 Portfolio selection functions


Let $T \in \mathbb{N}$ be fixed and $\{X_i\}_{i=-T}^{\infty}$ an $[a,b]^m$-valued ($0 < a \le b < \infty$), stationary and ergodic process of return vectors in a market of $m$ shares. As usual, at time $i$, the return process up to and including $X_i$ has been observed.
It is natural for the investor to choose his investment portfolio on the basis of recently observed returns, say on the basis of the last $d \in \mathbb{N}$ market periods ($d \le T$ fixed throughout). If investment performance is assessed on the basis of logarithmic utility, the investor's aim is to find a log-optimal portfolio selection function of the d-past, i.e., a measurable function
\[
b : [a,b]^{dm} \longrightarrow S := \Big\{ s \in \mathbb{R}^m : \sum_{j=1}^{m} s_j = 1,\ s_j \ge 0 \Big\}
\]
such that ($\langle\cdot,\cdot\rangle$ denoting the Euclidean scalar product)
\[
E\big( \log \langle b(X_{-d}, ..., X_{-1}), X_0 \rangle \big) \ge E\big( \log \langle f(X_{-d}, ..., X_{-1}), X_0 \rangle \big) \tag{6.1.1}
\]
for all measurable $f : [a,b]^{dm} \longrightarrow S$. At time $n \in \mathbb{N}_0$, $b$ advises the investor to allocate his wealth to the single shares according to the portfolio $b(X_{n-d+1}, ..., X_n) = b(\bar X_{n+1})$, where $\bar X_n$ is a shorthand notation for the d-past $(X_{n-d}, ..., X_{n-1}) \in \mathbb{R}^{dm}_+$ of $X_n$.
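The indexing of the d-past is easy to mistranscribe, so here is a minimal sketch of the bookkeeping (plain Python, illustrative only; the constant selection function `f_const` is a hypothetical stand-in for a measurable $b$):

```python
import math

def d_past(returns, n, d):
    # X_bar_{n+1} = (X_{n+1-d}, ..., X_n): the returns of the last d market periods
    return tuple(returns[n + 1 - d : n + 1])

def log_return(b, returns, n, d):
    # one-period logarithmic return log <b(X_bar_{n+1}), X_{n+1}>
    portfolio = b(d_past(returns, n, d))
    x_next = returns[n + 1]
    return math.log(sum(p * x for p, x in zip(portfolio, x_next)))

# toy market with m = 2 assets and d = 1
returns = [(1.0, 1.1), (0.9, 1.2), (1.05, 0.95)]
f_const = lambda past: (0.5, 0.5)  # hypothetical constant selection function
```

With these toy numbers the uniform portfolio earns log-return $\log(0.5\cdot 1.05 + 0.5\cdot 0.95) = \log 1 = 0$ in the last period.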

In contrast to the conditional log-optimal portfolio b∗ , which is a function of


all past return data, we only include the last d observations. This brings about
several advantages:

– It is plausible that one should drop observations from the far-away past if
the stationarity of the market is not clear. Outdated observations (under
non-stationarity they appear to be drawn from a “wrong” distribution)
may endanger the performance of the conditional log-optimal portfolio.
We conjecture that portfolio selection functions are less sensitive to devi-
ations from stationarity of the return process. Finding empirical evidence
or counterevidence for this, however, is beyond the scope of this thesis.

– Each market has a specific log-optimal portfolio selection function, which


always remains the same in stationary markets. Log-optimal portfolio se-
lection functions are therefore much easier to interpret than conditional
log-optimal portfolios (which are not much of a single market character-
istic, but a sequence of functions). The shape of a log-optimal portfolio
selection function allows us to find structures in the stock quote chart
that should be interpreted as “buy”, “hold” and “sell” signals for the single
stocks. This makes log-optimal portfolio selection functions a theoreti-
cally well-founded counterpart of heuristic chart analysis (as presented,
e.g., in Möller, 1998).

– As already noted in Chapter 1, estimation of b∗ from market data is highly


problematic. As we shall see, estimation of log-optimal portfolio selection
functions, however, can exploit recent, powerful nonparametric regression
estimation algorithms in stationary ergodic processes.

In order to find a log-optimal portfolio selection function, we observe that
\[
E\big( \log \langle b(X_{-d}, ..., X_{-1}), X_0 \rangle \big) = \int E\big( \log \langle b(\bar x), X_0 \rangle \mid \bar X_0 = \bar x \big)\, P_{\bar X_0}(d\bar x).
\]
Hence, it suffices to consider pointwise maximization of
\[
R(s, \bar x) := E\big( \log \langle s, X_0 \rangle \mid \bar X_0 = \bar x \big)
\]
for fixed $\bar x$. Here and in the sequel, the quantities $s$ and $\bar x$ are to be implicitly understood as $s \in S$ and $\bar x \in \mathbb{R}^{dm}_+$.

Let $KT(\bar x) \subseteq S$ denote the set of Kuhn-Tucker points (cf. Foulds, 1981), i.e. the set of solutions of the convex maximization problem
\[
R(s, \bar x) \longrightarrow \max_{s \in S}! \tag{6.1.2}
\]
Because of the continuity of $R(\cdot, \bar x)$ and the compactness of $S$, $KT(\bar x) \ne \emptyset$, and the existence of solutions to (6.1.2) as well as to (6.1.1) is guaranteed.
\[
R^*(\bar x) := \max_{s\in S} R(s, \bar x)
\]
is the maximum of the target function and
\[
R_{\max} := \int R^*(\bar x)\, P_{\bar X_0}(d\bar x) = \max_{b:[a,b]^{dm}\to S} E \log \langle b(\bar X_0), X_0 \rangle
\]
denotes the maximal expected logarithmic return.


To solve this maximization problem with historical return data (the return
distribution being unknown), Walk and Yakowitz (1999) and Walk (2000) have
suggested recursive estimation of log-optimal portfolio selection functions. For
this, we will use a nonparametric, strongly consistent regression estimation
scheme (i.e., with probability one, the estimates converge to the true regression
function in the pointwise sense). For a detailed overview of nonparametric
regression estimation (for non-i.i.d., i.e. dependent data), we refer the reader
to Bosq (1996), Härdle (1990) and Härdle et al. (1998).
Until recently, for non-i.i.d. data, only nonparametric regression estimators
were available for which strong consistency was linked up with appropriate
mixing conditions (Györfi, Härdle et al., 1989, and Bosq, 1996). These ensure a
suitable decay of dependency in the data. Others made regularity assumptions
on conditional densities of the process variables (Laib, 1999, Laib and Ould-
Said, 1996). On the other hand, the examples in Györfi et al. (1998) show
that we actually have to impose some regularity conditions on the process in
order to be able to obtain strong consistency. Correspondingly, the consistency
proofs for the estimated log-optimal portfolio selection functions in Walk and Yakowitz (1999) and Walk (2000) also rely on mixing conditions.
However, mixing conditions can hardly be verified from observational data using some statistical testing procedure. Now, with the work of Yakowitz, Györfi et al. (1999) an algorithm has been proposed that achieves strong consistency under conditions other than mixing. The mixing requirement, ensuring a suitable decay of dependency in the data, was replaced by a condition on the finite dimensional (more precisely the d-dimensional) distribution of the process, namely a Lipschitz condition on the regression function. In particular, processes featuring long-range dependence (which must be expected in financial data, see e.g. Ding et al., 1993; Peters, 1997) are not precluded from consideration as they would have been under mixing conditions.
In this chapter, the estimator of Yakowitz, Györfi et al. (1999) is combined
with a stochastic projection algorithm (Kushner and Clark, 1978) to obtain
a strongly consistent sequential estimator for a log-optimal portfolio selection
function of the d-past of the return process. The mixing conditions in Walk
and Yakowitz (1999) and Walk (2000) are replaced by a Lipschitz condition on
the gradient of the target function.

6.2 Estimation of log-optimal portfolio selection functions

Throughout this chapter we assume that the following regularity conditions V1 and V2 hold:

V1: $\{X_i\}_{i=-T}^{\infty}$ ($T \in \mathbb{N}$) is an $[a,b]^m$-valued stationary ergodic stochastic process on a probability space $(\Omega, \mathcal{A}, P)$ ($0 < a \le b < \infty$ need not be known explicitly). Some $d \in \mathbb{N}$ ($d \le T$) is fixed.

V2: The gradient of the target function $R(s, \bar x)$,
\[
m(s, \bar x) := E\left( \frac{X_0}{\langle s, X_0 \rangle} \,\Big|\, \bar X_0 = \bar x \right)
\]
(which we already know from the Kuhn-Tucker conditions in Theorem 1.3.3), is a Lipschitz continuous function of $\bar x$ with Lipschitz constant $L/\sqrt{md}$, i.e.
\[
|m(s, \bar x) - m(s, \bar y)| \le \frac{L}{\sqrt{md}}\, |\bar x - \bar y| \quad \text{for all } \bar x, \bar y \in \mathbb{R}^{dm}_+,\ s \in S.
\]

Condition V2 is fulfilled if the conditional distribution $P_{X_0|\bar X_0}$ has a density $f_{X_0|\bar X_0}(x_0, \bar x)$ with respect to some measure $\mu$, such that
\[
|f_{X_0|\bar X_0}(x_0, \bar x) - f_{X_0|\bar X_0}(x_0, \bar y)| \le \frac{L\, a}{\sqrt{md} \cdot b\, \mu([a,b]^m)}\, |\bar x - \bar y|
\]
(note the similarity to the Lipschitz conditions of Theorem 5.1.2). In particular, this holds if $f_{X_0|\bar X_0}$ is continuously differentiable. Hence, V2 is a condition on the variability of the return vectors and as such a condition on the risk inherent in the market.
At time n, the investor’s task is to produce an estimate ZnL (x̄) of the value of a
log-optimal portfolio selection function given the last d observed return vectors
are x̄ ∈ IRdm . This can be done by the following projection algorithm:

1. Before we start the estimation process, we fix a partition Pk of IRdm + into


cubes of volume (2 ) for each positive integer k. For x̄ ∈ IRdm
−k−2 dm
+ the
element of Pk containing x̄ is denoted by Ak (x̄). We also fix some sequence
αn > 0 (n ∈ IN).

2. Then, at time $n$, we calculate a partitioning regression estimate of the gradient $m(s, \bar x)$ of the target function (Yakowitz, Györfi et al., 1999): More precisely, for $N_n \in \mathbb{N}$ with $\lim_{n\to\infty} N_n = \infty$ and $\bar X_j := (X_{j-d}, ..., X_{j-1})$ we construct the gradient estimate by
\[
\hat m_{n,L}(s, \bar x) = \hat M_{1,n}(s, \bar x) + \sum_{k=2}^{N_n} \hat\Delta_{k,n,L}(s, \bar x),
\]
using
\[
\hat M_{k,n}(s, \bar x) := \left( \sum_{j=-M}^{n} \frac{X_j}{\langle s, X_j \rangle} \cdot 1_{A_k(\bar x)}(\bar X_j) \right) \Big/ \left( \sum_{j=-M}^{n} 1_{A_k(\bar x)}(\bar X_j) \right), \tag{6.2.1}
\]
\[
\hat\Delta_{k,n,L}(s, \bar x) := T_{L2^{-k}}\big( \hat M_{k,n}(s, \bar x) - \hat M_{k-1,n}(s, \bar x) \big).
\]
Here, $M := T - d \in \mathbb{N}_0$ is the length of the training period of the algorithm before the first estimate is produced. $T_{L2^{-k}}$ denotes the truncation operator, defined for $z = (z_1, ..., z_m) \in \mathbb{R}^m$ by $T_{L2^{-k}} z = (w_1, ..., w_m)$ with $w_i := \mathrm{sgn}\, z_i \cdot \min\{|z_i|, L2^{-k}\}$.

3. Having obtained an estimate of the gradient of the target function, we apply the classical projection algorithm to estimate the maximum in (6.1.2) (Kushner and Clark, 1978, Sec. 5.3, also used in different form by Walk and Yakowitz, 1999): From the previous estimate $Z_{n-1}^L(\bar x)$ we calculate an updated estimate by
\[
Z_n^L(\bar x) := \Pi\big( Z_{n-1}^L(\bar x) + \alpha_n\, \hat m_{n,L}(Z_{n-1}^L(\bar x), \bar x) \big). \tag{6.2.2}
\]
Here, for $x \in \mathbb{R}^m$, $\Pi(x)$ denotes the best approximating (in the Euclidean norm) element of $x$ in the simplex $S$, i.e. the projection of $x$ on $S$. To start the iteration at time $n = 0$, we use an arbitrary starting estimate $Z_{-1}^L(\bar x) \in S$.
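The steps above can be rendered in code. This is an illustrative sketch, not the thesis' implementation: the gradient estimate is abstracted as a callable, `truncate` is the operator $T_c$, and `project_simplex` computes the Euclidean projection $\Pi$ onto $S$ via the standard sorting-based algorithm:

```python
import math

def truncate(z, c):
    # T_c z: componentwise w_i = sgn(z_i) * min(|z_i|, c)
    return [math.copysign(min(abs(zi), c), zi) for zi in z]

def project_simplex(x):
    # Euclidean projection of x onto S = {s : sum_j s_j = 1, s_j >= 0}
    u = sorted(x, reverse=True)
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:   # the condition holds exactly for a prefix of indices j
            theta = t
    return [max(xi - theta, 0.0) for xi in x]

def update(z_prev, grad_est, alpha):
    # one step of (6.2.2): Z_n = Pi(Z_{n-1} + alpha_n * m_hat(Z_{n-1}, x_bar))
    g = grad_est(z_prev)
    return project_simplex([zi + alpha * gi for zi, gi in zip(z_prev, g)])
```

For example, `project_simplex([2.0, 0.0])` gives `[1.0, 0.0]`, and every `update` keeps the iterate inside the simplex, as required of the sequence $Z_n^L(\bar x)$.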

Note that we assume L to be known. At a later stage the algorithm is modified


so as to comprise an adaptive choice of the market parameter L, which then
allows estimation without knowledge of the precise value of L.
The following lemma featuring the basic convergence properties of the estimate
will be crucial to the main results of this chapter. It shows that our estimates
approach the set KT (x̄) of Kuhn-Tucker points of (6.1.2), i.e., the collection of
values of log-optimal portfolio-selection functions at x̄.

Lemma 6.2.1. Let $\rho\big( Z_n^L(\bar x), KT(\bar x) \big) := \inf_{y\in KT(\bar x)} \|Z_n^L(\bar x) - y\|$ denote the Euclidean distance of $Z_n^L(\bar x)$ from the set $KT(\bar x)$. Then, under the assumptions

1. V1 and V2,

2. $\alpha_n \to 0$ ($n\to\infty$) and $\sum_{n=0}^{\infty} \alpha_n = \infty$,

we have that for $P_{\bar X_0}$-a.a. $\bar x$:
\[
\lim_{n\to\infty} \rho\big( Z_n^L(\bar x), KT(\bar x) \big) = 0 \quad P\text{-a.s.}
\]

To formulate this result more neatly, let $Z_n^{L*}(\bar x)$ denote the best approximating (in the Euclidean metric) element of $Z_n^L(\bar x)$ in $KT(\bar x)$ (observe that $KT(\bar x)$ is compact). Note that $Z_n^{L*}(\bar x)$ is a log-optimal portfolio selection function as $\bar x$ varies. Then Lemma 6.2.1 can be rephrased more explicitly as

Theorem 6.2.2. Under the assumptions of Lemma 6.2.1 one has

1. Pointwise strong consistency for $P_{\bar X_0}$-a.a. $\bar x$:
\[
|Z_n^L(\bar x) - Z_n^{L*}(\bar x)| \longrightarrow 0 \quad (n\to\infty) \quad P\text{-a.s.}
\]
and
\[
R(Z_n^L(\bar x), \bar x) \longrightarrow R^*(\bar x) \quad (n\to\infty) \quad P\text{-a.s.} \tag{6.2.3}
\]
2. Strong $L_1$-consistency:
\[
\int |Z_n^L(\bar x) - Z_n^{L*}(\bar x)|\, P_{\bar X_0}(d\bar x) \longrightarrow 0 \quad (n\to\infty) \quad P\text{-a.s.}
\]
and
\[
\int R(Z_n^L(\bar x), \bar x)\, P_{\bar X_0}(d\bar x) \longrightarrow R_{\max} \quad (n\to\infty) \quad P\text{-a.s.}, \tag{6.2.4}
\]
hence also in $L_r(P)$ for any $r \in \mathbb{N}$.

The limit relations (6.2.3) and (6.2.4) are the central results for the proposed
estimation procedure. They demonstrate that in the long run ZnL (x̄) almost
surely achieves the optimal expected growth of wealth among all strategies
based on portfolio selection functions of the d-past.
Remark concerning Lemma 6.2.1 and Theorem 6.2.2. The limit relations in Lemma 6.2.1 and Theorem 6.2.2, part 1, are true even in the stronger sense that a fixed exceptional null set of $\omega \in \Omega$ and a fixed exceptional null set of $\bar x \in [a,b]^{dm}$ exist, outside which for all $\omega$ and $\bar x$
\[
\rho\big( Z_n^L(\bar x), KT(\bar x) \big) \longrightarrow 0, \qquad
|Z_n^L(\bar x) - Z_n^{L*}(\bar x)| \longrightarrow 0 \qquad \text{and} \qquad
R(Z_n^L(\bar x), \bar x) \longrightarrow R^*(\bar x)
\]
as $n\to\infty$ (cf. proof of Theorem 6.2.2).


Until now, we have merely considered the problem how to estimate a log-optimal
portfolio selection function. Of course this is not the primary task of the in-
vestor. He would like to actually use a log-optimal portfolio selection function
or – in case such a function is unknown – the estimates ZnL to rebalance his
investment porfolio. At time i ∈ IN0 , ZiL (x̄) is the most recent estimate of a
log-optimal portfolio selection function b(x̄) in (6.1.1). The investor therefore
takes ZiL(X̄i+1 ) = ZiL(Xi+1−d , ..., Xi) as the investment scheme to be used at
time i ∈ IN0 . The accumulated investment returns up to time n ∈ IN are
n−1
Y
Rn := < ZiL(X̄i+1 ), Xi+1 > .
i=0

The following corollary shows that, asymptotically, the investment strategy $Z_i^L(\bar X_{i+1})$ is superior to any other strategy using a portfolio selection function of the last $d$ market periods (pathwise competitive optimality).

Corollary 6.2.3. Suppose the support of the distribution $P_{X_0}$ is not confined to a hyperplane in $\mathbb{R}^m$ containing the diagonal $\{(d, ..., d)^T \in \mathbb{R}^m \mid d \in \mathbb{R}\}$. For any measurable portfolio selection function $f : \mathbb{R}^{dm}_+ \longrightarrow S$ with accumulated returns
\[
V_n := \prod_{i=0}^{n-1} \langle f(\bar X_{i+1}), X_{i+1} \rangle
\]
we have
\[
\limsup_{n\to\infty} \frac{1}{n} \log \frac{V_n}{R_n} \le 0 \quad P\text{-a.s.}
\]

Lemma 6.2.1 and Theorem 6.2.2 are statements corresponding to Theorem 1 in


Walk and Yakowitz (1999), Corollary 6.2.3 is a generalisation of Corollary 1 in
Walk (2000). However, the statements are valid under fundamentally different
(in fact, considerably weaker) assumptions.
As already mentioned, in practical applications the exact value of $L$ is not disclosed to the investor. On the other hand, one can assume that, as the share prices take on rational, i.e. countably many, values only, so does the return process $\{X_i\}_{i=-T}^{\infty}$ as a process of price ratios. In this situation an adaptive choice of $L$ can be carried out in the following way: Having fixed a sequence $\gamma_n \in \mathbb{N}$, $\gamma_n \longrightarrow \infty$, for the $n$th investment step a random variable
\[
L_n := \arg\max_{K\in\{1,...,\gamma_n\}} \frac{1}{n+M+1} \sum_{j=-M}^{n} \log \langle Z_n^K(\bar X_j), X_j \rangle \tag{6.2.5}
\]
is defined, and the estimate of a log-optimal portfolio selection function $b(\bar x)$ is
\[
Z_n(\bar x) := Z_n^{L_n}(\bar x).
\]
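The selection rule (6.2.5) is a one-line maximization once the candidate estimators are available. A minimal sketch (the candidate family and the toy data are illustrative assumptions; `estimates[K]` plays the role of $Z_n^K$):

```python
import math

def select_L(estimates, past_windows, returns):
    # L_n := argmax over K of the empirical mean log-return, cf. (6.2.5)
    def mean_log_return(Z):
        total = sum(math.log(sum(p * x for p, x in zip(Z(w), r)))
                    for w, r in zip(past_windows, returns))
        return total / len(returns)
    return max(estimates, key=lambda K: mean_log_return(estimates[K]))

# toy data: asset 2 outperforms, so the candidate holding asset 2 is selected
windows = [((1.0, 1.0),), ((1.0, 1.0),)]
rets = [(0.9, 1.2), (0.95, 1.1)]
estimates = {1: lambda w: (1.0, 0.0), 2: lambda w: (0.0, 1.0)}
```

Note that, as in (6.2.5), the rule compares candidates only through the data-driven criterion, so no knowledge of the true $L$ is needed.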

For this procedure, we have

Theorem 6.2.4. Assume the distribution $P_{X_0}$ is supported on a denumerable set and the support of the distribution is not confined to a hyperplane in $\mathbb{R}^m$ containing the diagonal $\{(d, ..., d)^T \in \mathbb{R}^m \mid d \in \mathbb{R}\}$. Then Lemma 6.2.1, Theorem 6.2.2 and Corollary 6.2.3 remain valid if $Z_n^L$ is replaced by $Z_n$.

For the regression estimator of Yakowitz, Györfi et al. (1999) it is not yet known
whether there exists an adaptive rule for the choice of the Lipschitz constant L
generating a procedure that is strongly consistent for arbitrary stationary er-
godic processes with Lipschitz continuous regression function. Theorem 6.2.4 is
remarkable because it asserts that for the application of the regression estimate
to the portfolio optimization problem such adaptation can be achieved.
We finish this section with two remarks about extensions of the stated results.
Remark concerning (6.2.1). Lemma 6.2.1, the first part of Theorem 6.2.2 and Theorem 6.2.4 still hold if we use kernel estimates
\[
\hat M_{k,n}(s, \bar x) := \left( \sum_{j=-M}^{n} \frac{X_j}{\langle s, X_j \rangle}\, K\!\left( \frac{\bar X_j - \bar x}{h_k} \right) \right) \Big/ \left( \sum_{j=-M}^{n} K\!\left( \frac{\bar X_j - \bar x}{h_k} \right) \right)
\]
instead of the partitioning estimates in (6.2.1). For this, we choose a continuous kernel function $K : \mathbb{R}^{dm} \longrightarrow \mathbb{R}_+$ having compact support, $K(0) > 0$, and bandwidths $h_k := 2^{-k-2}$ (Yakowitz, Györfi et al., 1999, Sec. 3). However, as we shall see, the argument in the proof of the second part of Theorem 6.2.2 breaks down unless the distribution of $\bar X_0$ is supported on a denumerable set.
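The kernel variant above is a Nadaraya-Watson-style ratio, which can be sketched as follows. The box kernel used here is a hypothetical stand-in (it is compactly supported with $K(0) > 0$, though not continuous, so it serves only to keep the arithmetic visible):

```python
def kernel_estimate(data, x_bar, h, K):
    # data: list of (past_window, response_vector) pairs
    # returns sum_j Y_j K((Xbar_j - x)/h) / sum_j K((Xbar_j - x)/h), or None
    m = len(data[0][1])
    num, den = [0.0] * m, 0.0
    for w, y in data:
        wgt = K([(wi - xi) / h for wi, xi in zip(w, x_bar)])
        den += wgt
        num = [ni + wgt * yi for ni, yi in zip(num, y)]
    return [ni / den for ni in num] if den > 0 else None

# box kernel on [-1, 1]^{dm}
box = lambda u: 1.0 if max(abs(ui) for ui in u) <= 1.0 else 0.0
```

Observations whose d-past falls outside the bandwidth window contribute weight zero, mirroring the role of the cell $A_k(\bar x)$ in the partitioning estimate.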
Remark concerning (6.2.5). (6.2.5) can be seen as an application of the principle of empirical risk minimization. By (6.2.2) random classes of functions $Z_n^L : [a,b]^{dm} \to S$ are defined, parametrized by admissible step widths $\alpha_0, ..., \alpha_n$ and potential Lipschitz constants $L$. An estimator is picked out minimizing the empirical risk, here the negative empirical mean return. This can also be used to choose suitable step widths. In fact, the same arguments as used in the proof of Theorem 6.2.4 show: If $Z_n = Z_n^{(k,\alpha_0,...,\alpha_n)}$ is an estimator constructed with step widths $\alpha_0, ..., \alpha_n$ and Lipschitz parameter $k$ such that
\[
\frac{1}{n+M+1} \sum_{j=-M}^{n} \log \langle Z_n^{(k,\alpha_0,...,\alpha_n)}(\bar X_j), X_j \rangle
\ge \frac{1}{n+M+1} \sum_{j=-M}^{n} \log \langle Z_n^{(L,1,1,1/2,...,1/n)}(\bar X_j), X_j \rangle \tag{6.2.6}
\]
(for sufficiently large $n$), then Theorem 6.2.4 remains valid for this $Z_n$.

6.3 Checking the properties of the estimation algorithm
We now move on to the proof of the statements of the preceding section.

6.3.1 Proof of the convergence Lemma 6.2.1

The algorithm (6.2.2) is an application of a classical projection algorithm for problem (6.1.2), using an estimate of the gradient of the target function,
\[
m(s, \bar x) = \frac{\partial}{\partial s}\, E\big( \log \langle s, X_0 \rangle \mid \bar X_0 = \bar x \big) = E\left( \frac{X_0}{\langle s, X_0 \rangle} \,\Big|\, \bar X_0 = \bar x \right),
\]
for $s \in S$. The gradient estimate is obtained via a partitioning method in (6.2.1). For this reason, before we can turn to the proof of the crucial convergence Lemma 6.2.1, we have to formulate some preliminary results on consistency properties of the gradient estimate.
The data the statistician can access at time $n \in \mathbb{N}_0$,
\[
\left\{ \bar X_i := (X_{i-d}, ..., X_{i-1}),\quad Y_i^{(s)} := \frac{X_i}{\langle s, X_i \rangle} \right\}_{i=-M}^{n},
\]
are drawn from a stationary and ergodic process. Indeed, referring to Stout (1974), Theorem 3.5.8, as $\{X_i\}_{i=-d-M}^{\infty}$ is stationary and ergodic, so is the stochastic process $\{(X_{i-d}, ..., X_i)\}_{i=-M}^{\infty}$. This follows from the cited theorem, observing that $\langle s, X_i \rangle > 0$ for all $s \in S$, so that $f_s : \mathbb{R}^{m(d+1)}_+ \longrightarrow \mathbb{R}^{m(d+1)}_+ : (x_1, ..., x_{1+d}) \longmapsto (x_1, ..., x_d,\ x_{1+d}/\langle s, x_{1+d} \rangle)$ is a well-defined measurable mapping.
Moreover, $Y_i^{(s)}$ is bounded and Lipschitz continuous in $s$ because of
\[
\left| \frac{X_i}{\langle s, X_i \rangle} \right|
= \frac{\sqrt{\sum_{j=1}^{m} X_{i,j}^2}}{\sum_{j=1}^{m} s_j X_{i,j}}
\le \frac{\sqrt{m b^2}}{a \sum_{j=1}^{m} s_j}
= \frac{\sqrt{m}\, b}{a} \tag{6.3.1}
\]
and (applying the Cauchy-Schwarz inequality)
\[
\begin{aligned}
\left| \frac{X_i}{\langle s, X_i \rangle} - \frac{X_i}{\langle t, X_i \rangle} \right|
&\le \left| \frac{1}{\langle s, X_i \rangle} - \frac{1}{\langle t, X_i \rangle} \right| \cdot |X_i|
= \frac{|\langle t - s, X_i \rangle|}{|\langle s, X_i \rangle \langle t, X_i \rangle|} \cdot |X_i| \\
&\le \frac{|X_i|^2}{|\langle s, X_i \rangle \langle t, X_i \rangle|} \cdot |s - t|
\le \frac{m b^2}{\sum_{j=1}^{m} s_j X_{i,j} \cdot \sum_{j=1}^{m} t_j X_{i,j}} \cdot |s - t| \\
&\le \frac{m b^2}{\sum_{j=1}^{m} s_j a \cdot \sum_{j=1}^{m} t_j a} \cdot |s - t|
= \frac{m b^2}{a^2} \cdot |s - t| \qquad (s, t \in S).
\end{aligned}
\tag{6.3.2}
\]
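Bounds (6.3.1) and (6.3.2) are easy to spot-check by simulation. A small Monte Carlo sketch (the values of $a$, $b$, $m$ and the sampling scheme are arbitrary illustrative choices):

```python
import math
import random

def simplex_point(rng, m):
    # a point of S via sorted uniform cuts (stick breaking)
    cuts = sorted(rng.random() for _ in range(m - 1))
    return [q - p for p, q in zip([0.0] + cuts, cuts + [1.0])]

def check_bounds(a=0.5, b=2.0, m=4, trials=500, seed=1):
    rng = random.Random(seed)
    norm = lambda v: math.sqrt(sum(vi * vi for vi in v))
    for _ in range(trials):
        x = [rng.uniform(a, b) for _ in range(m)]
        s, t = simplex_point(rng, m), simplex_point(rng, m)
        ys = [xi / sum(sj * xj for sj, xj in zip(s, x)) for xi in x]
        yt = [xi / sum(tj * xj for tj, xj in zip(t, x)) for xi in x]
        if norm(ys) > math.sqrt(m) * b / a + 1e-9:           # bound (6.3.1)
            return False
        diff = [p - q for p, q in zip(ys, yt)]
        st = [p - q for p, q in zip(s, t)]
        if norm(diff) > (m * b * b / (a * a)) * norm(st) + 1e-9:  # bound (6.3.2)
            return False
    return True
```

Every sampled pair of simplex points satisfies both inequalities, as the derivation guarantees.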

Yakowitz, Györfi et al. (1999) propose a strongly consistent estimator for a Lipschitz continuous regression function based on stationary ergodic data. The gradient $m(s, \bar x)$ is the regression function of $Y_i^{(s)}$ on $\bar X_i$, so that we may use this estimator to obtain a gradient estimate. To this end, let $\mathcal{P}_k$ be a partition of $\mathbb{R}^{dm}_+$ into cubes of volume $(2^{-k-2})^{dm}$. $A_k(\bar x)$ denotes the element of $\mathcal{P}_k$ in which $\bar x \in \mathbb{R}^{dm}_+$ comes to lie. Then $m(s, \bar x)$ is estimated by
\[
\hat m_{n,L}(s, \bar x) := \hat M_{1,n}(s, \bar x) + \sum_{k=2}^{N_n} \hat\Delta_{k,n,L}(s, \bar x)
\]
with
\[
\hat M_{k,n}(s, \bar x) := \left( \sum_{j=-M}^{n} \frac{X_j}{\langle s, X_j \rangle}\, 1_{A_k(\bar x)}(\bar X_j) \right) \Big/ \left( \sum_{j=-M}^{n} 1_{A_k(\bar x)}(\bar X_j) \right), \tag{6.3.3}
\]
\[
\hat\Delta_{k,n,L}(s, \bar x) := T_{L2^{-k}}\big( \hat M_{k,n}(s, \bar x) - \hat M_{k-1,n}(s, \bar x) \big) \tag{6.3.4}
\]
and some fixed sequence $N_n \in \mathbb{N}$ satisfying $\lim_{n\to\infty} N_n = \infty$.
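A minimal sketch of the partitioning components (6.3.3)-(6.3.4). The dyadic cell index plays the role of $A_k(\bar x)$, and truncation at $L2^{-k}$ gives the increment estimate; all data and parameter values here are illustrative:

```python
import math

def cell(x_bar, k):
    # index of the cube of side length 2^{-k-2} containing x_bar, i.e. A_k(x_bar)
    h = 2.0 ** (-k - 2)
    return tuple(math.floor(xi / h) for xi in x_bar)

def M_hat(data, s, x_bar, k):
    # partitioning estimate (6.3.3): cell average of X_j / <s, X_j>
    num, cnt, c = None, 0, cell(x_bar, k)
    for past, x in data:                    # data: list of (X_bar_j, X_j)
        if cell(past, k) == c:
            y = [xi / sum(sj * xj for sj, xj in zip(s, x)) for xi in x]
            num = y if num is None else [u + v for u, v in zip(num, y)]
            cnt += 1
    return [v / cnt for v in num] if cnt else None

def delta_hat(data, s, x_bar, k, L):
    # truncated increment (6.3.4): T_{L 2^{-k}}(M_hat_k - M_hat_{k-1})
    mk, mk1 = M_hat(data, s, x_bar, k), M_hat(data, s, x_bar, k - 1)
    c = L * 2.0 ** (-k)
    return [math.copysign(min(abs(u - v), c), u - v) for u, v in zip(mk, mk1)]
```

As $k$ grows, the cells shrink, so each refinement averages over fewer but more local observations, which is exactly what the telescoping below exploits.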


The estimator is motivated by the telescopic expansion
\[
m(s, \bar x) = \lim_{k\to\infty} M_k(s, \bar x) = M_1(s, \bar x) + \sum_{k=2}^{\infty} \Delta_k(s, \bar x) \tag{6.3.5}
\]
of the limit relation
\[
M_k(s, \bar x) := E\left( \frac{X_0}{\langle s, X_0 \rangle} \,\Big|\, \bar X_0 \in A_k(\bar x) \right) \longrightarrow m(s, \bar x)
\]
for $P_{\bar X_0}$-a.a. $\bar x$, where $\Delta_k(s, \bar x) := M_k(s, \bar x) - M_{k-1}(s, \bar x)$ (Yakowitz, Györfi et al., 1999, eq. (4)). In addition, a truncated version of the expansion is defined by
\[
m_L(s, \bar x) := M_1(s, \bar x) + \sum_{k=2}^{\infty} \Delta_{k,L}(s, \bar x)
\]
with $\Delta_{k,L}(s, \bar x) := T_{L2^{-k}} \Delta_k(s, \bar x)$.



The convergence of the components (6.3.3) and (6.3.4) in the definition of the estimator to the corresponding components in expansion (6.3.5) is given by the following

Lemma 6.3.1. Under the assumption V1 one has for $P_{\bar X_0}$-a.a. $\bar x$ and any fixed $s \in S$:
\[
1. \quad \big( \hat M_{1,n}(s, \bar x) - M_1(s, \bar x) \big) \longrightarrow 0 \quad (n\to\infty) \quad P\text{-a.s.},
\]
\[
2. \quad \big( \hat\Delta_{k,n,L}(s, \bar x) - \Delta_{k,L}(s, \bar x) \big) \longrightarrow 0 \quad (n\to\infty) \quad P\text{-a.s.}
\]

Proof. Straightforward application of the ergodic theorem (Stout, 1974, Theorem 3.5.7; Yakowitz, Györfi et al., 1999, eq. (16)) yields

\[
\hat M_{k,n}(s,\bar x) \longrightarrow M_k(s,\bar x) \quad (n\to\infty)\ \ P\text{-a.s.},
\]

in particular the first part of the lemma. Since the truncation operator is itself Lipschitz continuous with Lipschitz constant 1, the second part of the lemma is obtained by

\[
\begin{aligned}
\big|\hat\Delta_{k,n,L}(s,\bar x) - \Delta_{k,L}(s,\bar x)\big|
&\le \big|T_{L2^{-k}}\big(\hat M_{k,n}(s,\bar x) - \hat M_{k-1,n}(s,\bar x)\big) - T_{L2^{-k}}\big(M_k(s,\bar x) - M_{k-1}(s,\bar x)\big)\big| \\
&\le \big|\hat M_{k,n}(s,\bar x) - M_k(s,\bar x)\big| + \big|M_{k-1}(s,\bar x) - \hat M_{k-1,n}(s,\bar x)\big| \\
&\longrightarrow 0 \quad P\text{-a.s.}\ (n\to\infty). \qquad \Box
\end{aligned}
\]
In Yakowitz, Györfi et al. (1999) a lemma analogous to the above one is used to
obtain pointwise strong consistency of m̂n,L(s, x̄). However, the proof of Lemma
6.2.1 requires convergence to hold uniformly in S, which will be derived from
the following lemma.

Lemma 6.3.2. Let S ⊆ IRd be some compact set, K > 0 and (fn )n∈IN a class
of functions fn : S −→ IRd with |fn (s) − fn (t)| ≤ K|s − t| for all s, t ∈ S.
Then limn→∞ fn (s) = 0 for all s ∈ S implies that limn→∞ sups∈S |fn (s)| = 0.

Proof. Let δ > 0 be arbitrary but fixed. Choose a finite δ-net N in S. Then for all s ∈ S there exists some t_s ∈ N with |s − t_s| ≤ δ. This yields

\[
|f_n(s)| \le |f_n(s) - f_n(t_s)| + |f_n(t_s)| \le K|s - t_s| + |f_n(t_s)| \le K\cdot\delta + |f_n(t_s)|
\]

and

\[
\sup_{s\in S} |f_n(s)| \le K\cdot\delta + \sup_{t\in N} |f_n(t)|.
\]

Since N is finite, one has

\[
\limsup_{n\to\infty}\,\sup_{s\in S} |f_n(s)| \le K\cdot\delta + \limsup_{n\to\infty}\,\sup_{t\in N} |f_n(t)| = K\cdot\delta + 0 = K\cdot\delta.
\]

The assertion follows from δ being arbitrary. □
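Lemma 6.3.2 is the usual equicontinuity route from pointwise to uniform convergence. A small numerical illustration (the family f_n(s) = sin(ns)/n on S = [0, 1] is our own toy example, Lipschitz with the common constant K = 1 since |f_n'(s)| = |cos(ns)| ≤ 1):

```python
import numpy as np

# f_n(s) = sin(n s)/n converges to 0 pointwise, and every f_n shares the
# Lipschitz constant K = 1, so Lemma 6.3.2 predicts uniform convergence;
# indeed sup_s |f_n(s)| = 1/n.
def sup_abs(n):
    grid = np.linspace(0.0, 1.0, 10_001)
    return float(np.abs(np.sin(n * grid) / n).max())

sups = [sup_abs(n) for n in (10, 100, 1000)]
# the suprema shrink like 1/n
```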


There are two consequences of Lemma 6.3.2 which we are going to need.

Consequence 1: For P_{X̄_0}-a.a. x̄ we have

\[
\lim_{n\to\infty}\sup_{s\in S}\big|\hat M_{1,n}(s,\bar x) - M_1(s,\bar x)\big| = 0 \quad P\text{-a.s.} \tag{6.3.6}
\]

This is a consequence of Lemma 6.3.2 with f_n(s) := M̂_{1,n}(s, x̄) − M_1(s, x̄). According to Lemma 6.3.1, lim_{n→∞} f_n(s) = 0 P-a.s. and |f_n(s) − f_n(t)| ≤ |M̂_{1,n}(s, x̄) − M̂_{1,n}(t, x̄)| + |M_1(s, x̄) − M_1(t, x̄)| hold for any s ∈ S and for P_{X̄_0}-a.a. x̄. To bound the terms in the latter expression, one uses

\[
\begin{aligned}
\big|\hat M_{k,n}(s,\bar x) - \hat M_{k,n}(t,\bar x)\big|
&\le \left(\sum_{j=-M}^{n} \left|\frac{X_j}{\langle s, X_j\rangle} - \frac{X_j}{\langle t, X_j\rangle}\right| 1_{A_k(\bar x)}(\bar X_j)\right)\Bigg/\left(\sum_{j=-M}^{n} 1_{A_k(\bar x)}(\bar X_j)\right) \\
&\le \max_{-M\le j\le n}\left|\frac{X_j}{\langle s, X_j\rangle} - \frac{X_j}{\langle t, X_j\rangle}\right| \le \frac{mb^2}{a^2}\,|s-t|
\end{aligned} \tag{6.3.7}
\]

(the latter inequality from (6.3.2)) and

\[
\begin{aligned}
\big|M_k(s,\bar x) - M_k(t,\bar x)\big|
&\le E\!\left[\left|\frac{X_0}{\langle s, X_0\rangle} - \frac{X_0}{\langle t, X_0\rangle}\right|\,\middle|\,\bar X_0 \in A_k(\bar x)\right] \\
&\le E\!\left[\frac{mb^2}{a^2}\,|s-t|\,\middle|\,\bar X_0 \in A_k(\bar x)\right] = \frac{mb^2}{a^2}\,|s-t|.
\end{aligned} \tag{6.3.8}
\]

Altogether this yields

\[
|f_n(s) - f_n(t)| \le \frac{mb^2}{a^2}\,|s-t| + \frac{mb^2}{a^2}\,|s-t| = 2\,\frac{mb^2}{a^2}\,|s-t|
\]

for all s, t ∈ S, the situation as required in Lemma 6.3.2, and (6.3.6) is valid.

Consequence 2: Let R ∈ {2, 3, ...} be fixed. Then for P_{X̄_0}-a.a. x̄

\[
\lim_{n\to\infty}\sup_{s\in S}\sum_{k=2}^{R}\big|\hat\Delta_{k,n,L}(s,\bar x) - \Delta_{k,L}(s,\bar x)\big| = 0 \quad P\text{-a.s.} \tag{6.3.9}
\]

holds.

Here, put f_n(s) := Σ_{k=2}^R |∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)|. For any fixed s ∈ S and for P_{X̄_0}-a.a. x̄ one has lim_{n→∞} (∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)) = 0 P-a.s. according to Lemma 6.3.1 and hence lim_{n→∞} f_n(s) = 0 P-a.s. for all s ∈ S. Moreover,

\[
|f_n(s) - f_n(t)| \le \sum_{k=2}^{R}\big|\hat\Delta_{k,n,L}(s,\bar x) - \hat\Delta_{k,n,L}(t,\bar x)\big| + \sum_{k=2}^{R}\big|\Delta_{k,L}(s,\bar x) - \Delta_{k,L}(t,\bar x)\big|.
\]

The first term can be bounded using

\[
\begin{aligned}
\big|\hat\Delta_{k,n,L}(s,\bar x) - \hat\Delta_{k,n,L}(t,\bar x)\big|
&= \big|T_{L2^{-k}}\big(\hat M_{k,n}(s,\bar x) - \hat M_{k-1,n}(s,\bar x)\big) - T_{L2^{-k}}\big(\hat M_{k,n}(t,\bar x) - \hat M_{k-1,n}(t,\bar x)\big)\big| \\
&\le \big|\hat M_{k,n}(s,\bar x) - \hat M_{k,n}(t,\bar x)\big| + \big|\hat M_{k-1,n}(s,\bar x) - \hat M_{k-1,n}(t,\bar x)\big| \\
&\le 2\,\frac{mb^2}{a^2}\,|s-t|
\end{aligned}
\]

(with (6.3.7)), and we obtain

\[
|f_n(s) - f_n(t)| \le 2(R-1)\frac{mb^2}{a^2}\,|s-t| + \sum_{k=2}^{R}\big|M_k(s,\bar x) - M_k(t,\bar x)\big| + \sum_{k=2}^{R}\big|M_{k-1}(s,\bar x) - M_{k-1}(t,\bar x)\big|
\]

for all s, t ∈ S. Appealing to inequality (6.3.8) this becomes

\[
|f_n(s) - f_n(t)| \le 4(R-1)\frac{mb^2}{a^2}\,|s-t|,
\]

and the requirements of Lemma 6.3.2 are met. The assertion (6.3.9) follows.
Finally we establish the desired strong consistency of the gradient estimate m̂_{n,L}(s, x̄) uniformly in S.

Lemma 6.3.3. Under the assumption V1,

\[
\lim_{n\to\infty}\sup_{s\in S}\big|\hat m_{n,L}(s,\bar x) - m_L(s,\bar x)\big| = 0 \quad P\text{-a.s.}
\]

holds for P_{X̄_0}-a.a. x̄.

Proof. In view of (6.3.6) it suffices to show that for P_{X̄_0}-a.a. x̄

\[
\lim_{n\to\infty}\sup_{s\in S}\left|\sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(s,\bar x) - \sum_{k=2}^{\infty}\Delta_{k,L}(s,\bar x)\right| = 0 \quad P\text{-a.s.}
\]

Let R ∈ {2, 3, ...} be arbitrary. For sufficiently large n we have N_n > R and as in Yakowitz, Györfi et al. (1999), proof of Theorem 1, we obtain

\[
\left|\sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(s,\bar x) - \sum_{k=2}^{\infty}\Delta_{k,L}(s,\bar x)\right|
\le \sum_{k=2}^{R}\big|\hat\Delta_{k,n,L}(s,\bar x) - \Delta_{k,L}(s,\bar x)\big| + \sum_{k=R+1}^{N_n}\big|\hat\Delta_{k,n,L}(s,\bar x)\big| + \sum_{k=R+1}^{\infty}\big|\Delta_{k,L}(s,\bar x)\big|.
\]

The last two terms are bounded by

\[
\sum_{k=R+1}^{N_n} L\cdot 2^{-k} + \sum_{k=R+1}^{\infty} L\cdot 2^{-k} \le 2L\sum_{k=R+1}^{\infty} 2^{-k} = 2^{-(R-1)}\cdot L.
\]

Hence

\[
\sup_{s\in S}\left|\sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(s,\bar x) - \sum_{k=2}^{\infty}\Delta_{k,L}(s,\bar x)\right|
\le \sup_{s\in S}\sum_{k=2}^{R}\big|\hat\Delta_{k,n,L}(s,\bar x) - \Delta_{k,L}(s,\bar x)\big| + 2^{-(R-1)}\cdot L.
\]

Using (6.3.9) we derive

\[
\limsup_{n\to\infty}\sup_{s\in S}\left|\sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(s,\bar x) - \sum_{k=2}^{\infty}\Delta_{k,L}(s,\bar x)\right| \le 2^{-(R-1)}\cdot L
\]

P-a.s., and the assertion follows letting R → ∞. □
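The tail estimate in the proof above is elementary geometric-series arithmetic: since Σ_{k=R+1}^∞ 2^{−k} = 2^{−R}, the two tails together contribute at most 2L·2^{−R} = 2^{−(R−1)}·L. A quick check (R and L are arbitrary sample values of ours):

```python
# Tail bound: 2L * sum_{k=R+1}^inf 2^{-k} = 2L * 2^{-R} = 2^{-(R-1)} * L.
R, L = 7, 100.0
tail = 2 * L * sum(2.0 ** (-k) for k in range(R + 1, 60))  # far tail cut at k = 59
# tail agrees with 2^{-(R-1)} * L up to the ~2^{-59} cutoff error
```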


Having obtained these preliminary results, we can now move on to proving Lemma 6.2.1. First, recall that the projection algorithm (6.2.2) is given by the recurrence relation (n ∈ IN_0)

\[
Z_{-1}^L(\bar x) \in S \ \text{arbitrary and}\ \ Z_n^L(\bar x) := \Pi\big(Z_{n-1}^L(\bar x) + \alpha_n\,\hat m_{n,L}(Z_{n-1}^L(\bar x), \bar x)\big), \tag{6.3.10}
\]

where

\[
\hat m_{n,L}(s,\bar x) = \hat M_{1,n}(s,\bar x) + \sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(s,\bar x).
\]

Posing this as

\[
Z_n^L(\bar x) = \Pi\big(Z_{n-1}^L(\bar x) + \alpha_n\,m(Z_{n-1}^L, \bar x) + \alpha_n\,\beta_n(\bar x)\big)
\]

with

\[
\beta_n(\bar x) := \hat m_{n,L}(Z_{n-1}^L(\bar x), \bar x) - m(Z_{n-1}^L(\bar x), \bar x), \tag{6.3.11}
\]

the projection algorithm (6.3.10) is a special case of the projection algorithm

\[
W_n = \Pi\big(W_{n-1} + \alpha_n(m(W_{n-1}) + \xi_n + \beta_n)\big)
\]

in Kushner and Clark (1978, eq. 5.3.1) for ξ_n := 0. Their Theorem 5.3.1 adapted to the case ξ_n := 0 reads

Lemma 6.3.4. Under the assumptions

1. m(·, ·) is continuous,

2. α_n > 0 with α_n → 0 for n → ∞ and Σ_{n=0}^∞ α_n = ∞,

3. there exists a c ≥ 0 with |β_n(x̄)| ≤ c < ∞ for all n ∈ IN_0,

4. β_n(x̄) → 0 P-a.s. for n → ∞

the projection algorithm converges,

\[
\lim_{n\to\infty} \rho\big(Z_n^L(\bar x), K_T(\bar x)\big) = 0 \quad P\text{-a.s.}
\]
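To illustrate what a recursion of the form (6.3.10) does, the sketch below iterates W_n = Π(W_{n−1} + α_n g(W_{n−1})) on the probability simplex, with Π realized as the Euclidean projection computed by the standard sorting method; the gradient field g and the step sizes α_n = 1/(n+1) are stand-ins of our own, not the estimated gradient m̂_{n,L}.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {s : s_j >= 0, sum_j s_j = 1}, via the standard sorting method."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

def projected_iteration(grad, s0, steps=500):
    """Projection algorithm W_n = Pi(W_{n-1} + alpha_n * grad(W_{n-1}))
    with step sizes alpha_n = 1/(n+1), so that sum_n alpha_n = infinity."""
    s = project_simplex(np.asarray(s0, dtype=float))
    for n in range(steps):
        s = project_simplex(s + grad(s) / (n + 1))
    return s
```

With g the gradient of a concave objective on the simplex, the iterates settle into the set of maximizers, mirroring the conclusion of Lemma 6.3.4.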

At last, this lemma allows us to give a

Proof of Lemma 6.2.1. The crucial point is to show that under the assumptions V1 and V2 and for P_{X̄_0}-a.a. x̄, parts 3 and 4 of Lemma 6.3.4 hold when we choose the β_n defined by (6.3.11).

To this end, we first note that as in Yakowitz, Györfi et al. (1999, Corollary 1) assumption V2, i.e., the existence of a constant L with

\[
|m(s,\bar x) - m(s,\bar y)| \le \frac{L}{\sqrt{md}}\,|\bar x - \bar y| \quad \text{for all } \bar x, \bar y \in \mathbb{R}^{dm}_+,\ s \in S,
\]

implies

\[
m(s,\bar x) = m_L(s,\bar x).
\]

Concerning part 3, one has

\[
\begin{aligned}
\big|\hat m_{n,L}(Z_{n-1}^L,\bar x) - m(Z_{n-1}^L,\bar x)\big|
&= \big|\hat m_{n,L}(Z_{n-1}^L,\bar x) - m_L(Z_{n-1}^L,\bar x)\big| \\
&\le \big|\hat M_{1,n}(Z_{n-1}^L,\bar x) - M_1(Z_{n-1}^L,\bar x)\big| + \left|\sum_{k=2}^{N_n}\hat\Delta_{k,n,L}(Z_{n-1}^L,\bar x) - \sum_{k=2}^{\infty}\Delta_{k,L}(Z_{n-1}^L,\bar x)\right| \\
&\le \big|\hat M_{1,n}(Z_{n-1}^L,\bar x)\big| + \big|M_1(Z_{n-1}^L,\bar x)\big| + \sum_{k=2}^{\infty}\big|\hat\Delta_{k,n,L}(Z_{n-1}^L,\bar x)\big| + \sum_{k=2}^{\infty}\big|\Delta_{k,L}(Z_{n-1}^L,\bar x)\big| \\
&\le \big|\hat M_{1,n}(Z_{n-1}^L,\bar x)\big| + \big|M_1(Z_{n-1}^L,\bar x)\big| + 2L\sum_{k=2}^{\infty}2^{-k}.
\end{aligned}
\]

Combining this with the inequalities

\[
\big|\hat M_{1,n}(Z_{n-1}^L,\bar x)\big| = \left|\sum_{j=-M}^{n}\frac{X_j}{\langle Z_{n-1}^L, X_j\rangle}\,1_{A_1(\bar x)}(\bar X_j)\right|\Bigg/\sum_{j=-M}^{n} 1_{A_1(\bar x)}(\bar X_j) \le \max_{-M\le j\le n}\left|\frac{X_j}{\langle Z_{n-1}^L, X_j\rangle}\right|,
\]
\[
\big|M_1(Z_{n-1}^L,\bar x)\big| = \left|E\!\left[\frac{X_0}{\langle Z_{n-1}^L, X_0\rangle}\,\middle|\,\bar X_0 \in A_1(\bar x)\right]\right| \le \sup_{\omega\in\Omega}\left|\frac{X_0}{\langle Z_{n-1}^L, X_0\rangle}\right|
\]

we get

\[
\big|\hat m_{n,L}(Z_{n-1}^L,\bar x) - m(Z_{n-1}^L,\bar x)\big|
\le \max_{-M\le j\le n}\left|\frac{X_j}{\langle Z_{n-1}^L, X_j\rangle}\right| + \sup_{\omega\in\Omega}\left|\frac{X_0}{\langle Z_{n-1}^L, X_0\rangle}\right| + 2L\sum_{k=2}^{\infty}2^{-k}
\le 2\sqrt{m}\,\frac{b}{a} + L =: c,
\]

the latter inequality due to (6.3.1).

Condition 4 is readily verified from

\[
\big|\hat m_{n,L}(Z_{n-1}^L,\bar x) - m(Z_{n-1}^L,\bar x)\big| \le \sup_{s\in S}\big|\hat m_{n,L}(s,\bar x) - m_L(s,\bar x)\big|
\]

and Lemma 6.3.3. This finishes the proof of Lemma 6.2.1. □



6.3.2 Proof of the related Theorems 6.2.2 - 6.2.4


Proof of Theorem 6.2.2. For any fixed x̄, R(·, x̄) is a continuous function on the compact set S, thus uniformly continuous. Part 1 of the theorem directly follows from Lemma 6.2.1.

To prove part 2, observe that (6.3.3) involves only denumerably many functions 1_{A_k(x̄)} (for fixed k and all possible values of x̄). Thus, the exceptional set of ω ∈ Ω in Lemma 6.3.1 can be made independent of the chosen x̄. This continues throughout the proof of Lemma 6.2.1. Hence as n → ∞ we have for P ⊗ P_{X̄_0}-a.a. (ω, x̄)

\[
\rho\big(Z_n^L(\bar x), K_T(\bar x)\big) \longrightarrow 0,
\]
\[
\big|Z_n^L(\bar x) - Z_n^{L*}(\bar x)\big| \longrightarrow 0, \tag{6.3.12}
\]
\[
R(Z_n^L(\bar x), \bar x) \longrightarrow R^*(\bar x).
\]

From part 1, the Lebesgue dominated convergence theorem yields the assertions of the second part of the theorem. The limit relation (6.3.12), now valid P-a.s. for P_{X̄_0}-a.a. x̄, and

\[
\big|Z_n^L(\bar x) - Z_n^{L*}(\bar x)\big| \le \max_{s,t\in S}|s-t| \le m
\]

imply

\[
\int \big|Z_n^L(\bar x) - Z_n^{L*}(\bar x)\big|\,P_{\bar X_0}(d\bar x) \longrightarrow 0 \quad (n\to\infty)
\]

P-a.s. and in L^r(P) (r ∈ IN). The same arguments, starting from

\[
\big|R(Z_n^L(\bar x),\bar x) - R^*(\bar x)\big| \le \big|R(Z_n^L(\bar x),\bar x)\big| + \big|R^*(\bar x)\big| \le 2\max_{s\in S,\,y\in[a,b]^m}\big|\log\langle s, y\rangle\big| < \infty,
\]

yield the remaining parts of the proof. □


Proof of Corollary 6.2.3. The assumption on the support of P_{X_0} implies that an essentially, i.e., P_{X̄_0}-a.s., unique log-optimal portfolio selection function

\[
w(\bar x) := \arg\max_{s\in S} E\big[\log\langle s, X_0\rangle \,\big|\, \bar X_0 = \bar x\big]
\]

exists (Algoet and Cover, 1988, p. 877, corrected in Österreicher and Vajda, 1993, and Vajda and Österreicher, 1994). The accumulated return using the log-optimal portfolio selection function for investment is denoted by

\[
R_n^* := \prod_{i=0}^{n-1}\langle w(\bar X_{i+1}), X_{i+1}\rangle.
\]

It follows that

\[
\limsup_{n\to\infty}\frac1n\log\frac{V_n}{R_n} \le \limsup_{n\to\infty}\frac1n\log\frac{V_n}{R_n^*} + \limsup_{n\to\infty}\frac1n\log\frac{R_n^*}{R_n}. \tag{6.3.13}
\]
For the first term on the right hand side, the ergodic theorem and the optimality of w imply

\[
\begin{aligned}
\limsup_{n\to\infty}\frac1n\log\frac{V_n}{R_n^*}
&= \limsup_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\log\frac{\langle f(\bar X_{i+1}), X_{i+1}\rangle}{\langle w(\bar X_{i+1}), X_{i+1}\rangle}
= E\left[\log\frac{\langle f(\bar X_0), X_0\rangle}{\langle w(\bar X_0), X_0\rangle}\right] \\
&= \int\Big(E\big[\log\langle f(\bar x), X_0\rangle \,\big|\, \bar X_0 = \bar x\big] - E\big[\log\langle w(\bar x), X_0\rangle \,\big|\, \bar X_0 = \bar x\big]\Big)\,P_{\bar X_0}(d\bar x) \\
&\le 0.
\end{aligned} \tag{6.3.14}
\]

Arguing along the lines of Walk (2000, Corollary 1), the second term has limiting behaviour

\[
\limsup_{n\to\infty}\frac1n\log\frac{R_n^*}{R_n}
= \limsup_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\Big(\log\langle w(\bar X_{i+1}), X_{i+1}\rangle - \log\langle Z_i^L(\bar X_{i+1}), X_{i+1}\rangle\Big) = 0. \tag{6.3.15}
\]

This is seen as follows: (6.3.12) combined with Egorov's theorem shows that for each ε > 0 we can find sets Ω̃ ⊆ Ω and Ĩ ⊆ [a, b]^{dm} such that P(Ω̃) ≥ 1 − ε, P_{X̄_0}(Ĩ) ≥ 1 − ε and

\[
Z_n^L \to w \quad\text{uniformly on } \tilde\Omega\times\tilde I. \tag{6.3.16}
\]

Then

\[
\begin{aligned}
&\frac1n\left|\sum_{i=0}^{n-1}\Big(\log\langle w(\bar X_{i+1}), X_{i+1}\rangle - \log\langle Z_i^L(\bar X_{i+1}), X_{i+1}\rangle\Big)\right| \\
&\le \frac1n\sum_{i=0}^{n-1}\Big|\log\langle w(\bar X_{i+1}), X_{i+1}\rangle - \log\langle Z_i^L(\bar X_{i+1}), X_{i+1}\rangle\Big|\cdot 1_{\tilde I}(\bar X_{i+1}) \\
&\quad + \frac1n\sum_{i=0}^{n-1}\Big|\log\langle w(\bar X_{i+1}), X_{i+1}\rangle - \log\langle Z_i^L(\bar X_{i+1}), X_{i+1}\rangle\Big|\cdot 1_{\tilde I^c}(\bar X_{i+1}) \\
&\le \frac{c}{n}\sum_{i=0}^{n-1}\big|w(\bar X_{i+1}) - Z_i^L(\bar X_{i+1})\big|\cdot 1_{\tilde I}(\bar X_{i+1}) + \frac{c}{n}\sum_{i=0}^{n-1} 1_{\tilde I^c}(\bar X_{i+1})
\end{aligned}
\]

for some constant c > 0. The first term tends to zero by (6.3.16), the second term to c · P_{X̄_0}(Ĩ^C) ≤ c · ε. Now, let ε go to zero.

Finally, (6.3.14) and (6.3.15) plugged into (6.3.13) finish the proof. □
Proof of Theorem 6.2.4. It suffices to prove Lemma 6.2.1 for Z_n instead of Z_n^L. Without loss of generality one can set M = 0. In the following we assume n to be sufficiently large such that γ_n ≥ L. Because X̄_0 takes on values in a denumerable set X, for any ε > 0 there exists a finite subset X̄ ⊆ X with P(X̄_0 ∈ X̄^C) ≤ ε. As in the proof of Corollary 6.2.3, w(·) denotes the essentially unique log-optimal portfolio selection function.

For ω ∈ Ω, x̄ ∈ X, we consider the sequence

\[
Z_n^{L_n}(\bar x) = Z_n^{L_n}(\bar x, \omega) \in S
\]

and show that for P_{X̄_0}-a.a. x̄ a P-a.s. limit relation Z_n^{L_n}(x̄, ω) → w(x̄) holds. To establish this, we work through the following: Consider an accumulation point f_ω(x̄) of the sequence, say

\[
Z_{n'}^{L_{n'}}(\bar x, \omega) \longrightarrow f_\omega(\bar x)
\]

for a subsequence n′, and show that P-a.s.

\[
f_\omega(\bar x) = w(\bar x) \quad\text{for } P_{\bar X_0}\text{-a.a. } \bar x. \tag{6.3.17}
\]

Indeed, (6.3.17) implies the existence of a set A ∈ A, P(A) = 1, such that for any ω ∈ A

\[
Z_n^{L_n}(\bar x, \omega) \longrightarrow w(\bar x) \tag{6.3.18}
\]

for P_{X̄_0}-a.a. x̄. This is seen as follows: Assume we had an x̄ with P_{X̄_0}({x̄}) > 0 and P(A(x̄)) > 0, where A(x̄) := {ω | Z_n^{L_n}(x̄, ω) ↛ w(x̄)}. P(A(x̄)) > 0 implies A(x̄) ∩ A ≠ ∅, hence an ω ∈ A(x̄) ∩ A exists with Z_n^{L_n}(x̄, ω) ↛ w(x̄) on the one

We now tackle the proof of (6.3.17).

For any x̄ ∈ X̄ the sequence Z_{n'}^{L_{n'}}(x̄, ω) takes on values in the compact set S, i.e. there exists an accumulation point f_ω(x̄). Since X̄ is finite, repeatedly taking subsequences gives an index sequence n″ for which

\[
Z_{n''}^{L_{n''}}(\bar x, \omega) \longrightarrow f_\omega(\bar x) \tag{6.3.19}
\]

uniformly for all x̄ ∈ X̄. n″ will be denoted by n again in the following. For any fixed ω

\[
\begin{aligned}
&\left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle - E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\big]\right| \\
&\le \left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i) - E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\right| \\
&\quad + \left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle\,1_{\bar{\mathcal X}^C}(\bar X_i)\right| + \left|E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\,1_{\bar{\mathcal X}^C}(\bar X_0)\big]\right| \\
&\le \frac{1}{n+1}\sum_{i=0}^{n}\Big|\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle - \log\langle f_\omega(\bar X_i), X_i\rangle\Big|\,1_{\bar{\mathcal X}}(\bar X_i) \\
&\quad + \left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f_\omega(\bar X_i), X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i) - E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\right| \\
&\quad + c\cdot\left(\frac{1}{n+1}\sum_{i=0}^{n}1_{\bar{\mathcal X}^C}(\bar X_i) + P(\bar X_0 \in \bar{\mathcal X}^C)\right)
\end{aligned} \tag{6.3.20}
\]

with a constant c = c(d, m, a, b) ∈ IR_+.
with a constant c = c(d, m, a, b) ∈ IR+ .
Because of the uniform convergence in (6.3.19) the first term of (6.3.20) satisfies (for n sufficiently large)

\[
\Big|\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle - \log\langle f_\omega(\bar X_i), X_i\rangle\Big|\,1_{\bar{\mathcal X}}(\bar X_i)
\le c\cdot\big|Z_n^{L_n}(\bar X_i,\omega) - f_\omega(\bar X_i)\big|\,1_{\bar{\mathcal X}}(\bar X_i) \le c\cdot\varepsilon \tag{6.3.21}
\]

for all i = 0, ..., n. Without loss of generality we may use the same constant c as above.

For the second term it will be shown at the end of this proof that for P-a.a. ω

\[
\limsup_{n\to\infty}\left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f_\omega(\bar X_i), X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i) - E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\right| = 0. \tag{6.3.22}
\]

For the third term the ergodic theorem yields that P-a.s.

\[
\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}1_{\bar{\mathcal X}^C}(\bar X_i) = P(\bar X_0 \in \bar{\mathcal X}^C) \le \varepsilon. \tag{6.3.23}
\]

(6.3.21) to (6.3.23) plugged into (6.3.20) yield

\[
\limsup_{n\to\infty}\left|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle - E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\big]\right| \le 3\cdot c\cdot\varepsilon,
\]

and from ε being arbitrary it follows that

\[
\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle \longrightarrow E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\big] \tag{6.3.24}
\]

for P-a.a. ω.
On the other hand, using the definition of the random variable L_n, we obtain P-a.s.

\[
\frac{1}{n+1}\sum_{i=0}^{n}\Big(\log\langle Z_n^{L_n}(\bar X_i,\omega), X_i\rangle - \log\langle w(\bar X_i), X_i\rangle\Big)
\ge \frac{1}{n+1}\sum_{i=0}^{n}\Big(\log\langle Z_n^{L}(\bar X_i,\omega), X_i\rangle - \log\langle w(\bar X_i), X_i\rangle\Big) \longrightarrow 0, \tag{6.3.25}
\]

where a limit relation analogous to (6.3.15) can be used.


Plugging (6.3.24) into the first line of (6.3.25) and observing

\[
\frac{1}{n+1}\sum_{i=0}^{n}\log\langle w(\bar X_i), X_i\rangle \longrightarrow E\big[\log\langle w(\bar X_0), X_0\rangle\big]
\]

we obtain that (again for P-a.a. ω)

\[
E\big[\log\langle f_\omega(\bar X_0), X_0\rangle\big] - E\big[\log\langle w(\bar X_0), X_0\rangle\big] \ge 0.
\]

Because of the essential uniqueness of the optimum w(·), for P-a.a. ω we infer (6.3.17), namely that f_ω(x̄) = w(x̄) P_{X̄_0}-a.s.
So it only remains to demonstrate (6.3.22). To this end, let C := C([a, b]^{dm}, S) be the space of continuous functions f : [a, b]^{dm} → S, equipped with the supremum norm sup_{x̄∈[a,b]^{dm}} |f(x̄)|. For f_ω we can find a continuation f̄_ω contained in C which coincides with f_ω on X̄. Due to the separability of C (Megginson, 1998, Sec. 1.12) a denumerable set G ⊆ C can be found such that for any given ε > 0 and any f_ω there exists a function g_ω ∈ G satisfying sup_{x̄∈X̄} |f_ω(x̄) − g_ω(x̄)| ≤ ε. For given f : [a, b]^{dm} → S use the shorthand notation H_n(ω, f) for

\[
\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f(\bar X_i), X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i) - E\big[\log\langle f(\bar X_0), X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big].
\]

Then

\[
\begin{aligned}
|H_n(\omega, f_\omega)| &= |H_n(\omega, g_\omega) + H_n(\omega, f_\omega) - H_n(\omega, g_\omega)| \\
&\le |H_n(\omega, g_\omega)| + \frac{1}{n+1}\sum_{i=0}^{n}\Big|\log\langle f_\omega(\bar X_i), X_i\rangle - \log\langle g_\omega(\bar X_i), X_i\rangle\Big|\,1_{\bar{\mathcal X}}(\bar X_i) \\
&\quad + E\Big[\big|\log\langle f_\omega(\bar X_0), X_0\rangle - \log\langle g_\omega(\bar X_0), X_0\rangle\big|\,1_{\bar{\mathcal X}}(\bar X_0)\Big] \\
&\le |H_n(\omega, g_\omega)| + 2\cdot c\cdot\sup_{\bar x\in\bar{\mathcal X}}|f_\omega(\bar x) - g_\omega(\bar x)| \\
&\le |H_n(\omega, g_\omega)| + 2\cdot c\cdot\varepsilon.
\end{aligned}
\]

Because of ε being arbitrary, it suffices to convince ourselves of

\[
H(\omega, g_\omega) := \limsup_{n\to\infty}|H_n(\omega, g_\omega)| = 0
\]

for P-a.a. ω. As to this, observe that (because of G being denumerable) the set

\[
\{\omega \mid \exists g\in G:\ H(\omega, g) > 0\} = \bigcup_{g\in G}\{\omega \mid H(\omega, g) > 0\}
\]

is measurable. Using the ergodic theorem, the left hand side is a countable union of null sets, i.e. a null set itself. Hence for P-a.a. ω

\[
H(\omega, g) = 0 \quad\text{for all } g \in G,
\]

and in particular H(ω, g_ω) = 0, which completes the proof. □



6.4 Simulations and examples


We conclude this chapter by simulations and examples in which we apply the
estimated log-optimal portfolio selection functions of Section 6.2 to simulated
and real markets. Throughout we select portfolios on the basis of the last d = 5
observed return data.

Example 6.1: The market consists of a riskless bond with return of 2.6% per market period and a share that follows a geometrical Brownian motion (Luenberger, 1998, Sec. 11.7; Korn and Korn, 1999, Ch. 2) with a mean return µ = 3% per market period and a volatility σ = 15% per market period. Investment starts after 5 market periods and ends after 50 market periods. In this model, due to the independence of the share's log-returns, the log-optimal portfolio selection function coincides with the log-optimal portfolio, which suggests investing 67.86% of the current wealth in each market period into

6.1 a) Proportion of wealth invested in the share (left vertical axis),


expected portfolio log-return (right vertical axis, in %). Results
for the true log-optimal strategy (dashed) and the estimated log-
optimal strategy (solid).


6.1 b) Value of a $1 investment in the share (grey, solid) or in the


bond (grey, dashed), respectively. We compare the value of a $1
investment in the true log-optimal strategy (upper black curve) and
the value of a $1 investment in the estimated log-optimal strategy
(lower black curve).
Figure 6.1: Sample path of an investment in a share following a geometrical Brownian motion and a bond during 50 market periods.

the share (calculated by Cover's algorithm, Theorem 1.3.2). Figures 6.1 and 6.2 show sample paths in the market together with estimation results. Throughout this section we use the kernel variant of the projection algorithm (6.2.2) with a cosine kernel K(x̄) = cos(min{‖x̄‖_F/100, 1}) + 1 (the Frobenius norm ‖x̄‖_F of x̄ = (x_{i,j})_{1≤i≤2, 1≤j≤d} being defined as the square root of the sum of the diagonal elements of x̄ᵀx̄) and L = 100.
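For comparison, the constant log-optimal weight in Example 6.1 can be approximated with the multiplicative update of Cover's algorithm, b_j ← b_j · E[X_j/⟨b, X⟩], run on Monte Carlo draws; modelling the share's per-period log-return as N(0.03, 0.15²) is our reading of the example's parameters, so the resulting weight should only come out near the quoted 67.86%.

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo draws of per-period gross returns (bond, share).  The bond pays
# 1.026 deterministically; the share's per-period log-return is modelled as
# N(0.03, 0.15**2) -- an assumption of ours about the example's parameters.
n_draws = 200_000
X = np.column_stack([
    np.full(n_draws, 1.026),
    np.exp(rng.normal(0.03, 0.15, n_draws)),
])

# Cover-style multiplicative update b_j <- b_j * E[X_j / <b, X>], taken under
# the empirical distribution of the draws; each step keeps b on the simplex
# and increases the empirical E log <b, X>.
b = np.full(2, 0.5)
for _ in range(500):
    b = b * (X / (X @ b)[:, None]).mean(axis=0)

share_weight = b[1]   # estimated log-optimal fraction of wealth in the share
```

The weight fluctuates with the Monte Carlo sample; the exact update is the one referred to in the text as Cover's algorithm (Theorem 1.3.2).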
Subgraphs a) of Figures 6.1 and 6.2 show the estimated log-optimal portfolio weight for the share (solid line), i.e. the coordinate of Z_n^L(X_{n+1−d}, ..., X_n) that corresponds to the share. The results can be compared with the true log-optimal portfolio (dashed line) and the expected portfolio returns per market period given on the right vertical axis (in %).

In subgraphs b) we follow the value of a $1 investment in the share (grey, solid) or in the bond (grey, dashed), respectively. We compare the value of a $1 investment in the true log-optimal strategy (upper black curve) and the value of a $1 investment in the strategy using the estimated log-optimal portfolio weights (lower black curve). The results are convincing: as we expect from the competitive optimality result (Corollary 6.2.3), the estimated strategy allows us to track the value evolution of the log-optimal strategy.

Example 6.2: In this example we run the projection algorithm for the estima-
tion of log-optimal portfolio selection functions on real market data from NYSE,
22/4/1998-6/7/1998 (daily closing price data from www.wallstreetcity.com).
We use the same stocks (YELL, JBHT, UNP) as in Example 2.2, Section 2.3.

Here, we do not know the true log-optimal portfolio selection function. As a


substitute reference strategy we use the strategy with the constant weights esti-
mated in Example 2.2, Section 2.3 (i.e., (0.523951, 0.476049) for (JBHT,YELL),


6.2 a) Proportion of wealth invested in the share (left vertical axis),


expected portfolio log-return (right vertical axis, in %). Results
for the true log-optimal strategy (dashed) and the estimated log-
optimal strategy (solid).


6.2 b) Value of a $1 investment in the share (grey, solid) or in the


bond (grey, dashed), respectively. We compare the value of a $1
investment in the true log-optimal strategy (upper black curve) and
the value of a $1 investment in the estimated log-optimal strategy
(lower black curve).
Figure 6.2: Sample path of an investment in a share following a geometrical Brownian motion and a bond during 50 market periods.

and (0.465490, 0.53451) for (UNP,YELL)). We then compare the value of a $1


investment in the strategy using the estimated log-optimal portfolio selection
function with the value of the reference strategy (Figure 6.3). Investment starts
on the 5th day of trading. The value of the estimated log-optimal portfolio
strategy virtually coincides with the value of what was believed to be the true
log-optimal strategy in Example 2.2, Section 2.3 (we therefore plotted the value
for the log-optimal portfolio selection function only). This suggests that in
either case we are close to the log-optimal portfolio selection strategy.
– One might be tempted to argue that compared with Section 2.3 not much has
been gained. This is not the case. On the contrary, in Section 2.3 we considered
a very restrictive model involving independent, log-normally distributed daily

6.3 a) JBHT (grey, solid) and YELL (grey, dashed).


6.3 b) UNP (grey, solid) and YELL (grey, dashed).

Figure 6.3: Value of a $1 investment in two single stocks (grey), and in the es-
timated log-optimal portfolio of the two (black, solid) at NYSE 22/4-6/7/1998.

returns. We just sketched and, in fact, then skipped most of the huge effort that
should have been put into diagnostic testing of these assumptions in Section 2.3.
Much of this effort is superfluous here. Indeed, with the model of Chapter 6
we gained considerable flexibility with respect to the underlying market model,
assuming not much more than stationarity and ergodicity. Considering there
is no such thing as absolute certainty about what the true stochastic regime
in the market looks like, we come to appreciate nonparametric algorithms that
work well under very weak assumptions and hence may be applied in many real
markets.

L’ENVOI

Clearly, we were only able to cover a small sample of the problems the investor faces in real markets. In the course of this thesis we derived several algorithms for these selected problems, and the reader might (and hopefully will) find some of them helpful in deciding on practical investment problems. Beyond these algorithms, we hope we have conveyed the key message of this thesis: the insight that nonparametric statistical forecasting and estimation techniques are a valuable tool in portfolio selection and, in fact, in all of mathematical finance.
181

REFERENCES

P. Algoet (1992): Universal schemes for prediction, gambling and portfolio


selection, Ann. Probab., 20(2), 901-941.
P. Algoet (1994): The strong law of large numbers for sequential decisions
under uncertainty, IEEE Trans. Inform. Theory, 40(3), 609-634.
P. Algoet (1999): Universal schemes for learning the best nonlinear predic-
tor given the infinite past and side information, IEEE Trans. Inform.
Theory, 45(4), 1165-1185.
P. Algoet and Th. Cover (1988): Asymptotic optimality and asymptotic
equipartition properties of log-optimal investment, Ann. Probab., 16(2),
876-898.
C.D. Aliprantis and K.C. Border (1999): Infinite Dimensional Analysis,
Springer, Berlin.
H.Z. An, Z.G. Chen and E.J. Hannan (1982): Autocorrelation, autore-
gression and autoregressive approximation (with corrections), Ann. Stat.,
10(3), 926-936.
C. Atkinson, S.R. Pliska and P. Wilmott (1997): Portfolio manage-
ment with transaction costs, Proc. Roy. Soc. Lond., Ser. A 453, No.
1958, 551-562.
L. Bachelier (1900): Théorie de la spéculation, Ann. Sci. Ecole Norm.
Sup., 17, 21-86.
D.H. Bailey (1976): Sequential Schemes for Classifying and Predicting Er-
godic Processes, Ph.D. thesis, Stanford University.
A. Barron and Th. Cover (1988): A bound on the financial value of in-
formation, IEEE Trans. Inform. Theory, 34, 1097-1100.
J. Bather (2000): Decision Theory, Wiley, Chichester.
182 References

R. Bell and Th. Cover (1988): Game-theoretic optimal portfolios, Man-


agement Science, 34(6), 724-733.

D.P. Bertsekas (1976): Dynamic Programming and Stochastic Control,


Academic Press, New York.

D.P. Bertsekas and S.E. Shreve (1978): Stochastic Optimal Control:


The Discrete Time Case, Academic Press, New York.

T. Bielecki, D. Hernández-Hernández and S.R. Pliska (1999): Risk


sensitive control of finite state Markov chains in discrete time, with appli-
cations to portfolio management, Math. Meth. Oper. Res., 50, 167-188.

T. Bielecki and S.R. Pliska (1999): Risk–sensitive dynamic asset man-


agement, Appl. Math. Optim., 39, 337-360.

T. Bielecki and S.R. Pliska (2000): Risk sensitive asset management with
transaction costs, Finance Stochast., 4, 1-33.

A. Blum and A. Kalai (1999): Universal portfolios with and without


transaction costs, Mach. Learning, 35, 193-205.

R.V. Bobryk and L. Stettner (1999): Discrete time portfolio selection


with proportional transaction costs, Probab. Math. Statistics, 19(2),
235-248.

D. Bosq (1996): Nonparametric Statistics for Stochastic Processes, Springer,


New York.

A. Böttcher and S.M. Grudsky (1998): On the condition numbers of


large semidefinite Toeplitz matrices, Lin. Alg. Appl., 279, 285-301.

L. Breiman (1961): Optimal gambling systems for favourable games, in:


Fourth Berkeley Symposium on Mathematical Statistics and Probability,
Prague 1961, 65-78, University of California Press, Berkeley.

P.J. Brockwell and R.A. Davis (1991): Time Series: Theory and Meth-
ods. Springer, New York.

V.V. Buldygin and V.S. Doncenko (1977): Convergence to zero of Gaus-


sian sequences, Mat. Zametki, 21(4), 531-538, English translation: Math.
Notes Acad. Sci. USSR, 21(3-4), 296-300.
183

A. Cadenillas (2000): Consumption-investment problems with transaction


costs: Survey and open problems, Math. Meth. Oper. Res., 51, 43-68.

P. Caines (1988): Linear Stochastic Systems, Wiley, New York.

L. Carassus and E. Jouini (2000): A discrete time stochastic model for


investment with an application to the transaction cost case, J. Math.
Economics., 33, 57-80.

F.H. Clarke (1981): Generalized gradients of lipschitz functions, Adv. in


Math., 40, 52-67.

Th. Cover (1980): Competitive optimality of logarithmic investment, Math.


Oper. Res., 5(2), 161-166.

Th. Cover (1984): An algorithm for maximizing expected log investment


return, IEEE Trans. Inform. Theory, 30, 369-373.

Th. Cover (1991): Universal portfolios, Math. Finance, 1(1), 1-29.

Th. Cover and E. Ordentlich (1996): Universal portfolios with side in-
formation, IEEE Trans. Inform. Theory, 42(2), 348-363.

Th. Cover and J.A. Thomas (1991): Elements of Information Theory, Wi-
ley, New York.

J.C. Cox, S.A. Ross and M. Rubinstein (1979): Option pricing: a sim-
plified approach, Journ. Financial Economics, 7, 229-263.

M.H. Davis and A.R. Norman (1990): Portfolio selection with transac-
tion costs, Math. Oper. Res., 15(4), 676-713.

L.D. Davisson (1965): The prediction error of stationary Gaussian time se-
ries of unknown covariance, IEEE Trans. Inform. Theory, 11(4), 527-532.

L. Devroye, L. Györfi and G. Lugosi (1996): A Probabilistic Theory of


Pattern Recognition, Springer, New York.

Z. Ding, C.W. Granger and R.F. Engle (1993): A long memory prop-
erty of stock market and a new model, J. Empir. Finance, 1, 83-106.

I. Donowitz and M. El-Gamal (1997): Financial market structure and


the ergodicity of prices, Social Systems Research Institute (SSRI) working
paper, University of Wisconsin-Madison.
184 References

J.L. Doob (1953): Stochastic Processes, Wiley, New York.

J.L. Doob (1984): Classical Potential Theory and its Probabilistic Counter-
part, Springer, New York.

P. Doukhan (1994): Mixing, Springer, New York.

D. Duffie (1988): Security Markets - Stochastic Models, Academic Press,


San Diego.

W. Feller (1968): An Introduction to Probability Theory and Its Applica-


tions, vol. 1, Wiley, New York.

B.G. Fitzpatrick and W.H. Fleming (1991): Numerical methods for an


optimal investment-consumption model, Math. Meth. Oper. Res., 16(4),
823-841.

W.H. Fleming (1999): Controlled Markov processes and mathematical fi-


nance, in: Nonlinear analysis, differential equations and control, NATO
Sci. Ser. C Math. Phys. Sci., 528, p. 407-446, Kluwer, Dordrecht.

L.R. Foulds (1981): Optimization Techniques, Springer, New York.

J. Francis (1980): Investments - Analysis and Management, McGraw-Hill,


New York.

J. Franke, W. Härdle and C. Hafner (2001): Einführung in die Statis-


tik der Finanzmärkte, Springer, Berlin.

L. Gerencsér (1992): AR(∞) estimation and nonparametric stochastic com-


plexity, IEEE Trans. Inform. Theory, 38(6), 1768-1778.

L. Györfi, W. Härdle, P. Sarda and P. Vieu (1989): Nonparametric


Curve Estimation from Time Series, Springer, New York.

L. Györfi, M. Kohler, A. Krzyżak and H. Walk (2002): A Nonpara-


metric Theory of Regression, Springer, New York.

L. Györfi and G. Lugosi (2001): Strategies for sequential prediction of sta-


tionary time series, in “Modeling Uncertainty. An Examination of its
Theory, Methods and Applications,” Eds. M. Dror, P. L’Ecuyer and F.
Szidarovszky, Kluwer, Dordrecht, 225-248.
185

L. Györfi, G. Lugosi and G. Morvai (1999): A simple randomized algo-


rithm for sequential prediction of ergodic time series, IEEE Trans. Inform.
Theory, 45(7), 2642-2650.
L. Györfi, G. Morvai and S.J. Yakowitz (1998): Limits to consistent
on-line forecasting for ergodic time series, IEEE Trans. Inform. Theory,
44(2), 886-892.
L. Györfi, I. Páli and E. van der Meulen (1994): There is no univer-
sal source code for an infinite source alphabet, IEEE Trans. Inform.
Theory, 40(2), 267-271.

E.J. Hannan and M. Deistler (1988): The Statistical Theory of Linear


Systems, Wiley, New York.

W. Härdle (1990): Applied Nonparametric Regression, Cambridge UP, Cam-


bridge.

W. Härdle, G. Kerkyacharian, D. Picard and A. Tsybakov (1998):


Wavelets, Approximation, and Statistical Applications, Springer, New
York.

D. Helmbold, R.E. Schapire, Y. Singer and M.K. Warmuth(1998):


On-line portfolio selection using multiplicative updates, Math. Finance,
8(4), 325-347.

O. Hernández-Lerma and J.B. Lasserre (1996): Discrete-Time Markov


Control Processes, Springer, New York.

O. Hernández-Lerma and J.B. Lasserre (1999): Further Topics on Dis-


crete-Time Markov Control Processes, Springer, New York.
T. Hida and M. Hitsuda (1993): Gaussian Processes. Translations of Math.
Monographs, 10, AMS, Providence.

F. Hirzebruch and W. Scharlau (1996): Einführung in die Funktional-


analysis, Spektrum Akademischer Verlag, Heidelberg.

I.A. Ibragimov and Y.V. Linnik (1971): Independent and Stationary Se-
quences of Random Variables, Wolters-Noordhoff, Groningen.

I.A. Ibragimov and Y.A. Rozanov (1978): Gaussian Random Processes, Springer, New York.

E. Isaacson and H.B. Keller (1994): Analysis of Numerical Methods, Dover Pub., New York.

J. Kelly (1956): A new interpretation of information rate, Bell Sys. Tech. Journal, 35, 917-926.

K. Knopp (1956): Infinite Sequences and Series, Dover Publ., New York.

R. Korn and E. Korn (1999): Optionsbewertung und Portfoliooptimierung, Vieweg, Braunschweig.

H.J. Kushner and D.S. Clark (1978): Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer, New York.

N. Laib (1999): Uniform consistency of the partitioning estimate under ergodic conditions, J. Austral. Math. Soc., Series A, 67, 1-14.

N. Laib and E. Ould-Said (1996): Estimation non paramétrique robuste de la fonction de régression pour des observations ergodiques, C. R. Acad. Sci. Paris, Série I, 322, 271-276.

H.A. Latané (1959): Criteria for choice among risky ventures, Journal of
Political Economy, 38, 145-155.

Yu.S. Ledyaev (1984): Theorems on an implicitly defined multivalued mapping, Sov. Math. Dokl., 29(3), 545-548.

E. Lehmann (1983): Theory of Point Estimation, Wiley, New York.

D.G. Luenberger (1998): Investment Science, Oxford UP, Oxford.

H.M. Markowitz (1959): Portfolio selection, Wiley, New York.

H.M. Markowitz (1976): Investment for the long run: new evidence for an
old rule, J. Finance, 31(5), 1273-1286.

G. Matheron (1975): Random Sets and Integral Geometry, Wiley, New York.

J.H. McCulloch (1996): Financial applications of stable distributions, in “Statistical Methods in Finance” (Handbook of Statistics, 14), Elsevier Science, Amsterdam.

R.E. Megginson (1998): An Introduction to Banach Space Theory, Springer, New York.

S. Mittnik and S.T. Rachev (1993): Modeling asset returns with alterna-
tive stable distributions, Econometric Review, 12, 261-330.

I.S. Molchanov (1993): Limit Theorems for Unions of Random Closed Sets,
Lecture Notes in Mathematics 1561, Springer, Berlin.

H.W. Möller (1998): Die Börsenformel, Campus, Frankfurt.

T.F. Móri (1982): Asymptotic properties of the empirical strategy in favourable stochastic games, in “Limit theorems in probability and statistics”, Colloq. Math. Soc. János Bolyai (Veszprém, 1982), 36, 777-790.

G. Morvai (1991): Empirical log-optimal portfolio selection, Problems of Control and Information Theory, 20(6), 453-463.

G. Morvai (1992): Portfolio choice based on the empirical distribution, Kybernetika, 28(6), 484-493.

G. Morvai, S. Yakowitz and P. Algoet (1997): Weakly convergent nonparametric forecasting of stationary time series, IEEE Trans. Inform. Theory, 43(2), 483-498.

G. Morvai, S. Yakowitz and L. Györfi (1996): Nonparametric inference for ergodic, stationary time series, Ann. Statist., 24(1), 370-379.

J. Neveu (1968): Processus Aléatoires Gaussiens, Université de Montréal.

J.A. Ohlson (1972): Optimal portfolio selection in a log-normal market when the investor’s utility-function is logarithmic, Stanford Graduate School of Business, Research paper no. 117, Stanford University.

F. Österreicher and I. Vajda (1993): Existence, uniqueness and evaluation of log-optimal investment portfolio, Kybernetika, 29(2), 105-120.

A. Pagan and A. Ullah (1999): Nonparametric Econometrics, Cambridge UP, Cambridge.

E.E. Peters (1997): Fractal Market Analysis, Wiley, New York.

V. Petrov (1995): Limit Theorems of Probability Theory, Clarendon, Oxford.

J. Rissanen (1989): Stochastic Complexity in Statistical Inquiry, World Scientific, Teaneck, NJ.

S.M. Ross (1970): Applied Probability Models with Optimization Applications, Holden-Day, San Francisco.

B.Y. Ryabko (1988): Prediction of random sequences and universal coding, Problems of Inform. Trans., 24 (Apr./June), 87-96.

P.A. Samuelson (1967): General proof that diversification pays, J. Fin. and
Quant. Anal., 2, 1-13.

P.A. Samuelson (1969): Lifetime portfolio selection by dynamic stochastic programming, Rev. Economics and Statistics, 51, 239-246.

P.A. Samuelson (1971): The “fallacy” of maximizing the geometric mean in long sequences of investment or gambling, Proc. Nat. Acad. Sci. U.S.A., 68, 2493-2496.

D.W. Scott (1992): Multivariate Density Estimation: Theory, Practice and Visualisation, Wiley, New York.

S. Serra (1998): On the extreme eigenvalues of Hermitian (block) Toeplitz matrices, Lin. Alg. Appl., 270, 109-129.

S. Serra Capizzano (1999): Extreme singular values and eigenvalues of non-Hermitian block Toeplitz matrices, Journ. Comp. Appl. Math., 108, 113-130.

S. Serra Capizzano (2000): How bad can positive definite Toeplitz matri-
ces be? Numer. Funct. Anal. and Optimiz., 21(1-2), 255-261.

A.N. Shiryayev (1984): Probability, Springer, New York.

L. Stettner (1999): Risk sensitive portfolio optimization, Math. Meth. Oper. Res., 50, 463-474.

W.F. Stout (1974): Almost Sure Convergence, Academic Press, New York.

G. Toussaint (1971): Note on optimal selection of independent binary-valued features for pattern recognition, IEEE Trans. Inform. Theory, 17, 618.

I. Vajda and F. Österreicher (1994): Statistical analysis and applications of log-optimal investments, Kybernetika, 30(3), 331-342.

H. Walk (2000): Cover’s algorithm modified for nonparametric estimation of a log-optimal portfolio selection function, Preprint 2000-2, Math. Inst. A, Univ. Stuttgart.
H. Walk and S. Yakowitz (2002): Iterative nonparametric estimation of a log-optimal portfolio selection function, IEEE Trans. Inform. Theory, 48(1), 324-333.
M. Wax (1988): Order selection for AR models by predictive least squares,
IEEE Trans. Acoust. Speech Sign. Process., 36(4), 581-588.
D. Williams (1991): Probability with Martingales, Cambridge UP, Cam-
bridge.
S. Yakowitz, L. Györfi, J. Kieffer and G. Morvai (1999): Strongly
consistent nonparametric forecasting and regression for stationary ergodic
sequences, Journal of Multivariate Analysis, 71, 24-41.
Z. Ye and J. Li (1999): Optimal portfolio with risk control, Chinese J. Ap-
plied Probability and Statistics, 15(2), 152-167.
