Submitted by
Dominik Schäfer
from Pforzheim
2002
dedicated to
My Parents
to whom I owe so much
CONTENTS
Abbreviations
Summary
Zusammenfassung
Acknowledgements
L’Envoi
References
ABBREVIATIONS
f(x)|_{x=y}     f evaluated at y
f^+, f_+        positive part of f, i.e., max{f, 0}
f^-, f_-        negative part of f, i.e., max{-f, 0}
supp f          support {x : f(x) > 0} of the function f
arg max f       solution of a maximization problem (in some contexts
                set-valued, i.e. {x : f(x) = sup_y f(y)}, in others a
                measurably selected solution x with f(x) = sup_y f(y))
P               probability measure
P_X             distribution of X
P_{Y|X=x}       conditional distribution of Y given X = x
f_X(·)          a density of P_X w.r.t. the Lebesgue measure
f_{Y|X}(·|x)    a density of P_{Y|X=x} w.r.t. the Lebesgue measure
Q1 ≪ Q2         Q1 is absolutely continuous w.r.t. Q2
D(Q1||Q2)       Kullback-Leibler distance between Q1 and Q2
a.s.            P-almost surely, with probability one
P-a.a.          P-almost all
E               mathematical expectation
E[Y|X]          conditional expectation of Y given X
E[Y|X = x]      conditional expectation of Y given X = x
Var             variance
Cov             covariance
N(µ, Σ)         normal distribution with mean µ and variance-covariance matrix Σ
All non-standard notation is explained when it occurs for the first time. The
random variables in this thesis are understood to be defined on a common
probability space (Ω, A, P). IR^d-valued random variables are implicitly assumed
to be measurable w.r.t. the Borel σ-algebra B(IR^d). If not stated otherwise,
measurability of functions f : IR^d → IR^{d'} means measurability w.r.t. B(IR^d) and
B(IR^{d'}).
SUMMARY
Chapter 1
Introduction: investment and nonparametric statistics
Investment is the strategic allocation of resources, typically of monetary re-
sources, in an environment, typically a market of assets, whose future evolution
is uncertain. Investment problems arise in a huge variety of contexts beyond the
financial one. Resources may also take the form of energy, data-processing
capacity, etc. Strategic investment planning helps many processes run at a
higher benefit. In this thesis we focus our attention on financial investment,
which we think is the “prototypical” example of a resource allocation process.
The three ingredients of financial investment are the market, the actions the
investor may take and his investment goal (discussed in detail in Sections 1.1-
1.3):
– As to the market: We assume that there are m assets in our financial market.
The ith asset yields a return X_{i,n} on an investment of 1 unit of money
during market period n (lasting from “time” n − 1 to n, time being measured,
e.g., in days of trading). The ensemble of returns on the nth day of trading
is given by the return vector X_n = (X_{1,n}, ..., X_{m,n}).
(Cover and Thomas, 1991, Theorem 15.5.2). Here, Wn∗ is the wealth at
time n resulting from a series of conditionally log-optimal investments,
Wn the wealth from any other non-anticipating portfolio strategy. We
argue that
The conditional log-optimal portfolio depends upon the distribution of the re-
turn process {Xn }n. Realistically, the true distribution of the market returns
and hence the log-optimal strategy is not known to the investor. This makes
statistics the natural partner of investment. Statistics is needed to solve the
key problem,
to find a non-anticipating portfolio selection scheme {b̂n }n (working with his-
torical return data only, without knowing the true return distribution) such
that for any stationary ergodic return process {Xn }n, the investor’s wealth
Ŵ_n := ∏_{i=1}^{n} < b̂_i, X_i > grows (on the average) as fast as with the log-optimum
Chapter 2
Portfolio benchmarking: rates and dimensionality
The performance of a portfolio selection rule is usually compared with that
of a benchmark portfolio selection rule. Our benchmark is the log-optimal
portfolio selection rule, and as we have seen in Chapter 1, this is the optimal
rule. An investor will typically find his own rule underperforming. He can only
hope that this underperformance vanishes in the course of the investment
process, as his estimates of the return distribution become ever better with
the growing amount of available historical data.
Assuming that the return data arises from a process of independent and identically
distributed (i.i.d.) random variables, it is important to know at what
rate the underperformance E log(R*_n / R̂_n) vanishes for typical portfolio selection rules.
Using notions from information theory we prove a lower bound on this rate in
Section 2.1. Even in the simplest of all markets, a market with only finitely
many possible return outcomes,
There are empirical portfolio selection rules that achieve this rate. In particular,
the empirical log-optimal portfolio
b̂_{n+1} := arg max_{b∈S} (1/n) ∑_{i=1}^{n} log < b, X_i >        (0.0.1)
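For illustration only (not an algorithm from the thesis): the concave maximization in (0.0.1) over the simplex S can be approximated numerically, e.g. with the classical multiplicative reweighting iteration for log-optimal portfolios; the market data below is a toy assumption.

```python
# Illustration only (not from the thesis): approximating the empirical
# log-optimal portfolio (0.0.1), i.e. the maximizer over the simplex S of
# (1/n) * sum_i log <b, X_i>, via the classical multiplicative
# reweighting iteration for log-optimal portfolios.

def empirical_log_optimal(returns, iters=500):
    """returns: n return vectors, each a tuple of m positive floats."""
    n, m = len(returns), len(returns[0])
    b = [1.0 / m] * m                       # start at the uniform portfolio
    for _ in range(iters):
        # b_j <- b_j * (1/n) sum_i X_{i,j} / <b, X_i>  stays in the simplex
        w = [sum(x[j] / sum(bk * xk for bk, xk in zip(b, x))
                 for x in returns) / n
             for j in range(m)]
        b = [bj * wj for bj, wj in zip(b, w)]
        s = sum(b)
        b = [bj / s for bj in b]            # guard against round-off drift
    return b

# Toy market: one riskless asset (return 1) and one volatile asset.
data = [(1.0, 1.8), (1.0, 0.6)] * 50        # 100 market periods
b_hat = empirical_log_optimal(data)         # close to (0.375, 0.625)
```

The iteration increases the empirical log-return at every step and converges to the maximizer because the objective is concave on the simplex.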
any selection algorithm that assesses the single stocks separately, e.g.
on the basis of single stock expected returns, is sure to pick the “bad”
stocks in some realistic market (Theorem 2.2.1).
Chapter 3
Predicted stock returns and portfolio selection
Having gained the insight that variance-covariance information about the market
(inter-stock correlations as well as temporal correlations) is integral to
successful investment decisions, we move on to particular investment strategies.
In Section 3.1 we consider a strategy which is particularly popular among
investors.
The strategy works in two steps, with the past logarithmic returns Yn, Yn−1 , ..., Y0
(Yi := log Xi ) as input data for the investment decision at time n:
1. Form a forecast of the market's future. As will be shown, market predictions
should be based on conditional expectations of future log-returns given the
past, i.e. on Ŷ_{n+1} := E[Y_{n+1} | Y_n, Y_{n−1}, ...].
2. Invest only if the forecast signals a sufficiently high return, i.e. if
exp(Ŷ_{n+1}) ≥ r.
We will call this strategy a “greedy strategy”, because it tries to single out
the best possible stocks only. As we shall see, this provides us with a natural
strategy which can be applied in markets with low log-return variance (Section
3.1).
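A minimal sketch of the two-step greedy rule, under stated assumptions: a plain sample mean stands in for the conditional-expectation forecaster, and the threshold r is taken as the break-even return 1. This is not the thesis's actual implementation.

```python
# A stylized sketch of the two-step "greedy" rule (assumptions: a plain
# sample mean stands in for the conditional-expectation forecaster
# Y_hat_{n+1} = E[Y_{n+1} | Y_n, ...], and the threshold r is the
# break-even return 1).  Not the thesis's actual implementation.
import math

def greedy_decision(log_returns, r=1.0):
    y_hat = sum(log_returns) / len(log_returns)   # step 1: forecast
    return math.exp(y_hat) >= r                   # step 2: invest or not

history = [0.02, -0.01, 0.03, 0.01]   # past log-returns Y_0, ..., Y_n
invest = greedy_decision(history)     # True: forecast return exceeds r
```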
The major problem in implementing the greedy strategy is the fact that the
forecasts Ŷn+1 can only be calculated if the distribution of the return process
is known to the investor. Hence, we need to derive an estimate Ê(Yn, ..., Y0)
for the conditional expectation Ŷn+1 = E[Yn+1 |Yn, Yn−1 , ...] from the market
observations Yn , ..., Y0. It is known from the literature that no such forecaster can
be strongly consistent in the sense of
lim_{n→∞} ( Ê(Y_n, ..., Y_0) − E[Y_{n+1} | Y_n, Y_{n−1}, ...] ) = 0        (0.0.2)
with probability 1 for any stationary and ergodic process {Yn}n (Bailey, 1976).
This result is discouraging, but it does not rule out the existence of strongly
consistent forecasting rules for log-return processes as they arise in real financial
markets. In particular, Gaussian log-return processes have been proven to be
a good approximation for real log-return processes, but so far no answer has
been found to the question whether there exist forecasters that are strongly
consistent in any stationary and ergodic Gaussian process. In Section 3.2 we
prove that the answer is indeed affirmative. Under weak extra conditions on
the Wold coefficients of the process
we present a forecaster Ê(Yn, ..., Y0) for stationary and ergodic Gaus-
sian processes which satisfies the strong consistency relation (0.0.2)
and which is remarkably easy to compute (Lemma 3.2.1 and Corollary
3.2.3).
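The forecaster of Lemma 3.2.1 is not reproduced here; as a hypothetical stand-in, the sketch below fits a one-step linear predictor from the sample mean and the lag-1 sample autocorrelation, which conveys the flavor of plug-in forecasting for stationary processes.

```python
# Hypothetical stand-in (not the forecaster of Lemma 3.2.1): a one-step
# linear predictor for a stationary process, built from the sample mean
# and the lag-1 sample autocorrelation,
#   Y_hat_{n+1} = m + rho_1 * (Y_n - m).

def forecast_next(y):
    n = len(y)
    m = sum(y) / n
    var = sum((v - m) ** 2 for v in y) / n
    cov1 = sum((y[i] - m) * (y[i + 1] - m) for i in range(n - 1)) / n
    rho1 = cov1 / var if var > 0 else 0.0
    return m + rho1 * (y[-1] - m)

y = [0.0, 0.5, 0.25, 0.6, 0.3, 0.7, 0.4, 0.8]   # toy log-return data
y_hat = forecast_next(y)
```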
This result provides us with the necessary tools to implement the greedy strategy
in Gaussian log-return processes. The algorithm is, however, of considerable
interest in its own right, as forecasting problems for Gaussian processes arise
in many areas.
Section 3.3 proves the convergence properties of the algorithm. Application
examples with simulated and real data in Section 3.4 are promising, both when
the algorithm is run as a mere forecasting algorithm and when it is run as a
subroutine for the greedy strategy.
Chapter 4
A Markov model with transaction costs: probabilistic view
In simple markets where returns arise as i.i.d. data, the investor should invest
in a constant log-optimal portfolio strategy. This requires him not to change
the proportion of wealth held in each stock during the investment process. The
proportions remain constant; however, the prices of the assets change relative
to each other during each market period, so that the actual quantities of the
single stocks in the portfolio vary from period to period. Thus,
a large number of transactions are needed to follow a constant log-optimal
strategy. In practice, this is a huge drawback: Much of the wealth accumulated
by a log-optimal strategy has to be spent to settle transaction costs such as
brokerage fees, administrative and telecommunication expenses. The conclusion
for the investor must be to adapt his strategy to meet two requirements: to
make as few costly transactions as possible, but to make as many as necessary
to boost his wealth. The aim of Chapters 4 and 5 is to investigate how these
two conflicting requirements can be balanced in one strategy.
To this end we shall assume that the returns arise from a d-stage Markov pro-
cess. In Chapter 4 the distribution of the return process is known, an unrealistic
assumption which we will drop in Chapter 5. Section 4.1 generalizes the mar-
ket model from Chapter 1 to include transaction costs proportional to the total
value of the purchased shares. Not surprisingly, the investor can only afford
a limited range of portfolio choices in the presence of transaction costs, and as we
shall see,
in d-stage Markovian return processes it suffices to consider strategies
based on portfolio selection functions, i.e. portfolio selection schemes
of the form bi = c(bi−1 , Xi−d , ..., Xi−1) with an appropriate function c
(Definition 4.1.2).
Hence, the next portfolio is a function of the last portfolio and the last d ob-
served return vectors. The investor aims to maximize his expected mean loga-
rithmic return as before by choosing an optimal selection function c.
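To make the notion concrete, here is a toy portfolio selection function of the required form b_i = c(b_{i−1}, X_{i−d}, ..., X_{i−1}); the rebalancing rule and threshold are invented for illustration and are unrelated to the optimal c of Section 4.2.

```python
# Toy example of a portfolio selection function of the form
# b_i = c(b_{i-1}, X_{i-d}, ..., X_{i-1}).  The rule and threshold here
# are invented for illustration and are unrelated to the optimal c of
# Section 4.2: rebalance (a costly transaction) only on a clear signal.

def c(b_prev, recent_returns, threshold=1.05):
    m = len(b_prev)
    # average return of each asset over the last d observed periods
    avg = [sum(x[j] for x in recent_returns) / len(recent_returns)
           for j in range(m)]
    best = max(range(m), key=lambda j: avg[j])
    if avg[best] >= threshold:        # strong signal: move fully into it
        return [1.0 if j == best else 0.0 for j in range(m)]
    return b_prev                     # weak signal: keep the old portfolio

b = c([0.5, 0.5], [(1.0, 1.2), (1.0, 1.1)])   # -> [0.0, 1.0]
```

Keeping the previous portfolio on a weak signal mirrors the trade-off of Chapters 4 and 5: transactions are made only when they promise enough additional growth.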
In Section 4.2 we tackle the problem of how to obtain an optimal selection function
c, assuming for the moment that the distribution of the return process is known. The main result
demonstrates that
Chapter 5
A Markov model with transaction costs: statistical view
The Bellman equation considered in Chapter 4 depends heavily upon the distribution
of the return process {Xn }n through a particular conditional expectation.
Hence, the results of Chapter 4 are valid only under the assumption that the in-
vestor knows the distribution of the stock return process. Of course, in practice
this is illusory. At best, the investor has an estimate of the return distribution
at his disposal. This, in turn, allows him to produce an estimate of the con-
ditional expectation in question and hence gives him an approximate Bellman
equation involving the observed empirical return data. Using nonparametric
regression estimation techniques
Chapter 6
Portfolio selection functions in stationary return processes
Since in some cases the investor may have reason to believe that the historical
return data does not follow a d-stage Markov process, we move on to even more
general market models than in the previous
chapters. Ignoring transaction costs, we consider a market whose returns are
merely stationary and ergodic. It is natural for the investor to take his invest-
ment decisions on the basis of recently observed returns, say on the basis of the
returns during the last d ∈ IN market periods (d fixed). This leads us to the
notion of log-optimal portfolio selection functions.
We make this more concrete in Section 6.1, where we take our familiar log-
utility approach again. The investor tries to find a log-optimal portfolio selection
function, i.e. a measurable function
b* : IR_+^{dm} −→ S

such that

E (log < b*(X_0, ..., X_{d−1}), X_d >) ≥ E (log < f(X_0, ..., X_{d−1}), X_d >)

for every measurable function f : IR_+^{dm} → S.
We require very mild conditions beyond stationarity and ergodicity. More precisely,
we assume that the return process {X_n}_{n=0}^∞ is an [a, b]^m-valued stationary
and ergodic process (0 < a ≤ b < ∞ need not be known) and that a Lipschitz
condition on the conditional return ratio E[X_d / < s, X_d > | X_{d−1} =
x_{d−1}, ..., X_0 = x_0] holds. The Lipschitz constant L is taken as a known
market constant.
Using a stochastic gradient algorithm and combining it with nonparametric
regression estimators,
Indeed, let Ŝn be the wealth accumulated during n market periods when on
the (i + 1)st day of trading the portfolio b̂i (Xi−d+1 , ..., Xi) is selected using the
most recent estimate b̂i of a log-optimal portfolio selection function. Then, if
Sn is the wealth accumulated during the same period using any other portfolio
selection function of the last d observed return vectors,
lim sup_{n→∞} (1/n) log (S_n / Ŝ_n) ≤ 0
with probability 1 (Corollary 6.2.3).
After an appropriate modification, the algorithms and the results remain valid
even if the market constant L is unknown in real market applications (Theorem
6.2.4). Section 6.3 proves the findings, and the chapter is rounded off with
several realistic examples in Section 6.4.
Chapters 2, 3 and 6 can be read independently of each other; they are self-contained.
Chapters 4 and 5, however, are closely linked. Notation that goes
beyond common mathematical style is explained where it occurs for the first
time. We also refer the reader to the list of abbreviations at the beginning
of the thesis. The calculations and plots for the examples were generated us-
ing Matlab 4.0 and 6.0.0.88, Minitab 11.2 and R 1.1.1 with historical stock
quotes (daily closing prices) from the New York Stock Exchange provided by
www.wallstreetcity.com.
ZUSAMMENFASSUNG
This thesis is meant to be a plea for the application of nonparametric
statistical prediction and estimation methods to problems such as those arising
in the planning of financial assets and investments.
In six chapters, various possible applications of nonparametric techniques to
portfolio selection in financial markets are analyzed. This can, of course,
reflect only a rough and admittedly arbitrary excerpt from this wide field; we
hope, however, to be able to demonstrate thereby the merits of nonparametric
estimation methods in portfolio selection.
Chapter 1
Introduction: investment and nonparametric statistics
Investment is the strategically planned deployment of resources (usually
financial resources) in an environment (usually a financial market) whose
future evolution is subject to random fluctuations. Investment problems arise
in a multitude of fields, also beyond the financial context. Resources may,
among other things, take the form of energy, data-processing capacities, etc.
The strategic planning of investments helps many processes run at a higher
benefit. This thesis concentrates on financial investment, which forms, as it
were, the "prototype" of the many processes in which system resources are to
be deployed profitably.
Three components play a role in investments of a financial nature: the market,
the investor's possible actions and his investment goal. These building blocks
are discussed in detail in Sections 1.1-1.3.
combined. In the eyes of the investor, {X_n}_{n=1}^∞ is a stochastic
process which in many real markets is stationary and ergodic (Definition
1.1.1). In some chapters of this thesis, (realistic) additional assumptions
on the distribution of the process are made. What is decisive, however, is
achieving as large a value as possible. It is known from the literature
that there is no fundamental conflict here between near (n finite) and
distant (n → ∞) investment horizons. In both cases an investment at time
n according to the conditionally log-optimal portfolio
(Cover and Thomas, 1991, Theorem 15.5.2). Here W*_n is the wealth at time
n that the investor achieves through a series of conditionally log-optimal
investments, and W_n the wealth under any other portfolio strategy that
commands no more information than can be derived from past market
observations (a so-called "causal" strategy).
This should be reason enough for the investor to use a logarithmic
utility function, i.e., given knowledge of the return vectors observed
in the past, to pursue the maximization of the expected future
logarithmic return.
The conditionally log-optimal portfolio derives from the distribution of the
return process {Xn }n. In reality, the true distribution of the returns, and
with it the conditionally log-optimal strategy, is not known to the investor.
At this point financial planning needs statistics as a partner. Statistics
serves the investor in solving the problem of
finding a method which, using historical return data only and without
knowledge of the true return distribution, generates an optimal causal
portfolio strategy {b̂n }n. Optimality is used here in the sense that, for
every stationary and ergodic return process {Xn }n, the strategy lets the
investor's wealth Ŵ_n := ∏_{i=1}^{n} < b̂_i, X_i > grow on average just as
fast as the log-optimal strategy {b*_n}n. Formally, {b̂n }n is to guarantee
that with probability 1

lim sup_{n→∞} (1/n) log (W*_n / Ŵ_n) ≤ 0.
It is known that such methods exist (Algoet, 1992). However, they carry the
drawback of being highly complex and of requiring an enormous amount of
historical data to produce practically usable results. One aim of this thesis
is to develop simplified but efficient portfolio selection algorithms based on
nonparametric prediction and estimation procedures. The algorithms are to be
designed so as to be applicable to the largest possible classes of markets.
Chapter 2
Portfolio benchmarking: convergence rates and dimensionality
The quality of a portfolio selection method is usually judged by comparison
with a benchmark strategy. Our benchmark is the log-optimal portfolio selection
rule, which, as we saw in Chapter 1, is an optimal rule of behavior. The
investor will not succeed in outperforming the latter. Naturally, he will hope
that the lack of performance of his own strategy vanishes in the course of the
investment process, namely as his estimates of the distribution of the return
process become ever better with the growing amount of available historical
data. If the investor chooses his portfolio at time n on the basis of the
observations X1 , ..., Xn, he will earn a return of R̂n = < b̂n+1, Xn+1 > in
the next market period, while the log-optimal strategy yields
R*n = < b*n+1, Xn+1 >. Comparing the two values makes it possible to assess by
how much b̂n+1 is inferior to the log-optimal strategy b*n+1.
To judge the quality of the strategy b̂n+1, one therefore has to analyze in
particular at what rate E log(R*_n / R̂_n) tends to zero. It is assumed here
that the returns arise in a process of independent, identically distributed
random variables. Using concepts from information theory, a lower bound on
this rate of convergence is derived in Section 2.1. It states that even in the
simplest of all markets, a market with only finitely many possible return
constellations, the following holds:
Loosely speaking, one might say that the empirically log-optimal portfolio
makes up for its deficits at the optimal rate. The results hold largely
independently of the number of stocks in the market under consideration. This
is untypical for nonparametric estimation procedures and therefore merits
closer discussion (Theorem 2.1.4 shows that this phenomenon also occurs in
more complicated markets).
For this reason, Section 2.2 adds a more detailed discussion of the effects of
the dimension of the market on portfolio selection. Limited computational
capacities will force the investor to restrict his investment planning to a
smaller subset of all stocks in the market. This subset has to be chosen in
the planning phase, i.e. before the actual investment process. Criteria for
this pre-selection are needed. The usual way to proceed would be to pick
single stocks whose charts promise high growth potential. It will be shown
that this route suffers from substantial shortcomings:
Chapter 3
Return predictions and portfolio selection
With the insight that successful portfolio selection requires information
about the variance-covariance structure of the stocks in the market (both
temporal correlations and correlations between the single stocks play a
role), Section 3.1 presents an investment strategy that enjoys great
popularity among investors.
The strategy has two stages and uses the historical log-returns
Yn ,Yn−1, ..., Y0 (Yi := log Xi ) as input data for the investment decision
at time n:
1. Form a forecast for the future of the market. It will be shown that
market predictions should be based on conditional expectations of future
log-returns given the past, i.e. on
Ŷn+1 := E[Yn+1 | Yn , Yn−1, ...].
exp(Ŷn+1) ≥ r.
We call this strategy a strategy for the "greedy investor", since it is
designed to pick out only the best possible investment opportunities. The
strategy is appealingly simple, and in markets with low log-return variance
it leads to sensible results (Section 3.1).
In implementing the strategy, the investor faces the difficulty that the
forecasts Ŷn+1 can be computed only with knowledge of the true distribution
of the process. One therefore has to settle for computing an estimate
Ê(Yn , ..., Y0) of the conditional expectation E[Yn+1 | Yn , Yn−1, ...] from
the market observations Yn, ..., Y0. It is known from the literature that no
estimate obtained in this way can be strongly consistent in the sense that

lim_{n→∞} ( Ê(Yn, ..., Y0) − E[Yn+1 | Yn, Yn−1, ...] ) = 0        (0.0.4)

a forecasting algorithm Ê(Yn, ..., Y0) for stationary and ergodic
Gaussian processes is developed which is strongly consistent in the
sense of (0.0.4) and which is remarkably easy to implement
(Corollary 3.2.3).
These results give us the subroutines needed to implement the strategy for
the "greedy" investor in Gaussian log-return processes. The algorithm
itself, however, is also of interest independently of the application given
here, as forecasting problems for Gaussian processes arise in a multitude
of fields.
The proof of the convergence properties is given in Section 3.3. Application
examples with real and simulated data follow in Section 3.4 and show
promising results, both when the algorithm serves for pure forecasting and
when it serves as a subroutine for the "greedy" strategy.
Chapter 4
A Markov model with transaction costs: probabilistic aspects
In the simplest markets, in which the returns arise as independent,
identically distributed random variables, one should invest in a temporally
constant log-optimal
The next portfolio to be chosen is thus a function of the last portfolio
chosen and the last d return vectors observed in the market. As before, the
investor strives to maximize his expected logarithmic wealth growth, here
now by choosing an optimal portfolio selection function c.
Section 4.2 sets out how an optimal selection function c can be constructed,
all under the premise that the true distribution of the returns is known.
The main result will show
The Bellman equation is well known from the theory of dynamic optimization;
nevertheless, fundamental differences between classical dynamic optimization
and the portfolio selection problem will emerge. In preparation for Chapter 5,
Section 4.3 finally derives further analytical properties of the solution of
the Bellman equation.
Chapter 5
A Markov model with transaction costs: statistical aspects
The Bellman equation as set up in Chapter 4 depends decisively on the
distribution of the return process {Xn }n. This dependence takes the form of
a conditional expectation that has to be evaluated. For this reason, the
results of Chapter 4 are valid only under the premise that the investor knows
the true distribution of the return process, which in practice is of course
illusory. At best, the investor has at his disposal an estimate of the
distribution of the returns. This enables him to compute an estimate of the
conditional expectation in question, which in turn yields an approximation of
the Bellman equation. Using techniques from nonparametric regression
estimation
by a kernel estimator Rn (g, b, x). Depending on the smoothness of a density
of X0 (we assume that one exists), the rate of convergence in the limit
relation
Chapter 6
Portfolio selection functions in stationary return processes
In real financial markets one may observe departures of the return process
{Xn }n from a d-stage Markov process. Therefore, even more general market
models are to be considered in this chapter. Transaction costs are ignored,
but in exchange we consider return processes for which essentially only
stationarity and ergodicity are assumed. It is natural for the investor to
take his investment decisions on the basis of the last d return vectors
observed in the market (d fixed). This leads to the concept of log-optimal
portfolio selection functions.
This concept is introduced in Section 6.1. The investor again uses a
logarithmic utility function and therefore tries to determine a log-optimal
portfolio selection function, i.e. a measurable function

b* : IR_+^{dm} −→ S,

E (log < b*(X0 , ..., Xd−1), Xd >) ≥ E (log < f (X0 , ..., Xd−1), Xd >)

Beyond stationarity and ergodicity, only very mild additional assumptions are
made. Concretely, it is assumed that the return process {Xn}_{n=0}^∞ is an
[a, b]^m-valued stationary and ergodic stochastic process (0 < a ≤ b < ∞ need
not be known) and that a Lipschitz condition holds for the conditional return
ratio E[Xd / < s, Xd > | Xd−1 = xd−1 , ..., X0 = x0 ]. The Lipschitz constant
L is taken to be a known market constant.
Using a stochastic gradient method together with nonparametric regression
estimators, it is shown
In practical applications, the following result plays an even more important
role:
Let Ŝn be the wealth achieved after n market periods when on the (i + 1)st
day of trading the current estimate b̂i is used to choose the portfolio
b̂i (Xi−d+1 , ..., Xi). If Sn denotes the wealth earned in the same time with
any other selection strategy based on the respectively last d observed
returns, then

lim sup_{n→∞} (1/n) log (Sn / Ŝn) ≤ 0

with probability 1 (Corollary 6.2.3).
After a suitable modification, the algorithms and the results remain valid
even if, as in practical applications, the market constant L is unknown to
the investor (Theorem 6.2.4). Section 6.3 proves the results, and the
chapter is rounded off with several realistic examples in Section 6.4.
Chapters 2, 3 and 6 can be read independently of one another; they are
self-contained. Chapters 4 and 5, however, are closely interlinked. Notation
that goes beyond standard mathematical notation is explained at its first
occurrence. The reader is also referred to the list of abbreviations at the
beginning of this thesis. Computations and plots for the examples were
produced with Matlab 4.0 and 6.0.0.88, Minitab 11.2 and R 1.1.1, using
historical stock quotes (daily closing prices) from the New York Stock
Exchange provided by www.wallstreetcity.com.
ACKNOWLEDGEMENTS
I am indebted to
Prof. Laszlo Györfi, for his hospitality during my visits to the Technical Uni-
versity of Budapest. On several occasions he gave me the right impulse
and really useful advice.
Prof. Volker Claus, for his interest in my work and for discussing the contents
of this thesis with me.
The DFG and the College of Graduates "Parallel and Distributed Systems", for
funding my research, with everything that involves.
Dr. Jürgen Dippon, who never threw me out when I felt like discussing prob-
lems in mathematical statistics and finance.
Prof. Adam Krzyżak, for being my host during a stay at Concordia University,
Montréal, and for many discussions about mathematical and other interesting
subjects.
CHAPTER 1
Definition 1.1.1. Let {X_n}_{n=1}^∞ be an IR^m-valued stochastic process on a
probability space (Ω, A, P).
1. {X_n}_{n=1}^∞ is called stationary, if
Stationarity preserves the stochastic regime over time; ergodicity is the setting
in which time averages along trajectories of the process converge almost surely
to expected values under the process distribution:
Theorem 1.1.2. (Birkhoff Ergodic Theorem, Stout, 1974, Sec. 3.5) Let
{X_n}_{n=1}^∞ be an IR^m-valued stationary and ergodic stochastic process on a
probability space (Ω, A, P) with E|X_1| < ∞. Then

(1/n) ∑_{i=1}^{n} X_i −→ EX_1   a.s. as n → ∞.
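A quick numerical illustration of the theorem (a toy example, not from the thesis): for a stationary, ergodic AR(1) process, the time average along a single trajectory approaches the stationary mean.

```python
# Toy illustration of the Birkhoff ergodic theorem (not from the thesis):
# for the stationary, ergodic AR(1) process X_{n+1} = 0.5 X_n + eps_n
# with standard normal noise, the time average along a single trajectory
# approaches the stationary mean E X_1 = 0.
import random

random.seed(0)
x, total, n = 0.0, 0.0, 200_000
for _ in range(n):
    x = 0.5 * x + random.gauss(0.0, 1.0)
    total += x
time_average = total / n   # close to 0 for a long trajectory
```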
Stationarity and ergodicity are the basic assumptions for most statistical inves-
tigations. The stationarity of stock returns is a thoroughly investigated field,
both by economists (e.g. Francis, 1980, A24-1) and statisticians (e.g. Franke et
al., 2001, Sec. 10.6). It is natural to assume that there is short term stationar-
ity in most stock returns, some authors (Francis, 1980) even claim that return
data may be treated as stationary if the time horizon comprises at least one
complete business cycle. There is no conclusive answer that proves or disproves
stationarity for the majority of stock markets, and it seems as though this has
to be decided from case to case. We accept stationarity as a working hypothesis,
accounting for the fact that it is common practice to assess and compare the
performance of statistical methods in the stationary setting.
Not much is known about the ergodic properties of stock quotes or stock re-
turns, neither from the theoretical economist’s point of view, nor from empirical
studies. There are indications that the ergodic properties of a market depend
very much upon the flow of information in the market and on the microeconomic
price generation (Domowitz and El-Gamal, 1997). These are difficult to assess,
and so the typical approach has become to derive algorithms under ergodic
hypotheses and then let the success of the algorithm justify the hypotheses.
Throughout this thesis we consider nonparametric models for {Xn }n, i.e.
models that do not require a parametrized evolution equation (in contrast to
MA, AR, ARMA, ARIMA, ARCH and GARCH models, cf. Brockwell and
Davis, 1991, Franke et al., 2001). The nonparametric approach guarantees the
highest flexibility in modelling, dispensing with model parameters that would
otherwise require extensive diagnostic model testing. To be more precise, the following
models will be investigated in this thesis:
Each of these models has been found useful for describing asset return data in
real financial markets. Model 1 is the Cox-Ross-Rubinstein model (Cox et al.,
1979; Francis, 1980, A24-1 and A24-2; Luenberger, 1998, Ch. 11; Franke et al.,
2001, Ch. 7). Models 2 and 3 are models with log-normal returns (Francis, 1980,
A24-1; Luenberger, 1998, Ch. 11) which arise, e.g., from a discretisation of the
Black-Scholes model (Luenberger, 1998, Ch. 11; Korn and Korn, 1999, Kap.
II). In contrast to the classical Black-Scholes model we allow for autocorrelated
log-returns in Chapter 3 (i.e. Cov(log Xn , log Xn+k ) ≠ 0 for some k > 0). In
practice, autocorrelation of the log-returns manifests itself for small time lags
k (Franke et al., 2001, Ch. 10) as well as large k (long range dependence, Ding
et al., 1993; Peters, 1997). Many studies have indicated that the logarithms
of stock returns slightly depart from a Gaussian distribution (e.g. by heavy
tails, Mittnik and Rachev, 1993; McCulloch, 1996; Franke et al., 2001, Ch. 10
and the references there). It is therefore advisable to drop the assumption of
log-normality of the stock returns wherever possible. This is done in models 4
and 5, model 4 capturing the autocorrelation of stock returns by the Markov
property.
We will assume that the asset returns correspond to one of the models 1-5.
However, we do not assume the exact form of the true return distribution to be
known to the investor (with the exception of Chapter 4). Hence, the investor has
to apply statistical estimation and forecasting techniques for strategy planning.
Clearly, nonparametric models require nonparametric statistical methods, and the arguments are usually more involved than in the parametric setting (for an introduction to nonparametric estimation as we will use it, see Györfi et al.
1989, 2002). Unfortunately, nonparametric methods are not yet common in
econometrics and financial mathematics (Pagan and Ullah, 1999, and Franke
et al., 2001, being two of the few notable exceptions). In this thesis we aim to
demonstrate what powerful impetus nonparametric statistical estimation may
give to investment strategy planning.
the investor. Throughout the investment process, the investor holds varying
portfolios of the m assets. Taking a discrete time trading point of view, we
assume that the investor is only allowed to rebalance his portfolio at the be-
ginning but not in the course of each market period. The portfolio held at the
beginning of market period n (i.e. from time n − 1 to n) can be given by the
quantities q1,n−1 , ..., qm,n−1 of the single assets owned by the investor (qi,n−1 < 0
corresponds to borrowed assets, so-called short positions). The investor then
enters the nth market period with a portfolio value of

W_{n-1}^{+} := \sum_{i=1}^{m} P_{i,n-1} q_{i,n-1} .
Hence, if W_{n-1}^{+} ≠ 0, the portfolio achieved a return of

W_n^{-} / W_{n-1}^{+} = \sum_{i=1}^{m} X_{i,n} b_{i,n}    (1.2.1)

with

b_{i,n} := P_{i,n-1} q_{i,n-1} / \sum_{j=1}^{m} P_{j,n-1} q_{j,n-1} .
Note that \sum_{i=1}^{m} b_{i,n} = 1, and we will find it more convenient to denote a portfolio
by the portfolio vector
rather than listing q1,n−1 , ..., qm,n−1 . If the investor is allowed to consume an
amount cn before changing his portfolio bn for bn+1 and entering market period
n + 1, then Wn+ is given by
(1.2.1) and (1.2.2) are the equations governing general discrete time investment.
Throughout this thesis we are concerned with an investor who neither consumes
nor deposits new money into his portfolio but reinvests his current portfolio
[Figure: timeline of discrete trading — market period n extends from time n − 1 to time n (the end of the nth day of trading), with return vector X_n realised in period n.]
value in each market period. Hence, c_n = 0 for all n and (1.2.1) and (1.2.2) boil down to

W_n := W_n^{+} = W_n^{-} = W_0 \prod_{i=1}^{n} ⟨b_i, X_i⟩ ,    (1.2.3)
where Wn is the current wealth of the investor at time n. Moreover, the factor
⟨b_n, X_n⟩ := b_n^T X_n can be interpreted as the portfolio return during the nth
market period.
Moreover, we assume that the investor never enters short positions, i.e. bi,n ≥ 0.
Then bi,n is the proportion of the current wealth Wn invested in asset i at time
n − 1. The portfolio vector bn chosen at time n is a member of the simplex
S := \{ (s_1, ..., s_m)^T : \sum_{i=1}^{m} s_i = 1, \; s_i ≥ 0 \} .
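As a small numerical illustration of (1.2.1)–(1.2.3), wealth is the initial capital multiplied by the per-period portfolio returns ⟨b_i, X_i⟩, with every portfolio taken from the simplex S. The sketch below is ours (the function name and all numbers are made up, not from the text):

```python
import numpy as np

def wealth(W0, portfolios, returns):
    """Evolve wealth via (1.2.3): W_n = W_0 * prod_i <b_i, X_i>.

    portfolios: portfolio vectors b_i, each in the simplex S
    returns:    return vectors X_i (gross returns / price relatives)
    """
    Wn = W0
    for b, x in zip(portfolios, returns):
        b, x = np.asarray(b, float), np.asarray(x, float)
        assert b.min() >= 0 and abs(b.sum() - 1.0) < 1e-12  # b must lie in S
        Wn *= float(b @ x)  # portfolio return <b, X> of this market period
    return Wn

# two assets, three periods, constantly rebalanced portfolio (illustrative)
b = [0.5, 0.5]
X = [[1.1, 0.9], [0.8, 1.2], [1.0, 1.05]]
W3 = wealth(100.0, [b] * 3, X)  # 100 * 1.0 * 1.0 * 1.025 = 102.5
```

Note that only the products ⟨b_i, X_i⟩ enter; the individual quantities q_{i,n-1} are not needed once the portfolio vectors are given.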
The choice of bn depends on the information In which the investor can access at
time n and which he deems relevant. Thus bn = bn (In) where In typically com-
prises a number of past observed asset returns (a substring of X1 , ..., Xn−1), in
some cases additional side or insider information about the market and external
economic factors. For specific choices of In , this is the setting for Chapters 2, 3
and 6 (Figure 1.1).
In reality, the range of portfolio choices is further narrowed by the occurrence of
transaction costs. Each transaction in a real market (purchase, sale of assets)
generates costs (brokerage fees, commission, administrative and communication
expenses). The total amount of these fees is withdrawn from the investor’s
wealth. Thus, the range of portfolios the investor may choose is restricted to
It is widely accepted that for returns below 10% and high frequency data (e.g.
daily returns) the logarithmic approximation is convincing (Franke et al., 2001,
Sec. 10.1). This leads to the notion of log-optimal portfolios, i.e. portfo-
lios that maximize the expected logarithmic utility of the investor's growth
of wealth. The log-optimal portfolio of a process {Xn }n of independent and
identically distributed (i.i.d.) returns Xn is defined as
Log-optimal portfolios were first suggested by Kelly (1956), Latané (1959) and Breiman (1961) as a diversification strategy for investment in a speculative market given by a process {X_n}_{n=1}^{∞} of i.i.d. return vectors. Since then, numerous investigations, notably by Cover (e.g. Cover, 1980, 1984; Cover and
Thomas, 1991) and Algoet (e.g. Algoet and Cover, 1988) have explored the
theoretical aspects of this strategy, establishing that investment in log-optimal
portfolios yields optimal asymptotic growth rates for the invested wealth. An
introduction, various results and sources of reference can be found in Cover and
Thomas (1991, Chapter 15). There, for stationary and ergodic return processes
{Xn }n , (1.3.1) is generalized by the conditional log-optimal portfolio (for
the nth investment step)
in stationary ergodic return processes (conditioning being void for n = 1). The
conditional log-optimal portfolio is the log-optimal portfolio under the condi-
tional distribution PXn |Xn−1 ,...,X1 and hence a random variable. The log-optimal
investment strategy b_1^*, b_2^*, ... is a member of the class of non-anticipating strategies, i.e. sequences of S-valued random variables b_1, b_2, ... with the property that each b_n is measurable w.r.t. the σ-algebra generated by X_1, ..., X_{n−1} (hence the strategy requires no more information than available at time n).
The technical aspects of conditional log-optimal portfolios (we will often drop
“conditional” for brevity) are well explored:
Existence and uniqueness of the log-optimal portfolio has been investigated in
Österreicher and Vajda (1993) and Vajda and Österreicher (1994), correcting a
wrong criterion used in Algoet and Cover (1988). The main result is
Theorem 1.3.1. (Vajda and Österreicher, 1994) Let X = (X1 , ..., Xm) be
a stock market return vector with distribution P_X. Then there exists a log-optimal portfolio b^* ∈ S with |E \log ⟨b^*, X⟩| < ∞ if and only if

E \big| \log \sum_{i=1}^{m} X_i \big| < ∞ .
A good algorithm for the calculation of a log-optimal portfolio from the (known)
distribution PX of the return vector X was given by Cover (1984).
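Cover's scheme is not reproduced here; as an illustration of this kind of calculation, the sketch below iterates the multiplicative reweighting b ← b · E[X/⟨b, X⟩] (whose fixed points satisfy the Kuhn-Tucker conditions stated next) on a hypothetical two-outcome return distribution:

```python
import numpy as np

def log_optimal(xs, ps, iters=1000):
    """Approximate the log-optimal portfolio for a finitely supported return
    distribution P(X = xs[k]) = ps[k] by the multiplicative reweighting
    b <- b * E[X / <b, X>]; its fixed points satisfy the Kuhn-Tucker
    conditions E[X_i / <b, X>] = 1 on the support of b."""
    xs, ps = np.asarray(xs, float), np.asarray(ps, float)
    b = np.full(xs.shape[1], 1.0 / xs.shape[1])   # start from uniform
    for _ in range(iters):
        b = b * (ps @ (xs / (xs @ b)[:, None]))   # sum of b stays exactly 1
    return b

# hypothetical two-outcome market: cash, plus a stock that doubles the
# invested amount with probability 0.6 and loses it with probability 0.4
b = log_optimal([[1.0, 2.0], [1.0, 0.0]], [0.6, 0.4])  # Kelly: b = (0.8, 0.2)
```

The closing comment reflects the classical Kelly fraction 2p − 1 = 0.2 for this even-money bet, which the iteration reproduces.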
This is closely linked with the following, the Kuhn-Tucker conditions for a log-
optimal portfolio.
Theorem 1.3.4. (Algoet and Cover, 1988) Assume the return vectors {X_i}_{i=1}^{∞} form a stationary and ergodic process. Let S_n^* := \prod_{i=1}^{n} ⟨b_i^*, X_i⟩ be the wealth
– It proves that the log-optimal portfolio will do at least as well as any other
non-anticipating strategy to first order in the exponent of capital growth,
i.e. it guarantees Sn∗ = exp (nW + o(n)) with highest possible rate W .
From the first part of (1.3.2) Bell and Cover (1988) conclude that
The log-optimality criterion has not been undisputed, however. In his criticism,
Samuelson (1971, also discussed in Markowitz, 1976) considers a market with
i.i.d. returns X1 , X2 , ... and compares the expected wealth ESn∗ from a series of
log-optimal investments with the expected wealth ESn∗∗ from investment in the
fixed portfolio

b^{**} := \arg\max_{b ∈ S} E ⟨b, X_1⟩ .
Realistically, the true distribution of market returns and hence the log-optimal
strategy is not revealed to the investor. Then the key problem is (as Algoet,
1992, put it):
Find a non-anticipating portfolio selection scheme {b̂n}n (a so-called universal
portfolio selection scheme) such that for any stationary ergodic market process
{X_n}_n, the compounded capital Ŝ_n := \prod_{i=1}^{n} ⟨b̂_i, X_i⟩ will grow exponentially fast almost surely (i.e. with probability 1) with the same maximum rate as under the log-optimum strategy {b_n^*}_n, that is, \lim_{n→∞} \log Ŝ_n / n = \lim_{n→∞} \log S_n^* / n
almost surely.
To obtain a universal portfolio selection scheme, under weak conditions on the
market one may choose the log-optimal portfolio with respect to some appro-
priately consistent estimate of PXn |X1 ,...,Xn−1 in the nth investment step (more
precisely, distribution estimates that almost surely exhibit weak convergence to
the true distribution). This was demonstrated by Algoet (1992, Theorem 7). He
also provides an appropriate, yet complicated estimation scheme (Algoet, 1992,
Theorem 9). Instead, we can also use the more transparent scheme of Morvai et
al. (1996). Algoet points out that there are universal portfolio selection schemes
that do not require an explicit distribution estimation scheme as a subroutine
(Algoet, 1992, Sec. 4.3). But still, all existing algorithms seem to require an
enormous amount of past data, making their feasibility in practical situations
doubtful (as noted, e.g., in Yakowitz, Györfi et al., 1999). More practicable
results have been obtained in the case of independent, identically distributed
return vectors. For instance, Morvai (1991, 1992) and Österreicher and Vajda
[Figure 1.1: the market model (assets i = 1, ..., m; stationary, ergodic stochastic return process {X_n}_n), the investment actions (portfolio process {b_n}_n in S; occasionally transaction costs; no consumption; no short positions) and the investment goal (log-utility: maximisation of expected log-returns, good for both short term and long run investment).]
(1993) propose portfolio strategies which are based on selecting the log-optimal
portfolio with respect to the empirical distribution of the data (the so-called
empirical log-optimal portfolio, more on that in Chapter 2). Those esti-
mators can be computed with reasonable effort. Repeated investment following
their strategies asymptotically yields the optimal growth rate of wealth with
probability one. However, in merely stationary and ergodic return processes
they produce suboptimal results.
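For a 2 stock market, the empirical log-optimal portfolio can be sketched by a plain grid search over [0, 1]; the i.i.d. return data below are synthetic and the function name is ours:

```python
import numpy as np

def empirical_log_optimal_2(past):
    """Empirical log-optimal weight b of stock 1 in a 2 stock market:
    maximise the sample average of log(b*x1 + (1-b)*x2) over b in [0, 1]."""
    past = np.asarray(past, float)
    grid = np.linspace(0.0, 1.0, 1001)
    growth = np.log(grid[:, None] * past[:, 0] + (1 - grid[:, None]) * past[:, 1])
    return float(grid[np.argmax(growth.mean(axis=1))])

# repeated investment on synthetic i.i.d. returns: a stock that doubles
# (probability 0.6) or halves, and cash as the second asset
rng = np.random.default_rng(0)
X = np.column_stack([np.where(rng.random(200) < 0.6, 2.0, 0.5), np.ones(200)])
wealth = 1.0
for n in range(10, 200):                    # invest using past data only
    b = empirical_log_optimal_2(X[:n])
    wealth *= b * X[n, 0] + (1 - b) * X[n, 1]
```

The portfolio at each step is a non-anticipating function of the observed past returns, exactly as required of a portfolio selection scheme.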
This thesis aims to provide simplified, yet efficient portfolio selection algorithms
if the log-returns follow a Gaussian process (Chapters 2 and 3), a Markov process (Chapters 4 and 5) or, more generally, a stationary and ergodic process
(Chapter 6). Our approach is summarized in Figures 1.1 and 1.2.
For the sake of completeness, it should be noted that in recent years, the log-
optimality criterion has been generalized in several ways. In particular, re-
searchers tried
and Stettner, 1999, for continuous time models; Bielecki, Hernández and
Pliska, 1999, for a discrete time model).
It is left for future research to generalize the results of this thesis to these
extended models.
CHAPTER 2
where the expectation is calculated for Y ∼ Q̂_n. This choice yields the random return R̂_n = b̂_n X_{n+1} during the next market period. In order to determine how well b̂_n reproduces the true log-optimal portfolio b^* = \arg\max_{b ∈ S} E \log b X_1 with return R_n^* = b^* X_{n+1}, we first observe that
E \log R_n^* − E \log R̂_n = E \log b^* X_{n+1} − E E[ \log b̂_n(X_1, ..., X_n) X_{n+1} | X_1, ..., X_n ]
≥ E \log b^* X_{n+1} − E E[ \log b^* X_{n+1} | X_1, ..., X_n ]
= E \log b^* X_{n+1} − E \log b^* X_{n+1} = 0,
As will be seen in the proof, (2.1.1) is not needed when considering unbiased
estimators Q̂_n, i.e., estimators for which E Q̂_n(A) = Q(A) for all A ∈ B(IR_+^m).
Proof. Consider a 2 stock market with return vector (X^{(1)}, X^{(2)}) ∈ IR_+^2 and portfolios (b, 1 − b), b ∈ [0, 1]. We can expand

E \log\big( b X^{(1)} + (1 − b) X^{(2)} \big) = E \log\big( (Z − 1) b + 1 \big) + E \log X^{(2)}    (2.1.4)

with the return ratio Z := X^{(1)}/X^{(2)}. Thus, in a 2 stock market, the log-optimal portfolio only depends upon the distribution of the return ratio Z. For
simplicity, let Z be of the form

Z = A with probability p, and Z = B with probability 1 − p,    (2.1.5)

with p ∈ (0, 1) and A, B > 0 to be chosen later.
We first consider the classical parameter estimation problem of estimating p,
which will be linked with the portfolio selection problem at a later stage. Q̂n
allows the investor to derive an estimate of p,
p̂_n = p̂_n(z^n) := Q̂_n\big( \{ (x^{(1)}, x^{(2)}) : x^{(1)}/x^{(2)} = A \} \big),
z^n ∈ {A, B}^n being the observed realisations of the i.i.d. return ratios Z_1, ..., Z_n (independent of Z). If k(z^n) denotes the number of A's in z^n and B_i^n(p) = \binom{n}{i} p^i (1 − p)^{n−i} denotes the ith Bernstein polynomial of order n, we can identify

f_n(p) := E p̂_n(Z_1, ..., Z_n) = \sum_{i=0}^{n} \Big( \binom{n}{i}^{-1} \sum_{z^n : k(z^n) = i} p̂_n(z^n) \Big) B_i^n(p) =: \sum_{i=0}^{n} b_{i,n} B_i^n(p)

as a Bézier curve. For reasons to become clear later it is important to study its derivative

f_n'(p) = \sum_{i=0}^{n−1} n (b_{i+1,n} − b_{i,n}) B_i^{n−1}(p).
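The Bernstein–Bézier identities above can be checked numerically. In this illustration (ours, not part of the proof), p̂_n is the relative-frequency estimator k(z^n)/n, for which b_{i,n} = i/n, hence f_n(p) = p and f_n'(p) = 1:

```python
import math

def bernstein(i, n, p):
    """B_i^n(p) = C(n, i) p^i (1 - p)^(n - i)."""
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

def f_n(b, p):
    """Bezier curve f_n(p) = sum_i b[i] B_i^n(p) with n = len(b) - 1."""
    n = len(b) - 1
    return sum(b[i] * bernstein(i, n, p) for i in range(n + 1))

def f_n_prime(b, p):
    """Derivative via f_n'(p) = sum_i n (b[i+1] - b[i]) B_i^{n-1}(p)."""
    n = len(b) - 1
    return sum(n * (b[i + 1] - b[i]) * bernstein(i, n - 1, p) for i in range(n))

n = 8
b = [i / n for i in range(n + 1)]  # coefficients of the frequency estimator
```

For arbitrary coefficient vectors, the derivative formula agrees with a finite-difference quotient of f_n, which is the identity exploited in the proof.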
Combinatorial arguments given at the end of this proof and relation (2.1.1)
yield
n |b_{i+1,n} − b_{i,n}| ≤ const.    (2.1.6)

independently of i and n. Using \sum_{i=0}^{n−1} B_i^{n−1}(p) = 1 we obtain
The mean squared error MSE(p̂_n) = E(p^* − p̂_n)_+^2 + E(p^* − p̂_n)_-^2 satisfies

E(p^* − p̂_n)_+^2 ≥ \frac{1}{2} MSE(p̂_n)    (2.1.9)

or

E(p^* − p̂_n)_-^2 ≥ \frac{1}{2} MSE(p̂_n)    (2.1.10)

for infinitely many n. In either case the Cramér-Rao lower bound yields (for infinitely many n)

E(p^* − p̂_n)_±^2 ≥ \frac{1}{2} \Big[ \frac{f_n'(p^*)^2}{I_n(p^*)} + (f_n(p^*) − p^*)^2 \Big]
if (2.1.9) applies or
More precisely,

b(p̂_n) = 0 if p̂_n > p^* ;
b(p̂_n) = (p^* − p̂_n)/(p^*(1 − p^*)) if p^{*2} ≤ p̂_n ≤ p^* ;
b(p̂_n) = 1 if p̂_n < p^{*2}

and

E[ \log(R_n^*/R̂_n) | X_1, ..., X_n ] = 0 if p̂_n > p^* ;
E[ \log(R_n^*/R̂_n) | X_1, ..., X_n ] = D(p^* || p̂_n) if p^{*2} ≤ p̂_n ≤ p^* ;
E[ \log(R_n^*/R̂_n) | X_1, ..., X_n ] = D(p^* || p^{*2}) if p̂_n < p^{*2} .
This is what we wanted to show in case (2.1.9) holds. If (2.1.10) applies, we set
A := 2 − p∗ and B := 1 − p∗ and argue similarly.
It remains to prove (2.1.6). For our specific model we can assume
X = (A, 1) with probability p, and X = (1, 1/B) with probability 1 − p.
If the observed return ratios z^n and z'^n ∈ {A, B}^n differ in one digit only, so do the sequences of realisations x_1, ..., x_n and x'_1, ..., x'_n of X that generate z^n and z'^n, respectively. Hence, using (2.1.1),

p̂_n(z^n) − p̂_n(z'^n) = \lim_{ε→0^+} \big[ F_n((A + ε, 1 + ε), x_1, ..., x_n) − F_n((A, 1), x_1, ..., x_n) \big] − \big[ F_n((A + ε, 1 + ε), x'_1, ..., x'_n) − F_n((A, 1), x'_1, ..., x'_n) \big]
≤ \lim_{ε→0^+} \big| F_n((A + ε, 1 + ε), x_1, ..., x_n) − F_n((A + ε, 1 + ε), x'_1, ..., x'_n) \big| + \big| F_n((A, 1), x_1, ..., x_n) − F_n((A, 1), x'_1, ..., x'_n) \big|
≤ \max_{(x,y) ∈ \{(A,1), (1,1/B)\}} \Big[ \lim_{ε→0^+} \frac{c((A + ε, 1 + ε), x, y)}{n} + \frac{c((A, 1), x, y)}{n} \Big] =: \frac{c}{n} .
Let F(z^n) consist of all elements of {A, B}^n which can be generated by changing exactly one of the digits B in z^n to A, and let G(z^n) consist of all elements of {A, B}^n which can be generated by changing exactly one of the digits A in z^n to B. Then

b_{i,n} = b_{i+1,n} + \binom{n}{i+1}^{-1} \sum_{z'^n : k(z'^n) = i+1} \Big\{ (i + 1)^{-1} \sum_{z^n ∈ G(z'^n)} \big( p̂_n(z^n) − p̂_n(z'^n) \big) \Big\} .

The latter bracket {...} is an average with constituents bounded from above in absolute value by c/n. Hence |b_{i+1,n} − b_{i,n}| ≤ c/n and the proof is finished. □
E[ \log(R_n^*/R̂_n) | X_1, ..., X_n ] ≥ \frac{2}{\log 2} E(1/2 − p̂_n)_+^2 ≥ \frac{2}{\log 2} \cdot \frac{1}{2 I_n(1/2)} = \frac{4}{n \log 2}
Theorem 2.1.2. (Cover and Thomas, 1991) Let Q be the true return distribu-
tion and Q̂n a sequence of distribution estimates, both having densities q and
q̂n , respectively, w.r.t. some common dominating measure. Then
∆(Q̂n , Q) ≤ ED(Q||Q̂n)
E[ \log(R_n^*/R̂_n) | X_1, ..., X_n ] = \int_A \log \frac{b^* x}{b̂_n x} Q(dx)
= \int_A \log \Big( \frac{b^* x}{b̂_n x} \cdot \frac{q̂_n(x)}{q(x)} \cdot \frac{q(x)}{q̂_n(x)} \Big) Q(dx)
= \int_A \log \Big( \frac{b^* x}{b̂_n x} \cdot \frac{q̂_n(x)}{q(x)} \Big) Q(dx) + D(Q || Q̂_n)
≤ \log \int_A \frac{b^* x}{b̂_n x} Q̂_n(dx) + D(Q || Q̂_n) ≤ D(Q || Q̂_n),
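For a two-point market the bound can be verified directly: investing log-optimally under a wrong distribution Q̂ loses at most D(Q‖Q̂) relative to the log-optimal portfolio under the true Q. (A numerical sketch; outcomes and probabilities are hypothetical.)

```python
import numpy as np

GRID = np.linspace(0.0, 1.0, 2001)

def log_opt_2pt(x1, x2, p):
    """Log-optimal weight of asset 1 when X = x1 w.p. p and X = x2 w.p. 1 - p,
    found by grid search over the 2 asset simplex."""
    r1 = GRID * x1[0] + (1 - GRID) * x1[1]
    r2 = GRID * x2[0] + (1 - GRID) * x2[1]
    return float(GRID[np.argmax(p * np.log(r1) + (1 - p) * np.log(r2))])

def exp_log_return(b, x1, x2, p):
    return (p * np.log(b * x1[0] + (1 - b) * x1[1])
            + (1 - p) * np.log(b * x2[0] + (1 - b) * x2[1]))

x1, x2 = (2.0, 1.0), (0.5, 1.0)   # stock doubles or halves; asset 2 is cash
p, p_hat = 0.6, 0.4               # true and (wrongly) estimated probability
b_star, b_hat = log_opt_2pt(x1, x2, p), log_opt_2pt(x1, x2, p_hat)
loss = exp_log_return(b_star, x1, x2, p) - exp_log_return(b_hat, x1, x2, p)
kl = p * np.log(p / p_hat) + (1 - p) * np.log((1 - p) / (1 - p_hat))
# 0 <= loss <= kl, in accordance with the inequality chain above
```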
It is an interesting feature of inequality (2.1.12) that the rate itself does not
deteriorate when the number m of stocks in the market model grows. This is
with c(i) = O(1). To see how c(i) depends on m we allow ourselves some
heuristics: Móri (1982) proved that for a (slight) modification of the empirical log-optimal strategy, n^{(m−1)/2} Ŝ_n/S_n^* converges to a non-degenerate random variable Z,

n^{(m−1)/2} \, Ŝ_n / S_n^* → Z in distribution.

This can be rewritten as

\sum_{i=1}^{n} \Big( \frac{m−1}{2} \cdot \frac{1}{i} − \log \frac{R_i^*}{R̂_i} \Big) → \log Z in distribution.

Up to the logarithmic factor, the rate coincides with the classical rate n^{−1/2} of stochastic parameter estimators – regardless of the portfolio dimension m.
Proof. For the proof we can assume m > 2: If m = 2 we artificially produce
a market with 3 assets from the original 2 stocks and a bond returning A/2
in each market period. In this setting, we never invest in the bond, i.e., log-
optimal investment is the same as in the original 2 stock market. A rate result
for the 3 stock market carries over to the 2 stock market.
First we make some preliminary observations on the covering of the simplex
S := \{ b ∈ IR^m : b_i ≥ 0, \sum_{i=1}^{m} b_i = 1 \} .
Let S'_m := \{ b ∈ IR^m : b_i ≥ 0, \sum_{i=1}^{m} b_i ≤ 1 \} and define the mapping F : S'_{m−1} → S, (x_1, ..., x_{m−1}) ↦ (x_1, ..., x_{m−1}, 1 − \sum_{i=1}^{m−1} x_i). Fix some ε > 0. Clearly, we can cover S'_{m−1} ⊆ [0, 1]^{m−1} with N ≤ ⌈1/δ⌉^{m−1} ‖·‖_∞-balls of radius δ := ε/(m − 1) centered at c^{(1)}, ..., c^{(N)} ∈ S'_{m−1}. For any x ∈ S
\inf_{i=1,...,N} \big\| (x_1, ..., x_m) − F(c^{(i)}) \big\|_∞
= \inf_{i=1,...,N} \max\Big\{ \big\| (x_1, ..., x_{m−1}) − c^{(i)} \big\|_∞ , \Big| \sum_{j=1}^{m−1} (c_j^{(i)} − x_j) \Big| \Big\}
≤ \inf_{i=1,...,N} (m − 1) \big\| (x_1, ..., x_{m−1}) − c^{(i)} \big\|_∞ ≤ ε .
It follows that S can be covered by at most ⌈(m − 1)/ε⌉^{m−1} ‖·‖_∞-balls of radius ε.
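The covering estimate can be probed numerically: round the first m − 1 coordinates of a point of S to a grid of spacing 2δ and map the result through F; by the inequalities above, the ℓ∞ distance never exceeds ε. (A sketch; m, ε and the sampling are our choices, and for simplicity the rounded centers are not forced back into S'_{m−1}.)

```python
import numpy as np

def covering_radius_check(m, eps, n_samples=2000, seed=1):
    """Empirical check of the covering estimate: for x in S, the grid point c
    (spacing 2*delta, delta = eps/(m-1)) nearest to (x_1, ..., x_{m-1}) gives
    ||x - F(c)||_inf <= max(delta, (m-1)*delta) = eps.
    Returns the largest distance observed over random points of S."""
    delta = eps / (m - 1)
    rng = np.random.default_rng(seed)
    x = rng.dirichlet(np.ones(m), size=n_samples)         # random points of S
    c = 2 * delta * np.round(x[:, :m - 1] / (2 * delta))  # grid rounding
    Fc = np.column_stack([c, 1.0 - c.sum(axis=1)])        # map through F
    return float(np.abs(x - Fc).max(axis=1).max())
```

For m = 3 and ε = 0.1 the returned maximum stays below 0.1, in line with ⌈(m − 1)/ε⌉^{m−1} balls sufficing.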
Let X1 , ..., Xn denote independent return data and augment this family by a
random variable X independent of X1 , ..., Xn with the same distribution as X1 .
Introduce the following abbreviations:

Φ_n := \max_{b ∈ S} \frac{1}{n} \sum_{i=1}^{n} \log b X_i = \frac{1}{n} \sum_{i=1}^{n} \log b̂_n X_i ,

L_n := E[ \log b̂_n X | X_1, ..., X_n ]

and

Clearly,

L_n ≤ \max_{b ∈ S} E[ \log b X | X_1, ..., X_n ] = \max_{b ∈ S} E \log b X = L^* .
Under the condition ε − δ − 2c_1 τ > 0, the Hoeffding inequality (Petrov, 1995, 2.6.2; note that |\log b X| ≤ \max\{ |\log A|, |\log B| \} =: c_2 < ∞) yields

K_1 ≤ \sum_{j=1}^{N} P\Big( E \log b^{(j)} X − \frac{1}{n} \sum_{i=1}^{n} \log b^{(j)} X_i ≥ ε − δ − 2c_1 τ \Big)
≤ 2N \exp\Big( − \frac{n(ε − δ − 2c_1 τ)^2}{2c_2^2} \Big)
≤ 2 \Big( \frac{m−1}{τ} \Big)^{m−1} \exp\Big( − \frac{n(ε − δ − 2c_1 τ)^2}{2c_2^2} \Big).    (2.1.13)
Bounding K_2:

Φ_n − L_n = \frac{1}{n} \sum_{i=1}^{n} \log b̂_n X_i − E[ \log b̂_n X | X_1, ..., X_n ]
≤ \max_{j=1,...,N} \Big( \frac{1}{n} \sum_{i=1}^{n} \log b^{(j)} X_i − E \log b^{(j)} X \Big) + 2c_1 τ,

and under the condition δ − 2c_1 τ > 0 we apply Hoeffding's inequality again to obtain

K_2 ≤ N \exp\Big( − \frac{n(δ − 2c_1 τ)^2}{2c_2^2} \Big) ≤ \Big( \frac{m−1}{τ} \Big)^{m−1} \exp\Big( − \frac{n(δ − 2c_1 τ)^2}{2c_2^2} \Big).    (2.1.14)
Combining (2.1.13) and (2.1.14) for δ := ε/2, τ := ε/(8c_1) yields

P(L^* − L_n > ε) ≤ 3 \Big( \frac{8c_1(m−1)}{ε} \Big)^{m−1} \exp\Big( − \frac{nε^2}{32c_2^2} \Big).
This in turn implies

Δ(Q̂_n, Q) = E(L^* − L_n) = \int_0^{∞} P(L^* − L_n > ε) \, dε
≤ a_n + 3 \int_{a_n}^{∞} \Big( \frac{8c_1(m−1)}{ε} \Big)^{m−1} \exp\Big( − \frac{nε^2}{32c_2^2} \Big) dε
= a_n \Big( 1 + 24 c_1 (m−1) \int_{(8c_1(m−1))^{-1}}^{∞} \Big( \frac{1}{a_n z} \Big)^{m−1} \exp\big( −c_3 n a_n^2 z^2 \big) \, dz \Big)
Δ(Q̂_n, Q) ≤ a_n \Big( 1 + \frac{const.}{c \log^q n} \Big) = \frac{c \log^q n}{n^{1/2}} + \frac{const.}{n^{1/2}}
for all n greater than some integer that depends on m, a, c_3 and c. Hence,

\limsup_{n→∞} \frac{n^{1/2}}{\log^q n} Δ(Q̂_n, Q) ≤ c,

and from c > 0 being arbitrary, we infer

\limsup_{n→∞} \frac{n^{1/2}}{\log^q n} Δ(Q̂_n, Q) = 0,

the assertion for the case m > 2. □
1. M very much affects the scale of finite sample underperformance via the
constants in the rate results (recall what we inferred from Móri, 1982).
For these reasons, the investor should work with a medium size range of stocks
at a time only. In other words, he will have to pre-select m < M stocks from
the whole market. These pre-selected stocks are the assets he includes in a
log-optimal portfolio. For illustrative purposes, we restrict ourselves to M = 3.
In this case, the investor may compose a log-optimal portfolio out of 6 possible
combinations of 1 or 2 stocks.
V^*_{\{n\}} := E \log X^{(n)}
A natural (and in fact a frequently used) way for pre-selection is to start with a first “draught-horse” stock (say stock A) for our portfolio, i.e., a stock such that V^*_{\{A\}} is large. From the two remaining contenders (say stocks B and C) the investor then includes the one with good single performance, e.g. B if V^*_{\{B\}} > V^*_{\{C\}}. The hope is to attain the optimum V^*_{\{A,B\}} = \max_{\{n,m\} ⊆ \{A,B,C\}} V^*_{\{n,m\}}.
The following result gives conditions under which this method is doomed to
failure in the realistic market model of log-normally distributed returns. More
precisely, markets with log-normal returns are characterised for which V^*_{\{1\}} < V^*_{\{2\}} < V^*_{\{3\}} and V^*_{\{2,3\}} < V^*_{\{1,2\}} at the same time, the two best single stocks forming a poorer portfolio than the two worst stocks in the market. As a
consequence, in order to select the optimal 2 stock combination from the market, the investor has to evaluate all \binom{M}{m} possible choices, a huge computational effort in high dimensions – an effort though that cannot be avoided. This is in contrast
to the Markowitz mean-variance approach, where a portfolio built of stock 1
and 2 may be superior to a portfolio built of stock 2 and 3 in terms of risk
(i.e., variance of portfolio return), never though in terms of performance (i.e.,
expected portfolio return).
µ_1 < µ_2 := 0 < µ_3 and V^*_{\{2,3\}} < V^*_{\{1,2\}}
simultaneously.
by balancing stocks 1 and 2 rather than stocks 1 and 3, even though stock 2
might have poorer single performance than stock 3.
Corresponding results for dimension reduction in pattern recognition have been
obtained by Toussaint (1971, also discussed in Devroye, Györfi and Lugosi,
1996, Theorem 32.2).
For the proof of the theorem, we need a number of preliminary observations.
First consider a 2 stock market with log-normally distributed returns X^{(i)}, and

(Y^{(1)}, Y^{(2)})^T ∼ N\Big( (µ_1, µ_2)^T ; \begin{pmatrix} σ_1^2 & σ_{12} \\ σ_{12} & σ_2^2 \end{pmatrix} \Big) .
Log-optimal investment in log-normal markets has been considered, e.g., by Ohlson (1972). As noted in Chapter 1, the log-optimal portfolio

(b^{(1)*}, b^{(2)*}) := \arg\max_{b^{(1)}, b^{(2)} ≥ 0, \, b^{(1)} + b^{(2)} = 1} E \log\big( b^{(1)} X^{(1)} + b^{(2)} X^{(2)} \big)
(1, 0) log-optimal ⟺ E\big[ X^{(2)}/X^{(1)} \big] ≤ 1    (2.2.2)

(0, 1) log-optimal ⟺ E\big[ X^{(1)}/X^{(2)} \big] ≤ 1    (2.2.3)

(b^{(1)}, b^{(2)}) log-optimal ⟺ E\Big[ \frac{X^{(1)}}{b^{(1)} X^{(1)} + b^{(2)} X^{(2)}} \Big] = E\Big[ \frac{X^{(2)}}{b^{(1)} X^{(1)} + b^{(2)} X^{(2)}} \Big] = 1,    (2.2.4)
b^{(2)} = r/(1 + r). Then we can rewrite the right hand sides of (2.2.2) and (2.2.3) as

E \exp Z ≤ 1,    (2.2.2')

E \exp(−Z) ≤ 1.    (2.2.3')

By simple calculations, the right hand side of (2.2.4) is equivalent to the existence of r ∈ (0, ∞) such that

E \frac{\exp Z − 1}{1 + r \exp Z} = 0 .    (2.2.4')
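For Gaussian Z ∼ N(µ, σ²), E exp(±Z) = exp(±µ + σ²/2) in closed form, and the r of (2.2.4') can be located by bisection, since the left hand side is, up to the positive factor 1 + r, the decreasing derivative of the concave map b ↦ E log(1 − b + b exp Z). (A numerical sketch; µ, σ, the quadrature and the solver are our choices.)

```python
import numpy as np

def kuhn_tucker_r(mu, sigma, n_grid=40001):
    """Find r in (0, inf) with E[(exp Z - 1)/(1 + r exp Z)] = 0 for
    Z ~ N(mu, sigma^2): bisection on a log scale, the expectation computed
    by a dense Riemann sum over mu +- 8 sigma. Assumes |mu| < sigma^2 / 2,
    so that a diversified log-optimal portfolio exists."""
    w = np.linspace(-8.0, 8.0, n_grid)
    z, dw = mu + sigma * w, w[1] - w[0]
    dens = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
    def h(r):
        return float(np.sum((np.exp(z) - 1) / (1 + r * np.exp(z)) * dens) * dw)
    lo, hi = 1e-8, 1e8
    for _ in range(120):
        mid = np.sqrt(lo * hi)   # bisect on the log scale
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return lo

mu, sigma = 0.0, 0.5                  # hypothetical log-return parameters
r = kuhn_tucker_r(mu, sigma)
b2 = r / (1 + r)                      # weight of the second stock
```

For µ = 0 the integrand of (2.2.4') at r = 1 is odd, so the root is r = 1 and the log-optimal portfolio is (1/2, 1/2), as symmetry demands.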
From (2.2.2’) to (2.2.4’) one can observe the following:
i.e. again on µ and σ^2 only. The following lemma summarizes basic properties of V_σ(µ), similar to the results derived in Ohlson (1972):
Proof. 1. On the one hand the log-optimal portfolio (b^{(1)*}(µ, σ^2), b^{(2)*}(µ, σ^2)) is unique (Theorem 1.3.1); on the other hand, a continuous solution, say (b^{(1)}(µ, σ^2), b^{(2)}(µ, σ^2)), to the maximization problem

can be found (Aliprantis and Border, 1999, Theorem 16.31 together with Lemma 16.6). Hence, both coincide and b^{(1)*}(µ, σ^2) is a continuous function of µ. From the equation V_σ(µ) = E \log( b^{(1)*} + (1 − b^{(1)*}) \exp Z_{µ,σ^2} ) the continuity assertion follows.
Now, let −σ^2/2 < µ < ν. Then

The first inequality follows from −σ^2/2 < µ, i.e. b^{(1)*}(µ, σ^2) < 1; the second inequality holds by definition of b^{(1)*} as a component of the log-optimal portfolio.
2. is a direct consequence of (2.2.5) and (2.2.6). We just check V_σ(µ) = E \log X^{(1)} − µ_1 = µ_1 − µ_1 = 0 for (b^{(1)*}, b^{(2)*}) = (1, 0) and calculate V_σ(µ) =
3. As noted above, µ = 0 implies (b^{(1)*}, b^{(2)*}) = (1/2, 1/2). Hence we find that

V_σ(0) = \log \frac{1}{2} + E \log( 1 + \exp Z_{0,σ^2} )
= \log \frac{1}{2} + \frac{1}{\sqrt{2π}} \int_{−∞}^{∞} \log( 1 + \exp(σw) ) \exp\Big( − \frac{w^2}{2} \Big) dw.
From this representation we see that V_σ(0) is continuous for σ ∈ (0, ∞). Moreover, using the monotone convergence theorem (Williams, 1991, 5.3) we calculate

V_0(0) := \lim_{σ→0^+} V_σ(0) = \log \frac{1}{2} + \frac{1}{\sqrt{2π}} \int_{−∞}^{∞} \log 2 \cdot \exp\Big( − \frac{w^2}{2} \Big) dw = \log \frac{1}{2} + \log 2 = 0 .
[Figure: the reduced portfolio values V_{σ_U}(·) and V_{σ_L}(·) plotted against µ (in the plotted example σ_U = 1.0, σ_L = 0.5, µ_1 = −0.05), illustrating the ordering V_{σ_L}(µ_3) < V_{σ_U}(µ_1) used in the proof.]
Combining this with the definition of the reduced portfolio value V_σ(µ_i − µ_j) = V^*_{\{i,j\}} − µ_j in (2.2.7) for a portfolio of stocks {i = 1, j = 2} (set σ^2 = σ_1^2 − 2σ_{12} + σ_2^2) and of stocks {j = 2, i = 3} (set σ^2 = σ_3^2 − 2σ_{23} + σ_2^2), we obtain V^*_{\{2,3\}} = V_{σ_L}(µ_3) + 0 < V_{σ_U}(µ_1) + 0 = V^*_{\{1,2\}}.
Necessity: If instead of (2.2.1) we assume σ_3^2 − 2σ_{23} ≥ σ_1^2 − 2σ_{12}, then µ_1 < µ_2 := 0 < µ_3 implies V^*_{\{1,2\}} = V_{σ_U}(µ_1) + 0 ≤ V_{σ_U}(0) ≤ V_{σ_L}(0) < V_{σ_L}(µ_3) + 0 = V^*_{\{2,3\}} (Lemma 2.2.2, parts 1 and 2 for the first and third inequality, part 3 for the second). □
2.3 Examples
We conclude this chapter with examples of a real market where a situation as in
Theorem 2.2.1 is set up by empirical market data. We assume that the distribu-
tion of the returns is log-normal with the parameters provided by the standard
The third column reports estimates based on the empirical mean and variance of the difference Y^{(i)} − Y^{(2)} of the log-returns of stock i and stock 2.
Suppose we want to enhance a portfolio of C by either AXP or UTX. Since σ_1^2 − 2σ_{12} + σ_2^2 < σ_3^2 − 2σ_{23} + σ_2^2, we conclude that there is no indication that we should prefer AXP to UTX.
Example 2.2: Next, consider the following stocks from the Dow Jones Trans-
portation Average:
[Figure 2.2, panels a)–c): a) histogram of the log-returns (relative frequency in %); b) normal probability plot; c) sample autocovariance function (autocovariance in units of 10^{−4} against lags 0–50).]
Figure 2.2: Diagnostic plots for American Express Co. (AXP) log-returns from
closing prices NYSE, 2/1/1998-30/11/2000.
additional stock j | weight b of YELL | V^*_{\{2,j\}} − V^*_{\{2\}} | residual value
1 (JBHT) | 0.523951 | 1.9910 · 10^{−4} | −1.6092 · 10^{−10}
3 (UNP) | 0.465490 | 1.5407 · 10^{−4} | 1.3154 · 10^{−10}

V^*_{\{2,j\}} − V^*_{\{2\}} is the improvement of the portfolio value achieved under inclusion of stock j. As can be seen, our suspicion was justified: choosing JBHT yields (slightly) greater portfolio improvement than UNP. The residual value in the fourth column is 1 − E\big[ X^{(2)}/(b X^{(2)} + (1 − b) X^{(j)}) \big] and indicates that the Kuhn-Tucker condition (2.2.4) for log-optimality of the stated portfolio weight b is satisfied. The values in the third and fourth column were computed with an error of at most 10^{−9} using the composite trapezoidal rule.
CHAPTER 3
in investment performance.
We will embark on the second question, the forecasting problem, in Section 3.2.
Once we have seen that among the many ways of forecasting, so-called “strong”
forecasting is the method of choice for the greedy strategy, we will leave the stock
market behind and handle the problem in the general framework of Gaussian
time series forecasting. Clearly, the prediction of stationary Gaussian time series
is of interest very much in its own right, with applications arising in many fields.
Based on an approximation argument (Section 3.2.1), a forecasting algorithm
will be presented (Section 3.2.2) that – under weak regularity conditions –
is strongly consistent for huge classes of Gaussian processes (Theorem 3.2.2).
Explicit examples are given, highlighting how general the algorithm is (examples
after Corollary 3.2.3). The results are proved in Section 3.3. Simulations and
further examples in Section 3.4 conclude the chapter.
– He takes the investment decision on the basis of the predicted return of the
next market period only.
1. At each step n he produces an estimate Ŷn+1 for the next outcome Yn+1 on
the basis of the observed Yn, ..., Y1 (note that it is not possible to observe
the process to the infinite past).
2. He invests according to
b^*_{approx} := 1 if \exp(Ŷ_{n+1}) ≥ r, and b^*_{approx} := 0 otherwise.    (3.1.2)
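In code the greedy rule (3.1.2) is a one-line decision; the predictor, the riskless gross return r and the simulated returns below are hypothetical placeholders:

```python
import numpy as np

def greedy_portfolio(y_hat, r):
    """Greedy rule (3.1.2): everything into the stock if the predicted gross
    return exp(y_hat) at least matches the riskless gross return r."""
    return 1.0 if np.exp(y_hat) >= r else 0.0

r = 1.0002                                    # hypothetical riskless return
rng = np.random.default_rng(0)
y = 0.0005 + 0.01 * rng.standard_normal(250)  # synthetic i.i.d. log-returns
wealth = 1.0
for yn in y:
    b = greedy_portfolio(0.0005, r)           # toy predictor: the true mean
    wealth *= b * np.exp(yn) + (1 - b) * r
```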
If the portfolio is not rearranged on a daily basis, but say, on a two month basis,
this is a typical example of a buy-and-hold strategy: Once a fixed amount of
money is invested according to the investor’s belief what the market will look
like in two month’s time, the portfolio remains unchanged. A similar “greedy”
buy-and-hold strategy (where only stocks from the CBS index are picked whose
predicted two month return exceeds a certain threshold) has been investigated
in a case study described in Franke et al. (2001, Sec. 16.4 and the references
there).
Comparing b^*_{approx} = \arg\max_{b ∈ [0,1]} \log\big( b \exp(Ŷ_{n+1}) + (1 − b) r \big) and (3.1.1), we see that the greedy strategy is very much in the spirit of approximating the
log-optimality principle. However, b∗ will be a function of Yn, Yn−1 , ... rather
than of a single statistic Ŷn+1, and the log-optimal portfolio will be diversified
(i.e., b∗ ∈ [0, 1]), not just 0 or 1. As a consequence the investor loses out
on investment performance in comparison with the log-optimal portfolio. Two
questions arise:
– How should he construct the statistic Ŷn+1 in order not to lose out “too
much” in comparison with log-optimal portfolio performance?
– What performance loss does the particular Ŷn+1 inflict on the investor in
the worst possible case?
Not every g and not every Ŷ_{n+1} will be appropriate for this kind of approximation. A Taylor expansion may give us some guideline: Put

f_b(y) := \log\big( b \exp(y) + (1 − b) r \big)

and note that

0 ≤ f_b'(y) = \frac{1}{1 + (1/b − 1) r \exp(−y)} ≤ 1,

0 ≤ f_b''(y) = \frac{(1/b − 1) r \exp(−y)}{\big( 1 + (1/b − 1) r \exp(−y) \big)^2} ≤ \frac{1}{4} .
Now consider the expansion

f_b(Y_{n+1}) = f_b(Ŷ_{n+1}) + f_b'(Ŷ_{n+1})(Y_{n+1} − Ŷ_{n+1}) + \frac{1}{2} f_b''(ξ_b)(Y_{n+1} − Ŷ_{n+1})^2    (3.1.3)

with some random ξ_b from the convex hull of Y_{n+1} and Ŷ_{n+1}. From (3.1.3) and the σ(Y_n, Y_{n−1}, ...)-measurability of Ŷ_{n+1} we obtain

E[ f_b(Y_{n+1}) | Y_n, Y_{n−1}, ... ] = f_b(Ŷ_{n+1}) + f_b'(Ŷ_{n+1})\big( E[Y_{n+1} | Y_n, Y_{n−1}, ...] − Ŷ_{n+1} \big) + \frac{1}{2} E[ f_b''(ξ_b)(Y_{n+1} − Ŷ_{n+1})^2 | Y_n, Y_{n−1}, ... ].
As can be seen, the choice

Ŷ_{n+1} := E[ Y_{n+1} | Y_n, Y_{n−1}, ... ]

not only makes the first order term vanish but also minimizes the upper bound

\frac{1}{2} E[ f_b''(ξ_b)(Y_{n+1} − Ŷ_{n+1})^2 | Y_n, Y_{n−1}, ... ] ≤ \frac{1}{8} E[ (Y_{n+1} − Ŷ_{n+1})^2 | Y_n, Y_{n−1}, ... ]

on the second order term.
Using b^*_{approx} (based on Ŷ_{n+1}), the investor loses at most

E[ f_{b^*}(Y_{n+1}) − f_{b^*_{approx}}(Y_{n+1}) | Y_n, Y_{n−1}, ... ]
= E[ f_{b^*}(Y_{n+1}) − f_{b^*_{approx}}(Ŷ_{n+1}) | Y_n, Y_{n−1}, ... ] + E[ f_{b^*_{approx}}(Ŷ_{n+1}) − f_{b^*_{approx}}(Y_{n+1}) | Y_n, Y_{n−1}, ... ]
≤ E[ f_{b^*}(Y_{n+1}) − f_{b^*}(Ŷ_{n+1}) | Y_n, Y_{n−1}, ... ] + E[ f_{b^*_{approx}}(Ŷ_{n+1}) − f_{b^*_{approx}}(Y_{n+1}) | Y_n, Y_{n−1}, ... ]
= \frac{1}{2} E[ \big( f''_{b^*}(ξ_{b^*}) − f''_{b^*_{approx}}(ξ_{b^*_{approx}}) \big) (Y_{n+1} − Ŷ_{n+1})^2 | Y_n, Y_{n−1}, ... ]
≤ \frac{1}{8} E[ (Y_{n+1} − Ŷ_{n+1})^2 | Y_n, Y_{n−1}, ... ]
= \frac{1}{8} Var[ Y_{n+1} | Y_n, Y_{n−1}, ... ].
Hence

E f_{b^*}(Y_{n+1}) − E f_{b^*_{approx}}(Y_{n+1}) ≤ \frac{1}{8} E \, Var[ Y_{n+1} | Y_n, Y_{n−1}, ... ] = \frac{1}{8} \big( Var Y_{n+1} − Var Ŷ_{n+1} \big) ≤ \frac{1}{8} Var Y_{n+1},    (3.1.4)
and on the average the investor won't lose more than (1/8) Var Y_{n+1}. If he is prepared to sacrifice this amount, then the greedy strategy (3.1.2) is possible. Still, this does not obviate the necessity to estimate Ŷ_{n+1} = E[Y_{n+1} | Y_n, Y_{n−1}, ...], but as we will see in the next section, for many practically relevant markets this can be done with reasonable effort.
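Both bounds are easy to probe numerically: the curvature bound f_b'' ≤ 1/4 and, via deterministic quadrature for i.i.d. Gaussian Y (so that Ŷ_{n+1} = µ), the average loss bound (1/8) Var Y_{n+1}. (A sketch; µ, σ, r and the grids are our choices.)

```python
import numpy as np

def f(b, y, r):
    """f_b(y) = log(b exp(y) + (1 - b) r)."""
    return np.log(b * np.exp(y) + (1 - b) * r)

def f_second(b, y, r):
    """f_b''(y) = t / (1 + t)^2 with t = (1/b - 1) r exp(-y); at most 1/4."""
    t = (1.0 / b - 1.0) * r * np.exp(-y)
    return t / (1.0 + t) ** 2

mu, s, r = 0.0, 0.3, 1.0                        # hypothetical parameters
w = np.linspace(-8.0, 8.0, 20001)
y, dw = mu + s * w, w[1] - w[0]
dens = np.exp(-w ** 2 / 2) / np.sqrt(2 * np.pi)
E = lambda vals: float(np.sum(vals * dens) * dw)  # dense Riemann sum

curv_max = max(float(np.max(f_second(b, y, r))) for b in (0.1, 0.5, 0.9))

bs = np.linspace(0.0, 1.0, 401)
utils = np.array([E(f(b, y, r)) for b in bs])   # E f_b(Y) over a grid of b
b_greedy = 1.0 if np.exp(mu) >= r else 0.0      # greedy rule with Y_hat = mu
loss = utils.max() - E(f(b_greedy, y, r))       # at most (1/8) Var Y
```

With µ = 0 and r = 1 the greedy rule goes all in while the log-optimal weight is 1/2; the resulting loss stays just below σ²/8, showing the bound is essentially sharp.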
with probability 1 (in the L^1-sense) for any stationary and ergodic process {Y_i}_i. By stationarity, the computation of weakly consistent estimators (F_n := σ(Y_n, Y_{n−1}, ...) for the moment) can be reduced to the so-called static forecasting problem (Györfi, Morvai and Yakowitz, 1998), to find Ê such that

\lim_{n→∞} E\big| Ê(Y_0, ..., Y_{−n}) − E[ Y_1 | Y_0, Y_{−1}, ... ] \big| = 0.
and E[Yn+1 |Yn, Yn−1 , ...]) converges with probability one to the minimal possible
value for any (bounded) stationary and ergodic process (Algoet, 1994). Such
estimators were obtained by Algoet (1992) and Morvai et al. (1996). Based on
Györfi, Lugosi and Morvai (1999), universal predictors for bounded or Gaussian
stationary ergodic processes have been constructed by Györfi and Lugosi (2001).
Throughout, we make the following two assumptions:
ensuring that the equality (3.2.2) holds with probability 1 (Brockwell and
Davis, 1991, Prop. 3.1.1).
and Hida and Hitsuda (1993). Indeed, (3.2.1) and (3.2.3) are standard assump-
tions in time series analysis, and a considerable variety of sufficient conditions
for the assumptions to hold are known. We just note that if f (λ), λ ∈ [−π, π],
is the spectral density of the process {Yn}_{n=−∞}^{∞}, then
(Brockwell and Davis, 1991, eq. 4.4.3; Shiryayev, 1984, VI §6 eq. (16f.)), and
    ∫_{−π}^{π} log f(λ) dλ > −∞     (3.2.5)
is sufficient (and in fact also necessary) for the process to be purely nondeter-
ministic (Shiryayev, 1984, VI §5 Theorem 4). The simplest setting in which
(3.2.5) holds is the case when f happens to be bounded away from 0. Then
the process is strongly mixing (Ibragimov and Linnik, 1971, Theorem 17.3.3).
However, for the purpose of our analysis, we do not require this strong property.
We divide the problem of estimating E[Yn+1|Yn , Yn−1, ...] into two steps:
satisfy

    Σ_{k=dn+1}^{∞} |φk|² ≤ √(const. / log n)     (3.2.6)

with probability 1.
The proof of this result and of the next theorems is deferred to section 3.3.
and
γd := (γ(d), ..., γ(1)) ,
we can obtain an explicit formula for E[Yn+1|Yn, ..., Yn−d+1]. In fact, assumption
(3.2.3) implies
γ(k) → 0 (k → ∞) (3.2.7)
(Brockwell and Davis, 1991, Probl. 3.9), and from (3.2.1) and (3.2.7) it follows
that Γd is non-singular (Brockwell and Davis, 1991, Prop. 5.1.1). For Gaussian
processes one has (Brockwell and Davis, 1991, §5.4; Shiryayev, 1984, II §13
Theorem 2)
    E[Yd+1 | Yd, ..., Y1] = γd Γd^{−1} (Y1, ..., Yd)^T,

and thus

    E[Yn+1 | Yn, ..., Yn−d+1] = md(Yn, ..., Yn−d+1).     (3.2.9)
From (3.2.8) and (3.2.9) it is plausible to construct a simple estimator Êd,n for
the conditional expectation E[Yn+1|Yn , ..., Yn−d+1] by the following steps:
2. Set Γ̃d,n := (γ̂n(i − j))_{i,j=1,...,d}. Posing Γ̃d,n = (1/n) A A^T with the matrix A ∈ IR^{d×2n} formed by the first d rows of the matrix

    ( 0   0  · · ·  0   Y1  Y2  · · ·  Yn )
    ( 0  · · ·  0   Y1  Y2  · · ·  Yn   0 )
    (                 ...                 )   ∈ IR^{n×2n},
    ( 0   Y1  Y2  · · ·  Yn   0  · · ·  0 )
is non-singular. Hence, with (3.2.10) and γ̂n,d := (γ̂n(d), ..., γ̂n(1)), define
Remark. Even if γ̂n,d Γ̂d,n^{−1} constitutes a strongly consistent estimate for the
coefficients γd Γd^{−1} of the autoregression function md, it may happen that
estimation errors γ̂n,d Γ̂d,n^{−1} − γd Γd^{−1} which are per se "acceptable" occur
together with large values of the Yn, ..., Yn−d+1 "plugged in". The resulting
prediction error |E[Yn+1 | Yn, ..., Yn−d+1] − Êd,n| then becomes considerably large.
Suitable truncation limits the size of the Yi's without obscuring the information
they contain.
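A minimal sketch of the resulting plug-in predictor follows. The exact definition of Êd,n is given in the elided display above; the sketch assumes, as the decomposition (3.3.4) in Section 3.3 indicates, the form γ̂ Γ̂^{−1} applied to the truncated d-past, with the ridge term I/n visible in (3.3.5) and the truncation T_L y = max(min(y, L), −L) of the remark. All function names and the AR(1) test data are illustrative assumptions.

```python
import numpy as np

def predict_next(Y, d, L):
    """Plug-in one-step predictor in the form suggested by (3.3.4):
    gamma_hat * Gamma_hat^{-1} applied to the truncated d-past.
    Assumes a zero-mean process; the ridge I/n and the truncation
    are read off from (3.3.5) and the remark above."""
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    # sample autocovariances gamma_hat_n(0), ..., gamma_hat_n(d)
    gam = np.array([Y[: n - k] @ Y[k:] / n for k in range(d + 1)])
    Gamma_hat = np.array([[gam[abs(i - j)] for j in range(d)] for i in range(d)])
    Gamma_hat += np.eye(d) / n                  # Gamma_hat = Gamma_tilde + I/n
    coef = np.linalg.solve(Gamma_hat, gam[1:])  # Yule-Walker-type coefficients
    past = np.clip(Y[-d:][::-1], -L, L)         # truncated (Y_n, ..., Y_{n-d+1})
    return coef @ past

# Sanity check on simulated AR(1) data, where E[Y_{n+1}|past] = phi * Y_n:
rng = np.random.default_rng(0)
phi, n = 0.6, 20000
Y = np.zeros(n)
for i in range(1, n):
    Y[i] = phi * Y[i - 1] + rng.normal()
print(predict_next(Y, d=5, L=10.0), phi * Y[-1])  # the two values nearly agree
```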
Now, denoting the maximal absolute row sum of a real matrix A = (aij)i,j by
‖A‖∞ := max_i Σ_j |aij|, we establish the following convergence result for the
proposed estimator:
Theorem 3.2.2. Assume (3.2.1), (3.2.3), and choose dn and Ln such that for
some r ≥ 4, some δ > 0 and sufficiently large n
    dn ≤ n^{r/(2(r−2))},

    Ln ‖Γdn^{−1}‖∞² dn^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

    Ln ‖Γdn^{−1}‖∞² dn / n → 0     (n → ∞),

    Σ_{n=1}^{∞} (dn / Ln) exp(−Ln² / (2σ²)) < ∞.     (3.2.11)
Then
    lim_{n→∞} |E[Yn+1 | Yn, ..., Yn−dn+1] − Êdn,n| = 0     (3.2.12)
with probability 1.
From (3.2.11), for the choice of dn and Ln, one needs some bound on the possible
growth of ‖Γdn^{−1}‖∞. Based on the spectral density f and its (essential) minimum
mf ≥ 0 we distinguish the following cases:
Case 1: f is bounded away from 0, mf > 0.
Case 2: f has finitely many zeros λ1, ..., λm ∈ (−π, π] of orders p1, ..., pm, that
is, there exist constants pj^−, pj^+ > 0, K ≥ 1, δ > 0 such that

    1/K < f(λ) / |λ − λj|^{pj^+} < K    for all λ ∈ (λj, λj + δ),
    1/K < f(λ) / |λ − λj|^{pj^−} < K    for all λ ∈ (λj − δ, λj).

In this case we define the order of the jth zero as pj := max{pj^−, pj^+} and set
p∗ := max{pj | j = 1, ..., m}.
Corollary 3.2.3. In cases 1-3 and under the assumptions (3.2.1) and (3.2.3)
the strong consistency relation (3.2.12) holds if, for n sufficiently large,
Case 1: dn ≤ n^s and Ln := (log n)^t (0 < s < 1/6, t ≥ 1).
Case 2: dn ≤ n^s and Ln := (log n)^t (0 < s < 1/(6 + 4p∗), t ≥ 1).
Case 3: dn ≤ ((1/q) log n)^s and Ln := (log n)^t (q > 4, 0 < s < 1, t ≥ 1).
Before proving the results, we give some examples to illustrate the application
of Lemma 3.2.1 and Corollary 3.2.3, such that for a suitable choice of dn and
Ln the consistency relation
    lim_{n→∞} |Êdn,n − E[Yn+1 | Yn, Yn−1, ...]| = 0     (3.2.13)
holds with probability 1 for all processes in large classes Gi of Gaussian pro-
cesses.
Example 3.1: First, let the class G1 consist of all Gaussian processes satisfying
(3.2.1) and (3.2.3) with spectral density bounded away from zero. We choose
    dn := ⌊n^s⌋ (0 < s < 1/4),    Ln := log n
and obtain (3.2.13) for any element of G1 .
Indeed, for every element of G1, ψ(z) has no zeros for |z| = 1 by (3.2.4). Then
ψ(z) never vanishes in the closed unit disk, and 1/ψ(z) is analytic on a disk
around 0 with radius 1 + ε for some ε > 0. Thus φk (1 + ε/2)^k → 0 as k → ∞,
hence |φk| ≤ c (1 + ε/2)^{−k} with some constant c > 0. Set ρ := (1 + ε/2)^{−2} < 1;
then

    Σ_{k=dn+1}^{∞} |φk|² ≤ c² Σ_{k=dn+1}^{∞} ρ^k = c² ρ^{dn+1} / (1 − ρ) ≤ (log n)^{−3}

for n sufficiently large if only dn / log log n → ∞. Lemma 3.2.1 applies and
Corollary 3.2.3 yields (3.2.13).
For G1 and the choice dn = O(log n) it should be noted that (3.2.13) is also
a consequence of An, Chen and Hannan (1982, Theorem 6). From there it
follows that the Yule-Walker estimates (φ̂1 , ..., φ̂dn ) of the first dn coefficients in
the AR(∞) representation satisfy the uniform convergence property
    sup_{1≤j≤dn} |φ̂j − φj| = O( (log log n / n)^{1/2} )
with probability 1. Using the estimates (φ̂1 , ..., φ̂dn ) and the fact that the true
coefficients of the AR(∞) representation converge to zero at an exponential
rate, one obtains (3.2.13). However, the next example illustrates that Lemma
3.2.1 and Corollary 3.2.3 are applicable in more general situations as well.
Example 3.2: Consider the class G2 of all Gaussian processes satisfying (3.2.1)
and (3.2.3) such that for all elements of G2 the following two conditions hold:
a) The corresponding spectral density f has only finitely many zeros, each
of which is of finite order in the sense of the above case 2 and
Note that the orders of the zeros as well as the constants may be different for each
element of G2. This class comprises G1 as well as Gaussian processes with transfer
functions such as

    ψ(z) = (1 − z)^{1/3} := 1 + Σ_{k=1}^{∞} ψk z^k,    ψk = − (2·5·...·(3k−4)) / (3·6·...·(3k)),

for which

    φk = (1·4·...·(3k−2)) / (3·6·...·(3k))    and    f(λ) = (σ²/2π) (4 sin²(λ/2))^{1/3},

the process being purely nondeterministic by (3.2.5).
With the choice
From Σ_{k=1}^{∞} φk² < ∞, Olivier's theorem (Knopp, 1956, 3.3 Theorem 1) allows us
to infer k φk² → 0 (k → ∞), hence

    Σ_{k=dn+1}^{∞} φk² ≤ const. / ((dn + 1)/2)
for n sufficiently large. Thus (3.2.6) is fulfilled and Lemma 3.2.1 applies. On
the other hand, lim_{n→∞} (log n)^{log log n} n^{−s} = 0 for any s > 0, and we can use
Corollary 3.2.3 to obtain (3.2.13).
(Hida and Hitsuda, 1993, III §3 eq. 3.31). We observe that σ(εn+1) is independent
of σ(Yn, Yn−1, ...), where the latter σ-algebra makes Σ_{k=1}^{∞} φk Yn+1−k
measurable. Thus,

    0 = E[εn+1 | Yn, Yn−1, ...] = E[Yn+1 | Yn, Yn−1, ...] + Σ_{k=1}^{∞} φk Yn+1−k

and

    E[Yn+1 | Yn, Yn−1, ...] = − Σ_{k=1}^{∞} φk Yn+1−k.     (3.3.1)
Moreover,

    E[Yn+1 | Yn, ..., Yn−d+1] = E[ E[Yn+1 | Yn, Yn−1, ...] | Yn, ..., Yn−d+1 ]
      = − Σ_{k=1}^{d} φk Yn+1−k − E[ Σ_{k=d+1}^{∞} φk Yn+1−k | Yn, ..., Yn−d+1 ].     (3.3.2)
Now, set

    Hd(λ) := Σ_{k=d+1}^{∞} φk exp(−ikλ).
Then |Hd(λ)|² f(λ) is the spectral density of the linear filter Σ_{k=d+1}^{∞} φk Yn+1−k
(Brockwell and Davis, 1991, Theorem 4.10.1), and we obtain

    E | Σ_{k=d+1}^{∞} φk Yn+1−k |² = ∫_{−π}^{π} |Hd(λ)|² f(λ) dλ
      ≤ sup_{λ∈[−π,π]} f(λ) ∫_{−π}^{π} |Hd(λ)|² dλ = 2π sup_{λ∈[−π,π]} f(λ) Σ_{k=d+1}^{∞} |φk|².
Since the difference E[Yn+1|Yn, Yn−1 , ...]−E[Yn+1|Yn , ..., Yn−d+1] is the probability
limit of the Gaussian variables E[Yn+1 |Yn, ..., Yn−k+1] − E[Yn+1 |Yn, ..., Yn−d+1] as
k → ∞, it is itself Gaussian (Shiryayev, 1984, II §13.5).
We will apply the following lemma on the convergence of Gaussian random
variables that can be found in Buldygin and Dočenko (1977, Lemma 3).
Lemma 3.3.1. Let {Wn}_{n=0}^{∞} be a sequence of centered Gaussian random
variables Wn ∼ N(0, σn²) with σn² → 0 (n → ∞). If for every ε > 0

    Σ_{n=1}^{∞} exp(−ε/σn²) < ∞,     (3.3.3)

then Wn → 0 with probability 1.
For the proof of Theorem 3.2.2 we need some preliminary observations. First
we recall some simple facts from the theory of matrix norms. Let ‖·‖ be some
vector norm on IR^d. By ‖·‖ also denote the corresponding matrix norm

    ‖A‖ := sup_{y≠0} ‖Ay‖ / ‖y‖ = sup_{‖y‖=1} ‖Ay‖

for A ∈ IR^{d×d}. The spectrum of A is the collection of the moduli of the
eigenvalues of A,

    spr(A) := { |λ| : λ eigenvalue of A },
Recall the following inequalities (Isaacson and Keller, 1994, Corollaries in Sec.
1.1 and 1.3):
Proof. 1. Existence of (I − A)^{−1} follows from the well known Neumann series.
(I − A)^{−1} = I + A(I − A)^{−1} implies ‖(I − A)^{−1}‖ ≤ 1 + ‖A‖ ‖(I − A)^{−1}‖, or,
using ‖A‖ < 1,

    ‖(I − A)^{−1}‖ ≤ 1 / (1 − ‖A‖).

2. Inversion of B^{−1}C = I + (B^{−1}C − I) yields C^{−1}B = (I + (B^{−1}C − I))^{−1}.
Multiplying by B^{−1} from the right, we obtain C^{−1} = (I − (I − B^{−1}C))^{−1} B^{−1}
3.3 Proof of the approximation and estimation results 91
We will use the vector norms ‖y‖2 := (Σ_{i=1}^{d} yi²)^{1/2} or ‖y‖∞ := max_{i=1,...,d} |yi|,
and the corresponding matrix norms ‖A‖2 := ρ(A^T A)^{1/2} and ‖A‖∞ := max_{i=1,...,d} Σ_{j=1}^{d} |aij|,
respectively. For the latter norm we have

Proof. Let y = (y1, ..., yd) ∈ IR^d. Then ‖y‖∞ ≤ ‖y‖2 ≤ (d max_i yi²)^{1/2} = d^{1/2} ‖y‖∞ and

    ‖A‖∞ = sup_{y≠0} ‖Ay‖∞ / ‖y‖∞ ≤ d^{1/2} sup_{y≠0} ‖Ay‖2 / ‖y‖2 = d^{1/2} ‖A‖2 = (d ρ(A^T A))^{1/2}
         = (d ρ(A²))^{1/2} = (d ρ(A)²)^{1/2} = d^{1/2} ρ(A) = d^{1/2} / min spr(A^{−1}).     □
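The norm comparisons used in this proof are easy to verify numerically; the random symmetric matrices below are arbitrary test cases (an assumption for illustration), for which ‖A‖2 coincides with the spectral radius ρ(A):

```python
import numpy as np

# Check ||A||_inf <= sqrt(d) * ||A||_2 = sqrt(d) * rho(A) for symmetric A.
rng = np.random.default_rng(2)
for d in [2, 5, 20]:
    B = rng.normal(size=(d, d))
    A = (B + B.T) / 2                          # symmetric test matrix
    norm_inf = np.abs(A).sum(axis=1).max()     # maximal absolute row sum
    rho = np.abs(np.linalg.eigvalsh(A)).max()  # spectral radius = ||A||_2
    assert norm_inf <= np.sqrt(d) * rho + 1e-9
print("norm comparison verified")
```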
Lemma 3.3.4. Let {Yn}_{n=1}^{∞} be a stationary ergodic process with zero mean
and finite variance, allowing for a representation

    Yn = Σ_{j=0}^{∞} ψj εn−j,    ψ0 = 1,    Σ_{j=0}^{∞} |ψj| < ∞,

with innovations {εn} satisfying

    E[εn | εm, m < n] = 0,
    E[εn² | εm, m < n] = σ²,
    E(|εn|^r) < ∞ for some r ≥ 4.

In the "Gaussian case" considered here, the conditions on the innovations are
fulfilled, since the εn's are independent, identically N(0, σ²)-distributed, hence
E[εn | εm, m < n] = Eεn = 0 and E[εn² | εm, m < n] = E(εn²) = σ².
After these introductory remarks, we turn to the core of the proof.

Proof of Theorem 3.2.2. Ȳ_{n−d+1}^n := (Yn−d+1, ..., Yn)^T is a shorthand notation
for the d-past of the process, and we set T_L Ȳ_{n−d+1}^n := (T_L Yn−d+1, ..., T_L Yn)^T.
The prediction error can be decomposed into

    |Êd,n − E[Yn+1 | Yn, ..., Yn−d+1]| = |γ̂d,n Γ̂d,n^{−1} T_{Ln} Ȳ_{n−d+1}^n − γd Γd^{−1} Ȳ_{n−d+1}^n|
      ≤ |(γ̂d,n Γ̂d,n^{−1} − γd Γd^{−1}) T_{Ln} Ȳ_{n−d+1}^n| + |γd Γd^{−1} (T_{Ln} Ȳ_{n−d+1}^n − Ȳ_{n−d+1}^n)|
      ≤ d Ln ‖γd Γd^{−1} − γ̂d,n Γ̂d,n^{−1}‖∞ + d ‖γd‖∞ ‖Γd^{−1}‖∞ ‖T_{Ln} Ȳ_{n−d+1}^n − Ȳ_{n−d+1}^n‖∞.     (3.3.4)
    ‖I − Γd^{−1} Γ̂d,n‖∞ = ‖Γd^{−1} (Γd − Γ̂d,n)‖∞ ≤ ‖Γd^{−1}‖∞ ‖Γd − Γ̂d,n‖∞
      = ‖Γd^{−1}‖∞ max_{i=1,...,d} Σ_{k=1}^{d} |γ(i − k) − γ̂n(i − k) − δik/n|
      ≤ ‖Γd^{−1}‖∞ ( max_{i=1,...,d} Σ_{k=1}^{d} |γ(i − k) − γ̂n(i − k)| + 1/n )
      ≤ ‖Γd^{−1}‖∞ ( d max_{i=0,...,d−1} |γ(i) − γ̂n(i)| + 1/n ).     (3.3.5)
From the An, Chen and Hannan (1982) result (Lemma 3.3.4), this tends to 0
with probability 1, if only (recall d = dn ≤ n^{r/(2(r−2))})

    ‖Γdn^{−1}‖∞ dn / rn = O(1)     (3.3.6)

and

    ‖Γdn^{−1}‖∞ / n → 0     (n → ∞).     (3.3.7)
Thus, for all ω ∈ Ω from a set of probability 1, for n sufficiently large

    ‖I − Γd^{−1} Γ̂d,n‖∞(ω) < 1
For any second order stationary process, there exists a constant cγ with

    0 ≤ |γ(k)| ≤ cγ < ∞     (k ∈ IN0),

hence ‖γd‖∞ ≤ cγ (this can also be seen from (3.2.7)). We set Md,n :=
max_{i=0,...,d} |γ(i) − γ̂n(i)|. Now, appealing to (3.3.5) again and tidying things
up, we obtain

    ‖γd Γd^{−1} − γ̂d,n Γ̂d,n^{−1}‖∞ ≤ [ ‖Γd^{−1}‖∞ (cγ ‖Γd^{−1}‖∞ d + 1) Md,n + ‖Γd^{−1}‖∞² cγ / n ] / [ 1 − (‖Γd^{−1}‖∞ d Md,n + ‖Γd^{−1}‖∞ / n) ]
(for all ω ∈ Ω from a set of probability 1 and for n ≥ N (ω)). Finally, according
to Lemma 3.3.4, d Ln ‖γd Γd^{−1} − γ̂d,n Γ̂d,n^{−1}‖∞ → 0 with probability 1 if

    Ln ‖Γdn^{−1}‖∞² dn² / rn = O(1),     (3.3.8)

    Ln ‖Γdn^{−1}‖∞² dn / n → 0.     (3.3.9)
It remains to prove that

    d ‖Γd^{−1}‖∞ ‖T_{Ln} Ȳ_{n−d+1}^n − Ȳ_{n−d+1}^n‖∞ → 0.

For any ε > 0,

    P( d ‖Γd^{−1}‖∞ ‖T_{Ln} Ȳ_{n−d+1}^n − Ȳ_{n−d+1}^n‖∞ ≥ ε )
      = P( d ‖Γd^{−1}‖∞ max_{i=0,...,d−1} |T_{Ln} Yn−i − Yn−i| ≥ ε )
      ≤ P( ∃ i ∈ {0, ..., d−1} : |Yn−i| ≥ Ln ) = P( max_{i=0,...,d−1} |Yn−i| ≥ Ln )
Hence (3.3.6) and (3.3.7) are implied by (3.3.8) and (3.3.9), respectively. Rewrit-
ing (3.3.8) as
    Ln ‖Γdn^{−1}‖∞² dn^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

we end up with the following four conditions

    dn ≤ n^{r/(2(r−2))},

    Ln ‖Γdn^{−1}‖∞² dn^{2(r+1)/r} · (log n)^{2/r} (log log n)^{2(1+δ)/r} / n^{1/2} = O(1),

    Ln ‖Γdn^{−1}‖∞² dn / n → 0     (n → ∞),

    Σ_{n=1}^{∞} (dn / Ln) exp(−Ln² / (2σ²)) < ∞
in order to obtain
    lim_{n→∞} |E[Yn+1 | Yn, ..., Yn−dn+1] − Êdn,n| = 0
with probability 1. □
Finally, we prove Corollary 3.2.3.
Proof of Corollary 3.2.3. Recently, Serra Capizzano obtained the following
result (1999, Theorem 3.2; 2000, Theorem 1.2) on the “worst” rate of decay for
the minimal eigenvalue µd^{min} of Toeplitz matrices

    Td(f) := (γ(i − j))_{i,j=1,...,d}

formed by the coefficients of the Fourier expansion

    f(λ) = (1/2π) Σ_{j=−∞}^{∞} γ(j) exp(−ijλ)     (λ ∈ [−π, π])
The constants c and K are related to the measure of the set where f essentially
vanishes, not disclosed to the statistician. However, choosing some 0 < s < 1,
one has
    K exp(−cd) + mf ≥ exp(−d^{1/s})     (3.3.12)
for sufficiently large d. As already noted, from (3.2.3) and (3.2.4), the spectral
density f of the process under consideration is continuous. Thus the require-
ments of Lemma 3.3.5 are met for Td (f ) = Γd , and Lemma 3.3.3 together with
(3.3.11) and (3.3.12) yields
    ‖Γd^{−1}‖∞ ≤ d^{1/2} / µd^{min} ≤ d^{1/2} exp(d^{1/s}),

hence

    ‖Γdn^{−1}‖∞² / n^{1/2} ≤ dn exp(2 (dn)^{1/s}) / n^{1/2} = dn n^{2/q − 1/2}
with 2/q − 1/2 < 0. As to the fourth inequality, observe that dn/Ln → 0 as
n → ∞ and for n sufficiently large −Ln²/(2σ²) ≤ −2 log n, hence

    exp(−Ln²/(2σ²)) ≤ 1/n².
The assertion for the more restrictive cases 1 and 2 follows similarly, using

    µd^{min} ≥ 2π inf_{λ∈[−π,π]} f(λ) > 0
with |Φ| < 1 (Brockwell and Davis, 1991, Example 4.4.2). The processes have
the spectral densities

    f(λ) = (σ²/2π) (1 − 2Φ cos λ + Φ²)^{−1},

bounded away from zero. The autocovariance function is given by

    γ(0) = ((Φ² + 1)/(Φ − 1)²) σ²    and    γ(k) = (2Φ/(1 + Φ²))^k γ(0),
from which we can calculate the conditional expectation E[Yn+1|Yn , ..., Y1] of the
next output given the past using (3.2.9). The latter conditional expectation acts
as an approximation of the unknown true autoregression E[Yn+1|Yn , Yn−1, ...].
Figure 3.1 a-b) shows two paths of the process (circles) for different values
of σ 2 and Φ together with the corresponding autoregression (grey) and the
estimated autoregression (black). The convergence of the estimates towards
the true autoregression is clearly visible.
Example 3.2 (continued): The process from Example 3.2 in Section 3.2 has
a spectral density
    f(λ) = (σ²/2π) (4 sin²(λ/2))^{1/3}.
From this we calculate the autocovariances

    γ(k) = 2 ∫_{0}^{π} f(λ) cos(kλ) dλ
(Brockwell and Davis, 1991, eq. 4.3.10) using a compound trapezoidal integration
rule with an error of at most 10^{−7}. Figure 3.2 a) shows the autocovariance
function of the process for a variance σ² = 0.01 of the innovations. Again,
E[Yn+1|Yn , Yn−1, ...] is approximated by E[Yn+1 |Yn, ..., Y1], which is calculated by
(3.2.9). The Gaussian process {Yn}n itself is simulated by the method described
Figure 3.1: “True” (grey) and predicted (black) autoregression for two Gaussian
AR(1) models.
3.4 Simulations and examples 99
3.2 a) The autocovariance function of the process.
Figure 3.2: The autocovariance function and a sample path of the process in
Example 3.2.
in Brockwell and Davis (1991, Ex. 8.16, p. 271). Figure 3.2 b) shows the reali-
sations of Yn (circles), the “true” (grey) and the predicted (black) autoregression
for 50 consecutive days. Again, the convergence result is convincing.
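The numerical computation of the autocovariances described above can be sketched as follows, using a compound trapezoidal rule as in the text and assuming the convention γ(k) = 2∫₀^π f(λ) cos(kλ) dλ for the even spectral density of Example 3.2, with σ² = 0.01 as in Figure 3.2; the panel count m is an arbitrary implementation choice, not the thesis's.

```python
import numpy as np

SIGMA2 = 0.01  # innovation variance used for Figure 3.2

def spec_dens(lam):
    """Spectral density of Example 3.2: f(lam) = (sigma^2/2pi)(4 sin^2(lam/2))^(1/3)."""
    return SIGMA2 / (2 * np.pi) * (4 * np.sin(lam / 2) ** 2) ** (1 / 3)

def autocov(k, m=100000):
    """Compound trapezoidal rule for gamma(k) = 2 * int_0^pi f(lam) cos(k lam) dlam."""
    lam = np.linspace(0.0, np.pi, m + 1)
    vals = spec_dens(lam) * np.cos(k * lam)
    h = np.pi / m
    return 2 * h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))

print(autocov(0))  # ≈ 0.0113 = sigma^2 * sum_k psi_k^2 for psi(z) = (1 - z)^(1/3)
print(autocov(1))  # negative, as in Figure 3.2 a)
```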
The following two examples illustrate the performance of the greedy strategy
from Section 3.1.
Example 3.3: We run the greedy strategy from Section 3.1 in a market with
a stock whose price follows a geometrical Brownian motion (Luenberger, 1998,
Sec. 11.7; Korn and Korn, 1999, Ch. 2) with a mean return of 2% p.a. and
a volatility of σ = 10% p.a. The bond offers a riskless return of 2% p.a. (i.e.
we set r = 2%/365). The algorithm of Section 3.2 is used to predict the log-
returns of the stocks. Figure 3.3 shows the value of an investment of $1 either
solely in the stock (grey, solid) or in the bond (grey, dashed). The value of the
greedy strategy is shown by the solid black line. In times when the share price
is likely to plummet the investor takes refuge in the bond. Thus he participates
Figure 3.4: Daily value of a $1 investment in some shares from Dow Jones
Indices at NYSE 24/4/1998-8/2/1999 (grey, solid), in a bond with short rate
2% p.a. (grey, dashed) or in a greedy strategy (black), respectively.
in the rise of the share price more than in its fall, increasing his annual yield
beyond 2%. As in Section 2.2 this is the phenomenon of volatility pumping
(Luenberger, 1998, Examples 15.2 and 15.3). The share’s volatility is used to
draw an above average return from the stock.
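The pumping effect is easiest to isolate in a stripped-down deterministic market, the textbook example behind Luenberger's treatment; the alternating stock and the constant half-and-half rule below are illustrative assumptions standing in for the predictive strategy of Section 3.1:

```python
# Volatility pumping in its starkest form: a stock that alternately doubles
# and halves, and cash paying no interest. Buy-and-hold ends where it started;
# a half-and-half portfolio, rebalanced every period, grows by the factor
# (0.5*2 + 0.5) * (0.5*0.5 + 0.5) = 1.5 * 0.75 = 1.125 every two periods.
returns = [2.0, 0.5] * 50  # 100 periods of x2, x0.5

buy_and_hold = 1.0
rebalanced = 1.0
for x in returns:
    buy_and_hold *= x                  # wealth fully in the stock
    rebalanced *= 0.5 * x + 0.5 * 1.0  # half stock, half cash, then rebalance

print(buy_and_hold)  # 1.0 -- the stock goes nowhere
print(rebalanced)    # 1.125**50, about 361 -- the volatility is "pumped"
```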
Example 3.4: We replace the geometrical Brownian motion by various real
stock price processes on 200 days of trading, using the NYSE closing prices
24/4/1998-8/2/1999 (data from www.wallstreetcity.com). Figure 3.4 a-c) shows
the corresponding charts. Although the greedy strategy does not manage to
yield a return at least as large as the bond's return in all cases (Fig. 3.4 b),
it typically outperforms the stock, considerably reducing the investor's risk of
financial loss from pure share investment.
CHAPTER 4
models, and consequently, most of the work on transaction costs concerns the
continuous-time case. Not much emphasis was put on statistical discrete-time
strategy building. A detailed overview of literature can be found in Davis and
Norman (1990), Fleming (1999), Cadenillas (2000) or Bielecki and Pliska (2000).
However, many models and strategies involved considerable computational ef-
fort, which made it necessary to use approximation techniques (see e.g. Fitz-
patrick and Fleming, 1991, and Atkinson et al., 1997). Nowadays, continuous-
time models are frequently approximated by time-discrete models (for example
in Bielecki, Hernandez and Pliska, 1999, to discretize the continuous-time model
in Bielecki and Pliska, 1999). This brought about a renaissance of discrete-time
modelling. Up-to-date surveys of strategy planning under transaction costs
in discrete-time markets can be found in Carassus and Jouini (2000) for a
Cox-Rubinstein type model, in Blum and Kalai (1999) for optimal constant-
rebalanced portfolios and in Bobryk and Stettner (1999) for a market with a
bond and one stock having i.i.d. returns.
V1: {Xi}_{i=−d+1}^{∞} is a stationary [A, B]^m-valued stochastic process on a
probability space (Ω, A, P) (0 < A < B < ∞ need not be known),
V3: E[h(b, X̄i+1) | X̄i = x̄] is a continuous function of (b, x̄) ∈ S × [A, B]^{dm},
4.1 Strategies in markets with transaction fees 105
for all continuous functions h : S × [A, B]^{dm} → IR and all i. Thus, at time
i − 1, the further evolution of the market depends upon a sub-σ-algebra σ(X̄i)
of the total information field Fi := σ(X−d+1, ..., Xi−1).
Following Blum and Kalai (1999) we assume that only purchasing shares in the
market generates transaction costs (brokerage fees, commission) proportional
to the total value of the transaction, i.e.
with a commission factor c ∈ [0, 1). Paying into and drawing money from
the risk-free account does not generate any fees. In case two commission fac-
tors apply for selling and purchasing shares, say csell and cpurchase, one may use
c := (csell + cpurchase)/(1 + cpurchase) as a compound commission factor applying to
purchases only. This approach never underestimates the capital reducing effect
of transaction costs. Indeed, with 1 unit of money in a stock one can purchase
(1 − csell )/(1 + cpurchase) = 1 − c value in another stock or pay 1 − csell ≥ 1 − c
units of money into the account. Conversely, 1 unit of money in the bond can
purchase 1/(1 + cpurchase) ≥ 1 − c value in a stock.
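The three claims about the compound commission factor are pure algebra and can be checked mechanically (the grid of fee values is arbitrary):

```python
# Check of c = (c_sell + c_buy) / (1 + c_buy): restructuring 1 unit of stock
# into another stock yields exactly 1 - c, and the other two trades are
# never cheaper than their 1 - c estimate.
def compound(c_sell, c_buy):
    return (c_sell + c_buy) / (1 + c_buy)

for c_sell in [0.0, 0.001, 0.01, 0.05]:
    for c_buy in [0.0, 0.001, 0.01, 0.05]:
        c = compound(c_sell, c_buy)
        # stock -> stock: (1 - c_sell)/(1 + c_buy) value arrives in the new stock
        assert abs((1 - c_sell) / (1 + c_buy) - (1 - c)) < 1e-12
        # stock -> bond: 1 - c_sell units of money, at least 1 - c
        assert 1 - c_sell >= 1 - c - 1e-12
        # bond -> stock: 1/(1 + c_buy) value, at least 1 - c
        assert 1 / (1 + c_buy) >= 1 - c - 1e-12
print("all inequalities hold")
```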
To see how investment actions are limited by transaction fees, consider a fixed
time instant i. Then the investor’s wealth Wi is used to acquire a new portfolio,
given by an enhanced portfolio vector bi+1 := (bi+1,−1 , ..., bi+1,m)T , which now is
(m + 2)-dimensional. Here
bi+1,−1 is the proportion of Wi needed to settle the transaction costs that arise
when the portfolio is restructured,
Now, in the market period from time i − 1 to time i the investor’s wealth Wi−1
generated a value of (1 + r)bi,0 Wi−1 in the bond and of Xi,j bi,j Wi−1 in the jth
stock. An amount of bi,−1 Wi−1 was used to settle transaction fees and is no
106 Chapter 4. A Markov model with transaction costs: probabilistic view
    Wi / Wi−1 = (1 + r) bi,0 + Σ_{j=1}^{m} Xi,j bi,j.     (4.1.1)
If x̄ is the matrix formed from the last d observed return vectors and s denotes
the last portfolio vector, we call the collection of all portfolios satisfying the
self-financing condition,
the admissible set corresponding to (s, x̄) ∈ S × [A, B]^{dm}. Note that for all
(s, x̄) ∈ S × [A, B]^{dm}
a∗ := (0, 1, 0, ..., 0)T ∈ S(s, x̄), (4.1.4a)
i.e. there is always one option open to the investor: He can pay all his wealth
into the risk-free account at any time.
The investor can only follow non-anticipating portfolio strategies which comply
with the self-financing condition:
2. bi is Fi -measurable and
3. b0 = a∗ .
We now move on to defining the investment goal. From the previous chapters
we know that the logarithmic utility function f : S × ([A, B]^m)^d → IR,
is the optimal choice for long-run and short-term investment targets alike (note
that the entry 0 in the first vector of the scalar product corresponds to the
amount of transaction costs which is lost). In this chapter we therefore assume
that the investor aims to choose an admissible strategy {bi}i such that, in the
long run, the expected mean utility E((1/n) Σ_{i=0}^{n−1} f(bi, X̄i+1)) is larger than for any
other strategy based on some portfolio selection function. This is formalized by
inequality (4.2.4) below.
It should be pointed out that the process {Xi }i need not contain mere return
information, but may contain additional factors and side information in
some of its coordinates as well, provided of course that these occur in the
Theorem 4.2.1. Let hi ∈ C(S × [A, B]^{dm}) be a solution of the Bellman equation

    hi(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δi) E[hi(b, X̄d+1) | X̄d = x̄] }.     (4.2.2)

With Vi(b, x̄) := m(b, x̄) + (1 − δi) E[hi(b, X̄d+1) | X̄d = x̄] we obtain an admissible
portfolio strategy by

    b∗0 := a∗,
    b∗i := arg max_{b∈S(b∗i−1, X̄i)} Vi(b, X̄i).     (4.2.3)
This strategy is optimal in the sense that for any portfolio strategy {bi}i based
on a portfolio selection function
    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b∗i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) ) ≥ 0.     (4.2.4)
1. (4.2.4) compares the mean utility of the investment schemes {b∗i}i and
{bi}i in the worst case (lim inf) that may occur for a remote time horizon.
Hence, the result of Theorem 4.2.1 is a worst case analysis. The Bellman
equation (4.2.2) penalizes the “target” function m to obtain a target that
is adjusted to loss through transaction costs. It thus balances the need to
make as many transactions as needed, but, at the same time, as few as
possible.
arg maxb∈S(b∗i−1 ,X̄i ) m(b, X̄i) coincides with the classical log-optimal strat-
egy.
δ1 = δ2 := d1
δ3 = δ4 = δ5 := d2
δ6 = δ7 = δ8 = δ9 := d3
δ10 = ....
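In other words, δi is constant on blocks of lengths 2, 3, 4, ..., so that at most about √(2n) distinct values occur among δ1, ..., δn; this count reappears in the proof of Theorem 4.2.1. A sketch of the indexing (the values dk themselves are specified elsewhere in the thesis and are not reproduced here):

```python
def block_index(i):
    """Return k such that delta_i = d_k (i >= 1); block k has length k + 1,
    i.e. the blocks are {1,2}, {3,4,5}, {6,...,9}, {10,...,14}, ..."""
    k, start = 1, 1
    while start + k + 1 <= i:   # block k covers start, ..., start + k
        start += k + 1
        k += 1
    return k

print([block_index(i) for i in range(1, 11)])  # [1, 1, 2, 2, 2, 3, 3, 3, 3, 4]
```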
The Bellman equation is closely linked with the theory of Markov control pro-
cesses and stochastic dynamic programming (SDP). These have been applied to
financial mathematics since the 1960s, e.g. by Samuelson (1969), Merton (1969)
and Bertsekas (1976, Sec. 3.3). A good introduction to SDP and Markov control
can be found in Bertsekas (1976), Bertsekas and Shreve (1978), in Hernández-
Lerma and Lasserre (1996 and 1999) or in Bather (2000). Among recent appli-
cations to discrete-time finance we only mention Hernández-Lerma and Lasserre
(1996, Example 1.3.2) and Duffie (1988, Sec. III.19) – both contain more refer-
ences. It should be pointed out, however, that none of the classical models can
properly deal with the transaction cost problem we are dealing with. Before
giving details of the proof of Theorem 4.2.1, we should briefly comment on that.
(−c thus becomes the utility function). Optimal strategies can be generated
from solutions (ρ∗ , h) of the Bellman equation
    ρ∗ + h(x) = min_{a∈A(x)} { c(x, a) + ∫_X h(y) Q(dy|x, a) }
4.2 An optimal strategy 111
(0 < δ < 1) are needed (Hernández-Lerma and Lasserre, 1996, Theorems 5.4.3
and 5.5.4) or recurrence and irreducibility conditions for the Markov chain with
the transition probability distribution Q (Ross, 1970, Cor. 6.20 or Bertsekas,
1976, Prop. 3). Typically, most research assumes the state space X to be count-
able and the additional existence of a state x∗ with Q(x∗|x, a) ≥ const. > 0 for
all x ∈ X, a ∈ A(x). Such conditions are hard to verify from market data and,
what is even worse, they are not satisfied for the control problem of portfolio
selection under transaction costs. Indeed, we have seen that the collection of
admissible actions (portfolio choices bi) at time i − 1 depends on the last d
observed return vectors as well as on the last chosen control bi−1 . Therefore,
we have no other choice than to describe the state of the system by the joint
vector (bi−1 , X̄i ). Then the transition dynamics under control bi is given by
(bi−1 , X̄i ) 7−→ (bi , X̄i+1 ). We thus end up with a transition probability distribu-
tion Q that clearly does not satisfy the mentioned recurrence and irreducibility
conditions. This drawback was also noted by Bielecki, Hernández and Pliska
(2000). They observe that the typical conditions imposed to ensure the ex-
istence of a solution of the Bellman equation are too rigid to be applicable in
transaction cost problems. On the other hand they found it still possible to char-
acterize optimal strategies in terms of optimality equations which correspond
to the classical Bellman equation. This is supported by the earlier findings of
Stettner (1999) and our results.
strategies (Lemma 4.2.6). It is only in the fourth step that, using the technical
tools derived before, we will completely prove Theorem 4.2.1.
1st step in the proof of Theorem 4.2.1: Solving the Bellman equation.
In dynamic programming, the Bellman equation usually takes the form
    λ + h(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[h(b, X̄d+1) | X̄d = x̄] },     (4.2.5)
which is to be solved for h ∈ C(S × [A, B]^{dm}) and λ ∈ IR (see e.g. Bertsekas,
1976, Sec. 8.1 or Hernández-Lerma and Lasserre, 1996, Sec. 5.2). We first de-
rive some basic facts about the existence of solutions and their properties. The
following proposition is well known from the theory of dynamic programming,
we nonetheless outline a proof.
Proposition 4.2.2. For all δ ∈ (0, 1) there exists a solution (h, λ) ∈ C(S ×
[A, B]^{dm}) × IR and a solution (h, 0) ∈ C(S × [A, B]^{dm}) × {0} of the Bellman
equation (4.2.5).
Proof. In the following let g ∈ C(S × [A, B]^{dm}), λ ∈ IR. The seminorm

a Banach space with norm ‖[g]‖ := ‖g‖, where [g] := {g(·, ·) + r | r ∈ IR} denotes
the equivalence class of g ∈ C(S × [A, B]^{dm}).
Indeed, K := {constant functions} is a closed subspace of the space (C(S ×
[A, B]^{dm}), ‖·‖∞). Then (C(S × [A, B]^{dm})/K, ‖·‖∗) with

is a Banach space (Hirzebruch and Scharlau, 1996, Lemma 5.10). The norms
‖·‖∗ and ‖·‖ are equivalent on C(S × [A, B]^{dm})/K because of inf_{c∈IR} ‖f + c‖∞ =
(1/2) ‖f‖.
Note that V3 implies that for any g ∈ C(S × [A, B]^{dm}), the conditional expec-
tation E[g(b, X̄d+1) | X̄d = x̄] is continuous in (b, x̄) ∈ S × [A, B]^{dm}, and S(·, ·) is
continuous in the sense of Aliprantis and Border (1999, Definition 16.2 and
Theorems 16.20, 16.21). Now, by Berge's Maximum Theorem (Aliprantis and
Border, 1999, Theorem 16.31), max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[g(b, X̄d+1) | X̄d = x̄] }
is continuous on S × [A, B]^{dm}. Hence we can define the operator
On C∗, M operates according to M[g] := [Mg]; observe that the right hand side
is independent of the chosen representative of [g]. Solving (4.2.5) thus becomes
equivalent to solving the functional equation
    (Mh)(s, x̄) − (Mg)(s, x̄) ≤ (1 − δ) E[h(b∗, X̄d+1) − g(b∗, X̄d+1) | X̄d = x̄]
                              ≤ (1 − δ) max_{(s,x̄)} (h(s, x̄) − g(s, x̄)),

hence

    max_{(s,x̄)} ((Mh)(s, x̄) − (Mg)(s, x̄)) ≤ (1 − δ) max_{(s,x̄)} (h(s, x̄) − g(s, x̄))

for 0 < 1 − δ < 1, a contraction property that implies the existence of a solution
of (4.2.5).
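The contraction step can be mirrored on a toy finite-state analogue of the operator M; the states, rewards and transition kernel below are arbitrary illustrative choices, while the actual operator acts on the function space C(S × [A, B]^{dm}).

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, delta = 4, 3, 0.1
m = rng.uniform(-1, 1, size=(nS, nA))          # toy one-period rewards
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[x, a] is a distribution

def M(g):
    """(Mg)(x) = max_a { m(x, a) + (1 - delta) * E[g(next state)] }."""
    return (m + (1 - delta) * P @ g).max(axis=1)

# sup-norm contraction with factor 1 - delta ...
g, h = rng.normal(size=nS), rng.normal(size=nS)
assert np.abs(M(g) - M(h)).max() <= (1 - delta) * np.abs(g - h).max() + 1e-12

# ... hence iterating M converges to the unique fixed point h = Mh:
h = np.zeros(nS)
for _ in range(500):
    h = M(h)
print(np.abs(M(h) - h).max())  # essentially zero (machine precision)
```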
Finally, let (h, λ) ∈ C(S × [A, B]^{dm}) × IR be an arbitrary solution of (4.2.5),

    λ + h(s, x̄) = max_{b∈S(s,x̄)} { m(b, x̄) + (1 − δ) E[h(b, X̄d+1) | X̄d = x̄] }.
1. δi ‖hi‖∞ ≤ ‖f‖∞.
2. ‖hi+1 − hi‖∞ ≤ ‖f‖∞.
The right hand side of this chain of inequalities remains the same when swapping
i and j, therefore

    ‖hi − hi+1‖∞ ≤ 2 ((δi − δi+1)/δi+1) ‖f‖∞ ≤ ‖f‖∞.
and

    Ehi(bi−1, X̄i) − Ehi(bi, X̄i+1) ≥ Em(bi, X̄i) − δi Ehi(bi, X̄i+1) ≥ −‖f‖∞ − δi ‖hi‖∞.
We need only show that this is a RACS for which a suitable selector exists:
is a RACS for which a selector of the form φi(bi−1, X̄i) exists, with a measurable
function

    φi : S × [A, B]^{dm} → S.
From Lemma 4.2.4 it follows that the recurrence relation (4.2.3) is solved by
In particular, b∗i is a selector for arg max_{b∈S(b∗i−1, X̄i)} Vi(b, X̄i), hence Fi-measurable,
so that {b∗i}i constitutes an admissible strategy.
Proof of Lemma 4.2.4. The proof will be given in three steps. Fix some
i ∈ IN.
First note that S(·, ·) : S × [A, B]^{dm} → C : (s, x̄) ↦ S(s, x̄) is a measurable
mapping, so that S(bi, X̄i+1) is a RACS: the continuity of gc implies that {b ∈
S | gc(s, x̄, b) = 0} is a closed subset of S. For the measurability of S(·, ·) we
need only show that for any compact K ⊆ S
with the σ-algebra B(S × [A, B]^{dm}) of Borelian sets on S × [A, B]^{dm}. To this
end, choose a countable dense subset K 0 of K. Using the continuity of gc again,
it is easily verified that

    S^{−1}(FK) = ∩_{n∈IN} ∪_{k∈K′} { (s, x̄) : |gc(s, x̄, k)| < 1/n },
Now, the proof is finished observing that {b | (s, x̄, b) ∈ D} = S(s, x̄) and
{(s, x̄) | ∃ b : (s, x̄, b) ∈ D} = S × [A, B]^{dm} (because of a∗ ∈ S(s, x̄)). Thus

with probability 1. □
N({bi}i, ε) measures how long it takes the strategy {bi}i to approach the long-
term optimum for the first time up to an error of ε. Note that there is some
N ∈ IN such that

    | (1/N) Σ_{i=1}^{N} Ef(bi, X̄i+1) − lim sup_{n→∞} (1/n) Σ_{i=1}^{n} Ef(bi, X̄i+1) | ≤ ε,

so that N({bi}i, ε) < ∞. Using N({bi}i, ε) one can approximate any strategy
based on a portfolio selection function arbitrarily closely by a periodic strategy
in the following sense:
n−1
1 P
Proof. Let N := N({bi}i, ε), µn := (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) and s := lim sup_{n→∞} µn,
i.e.

    |µN − s| ≤ ε.     (4.2.9)
Considering the fact that, at any stage, the investor may choose the portfolio
a∗ , we define an N -periodic admissible strategy b̃i by
In particular, b̃0 = b0, ..., b̃N−1 = bN−1. Hence, for all k ∈ IN0 (convention
Σ_{i=0}^{−1} ... := 0):
    lim inf_{n→∞} ( (1/n) Σ_{i=0}^{n−1} Ef(b̃i, X̄i+1) − (1/n) Σ_{i=0}^{n−1} Ef(bi, X̄i+1) )
      ≥ lim inf_{n→∞} µ̃n − lim sup_{n→∞} µn = lim inf_{n→∞} µ̃n − s
      ≥ lim inf_{n→∞} (−|µ̃n − s|) = − lim sup_{n→∞} |µ̃n − s| ≥ −ε,
This yields

lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E f(b*_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1}))
≥ lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E f(b*_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b̃_i, X̄_{i+1}))
+ lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E f(b̃_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1}))
≥ −ε,
First note that the Bellman equation implies that for all admissible strategies
{bi }i
Taking expectations,
(1 − δi+1 )Ehi+1 (bi+1 , X̄i+2 ) = (1 − δi+1 )E E[hi+1 (bi+1 , X̄i+2 )|Fi+1 ]
≤ Ehi+1 (bi , X̄i+1 ) − Em(bi+1, X̄i+1 ),
We now investigate the asymptotic behaviour of the terms on the right hand
side.
The first and the second term A and B of (4.2.12) are of the same form and
tend to 0 as n → ∞ since
lim_{n→∞} (1/n) Σ_{i=−1}^{n−2} E[h_{i+1}(b_{i+1}, X̄_{i+2}) − h_{i+1}(b_i, X̄_{i+1})] = 0
for any admissible strategy {bi }i . To prove this, we set h−1 := 0 and consider
the decomposition
(1/n) Σ_{i=−1}^{n−2} (h_{i+1}(b_{i+1}, X̄_{i+2}) − h_{i+1}(b_i, X̄_{i+1})) = D + E

into

D := (1/n) Σ_{i=−1}^{n−2} (h_{i+1}(b_{i+1}, X̄_{i+2}) − h_i(b_i, X̄_{i+1}))
and
E := (1/n) Σ_{i=−1}^{n−2} (h_i(b_i, X̄_{i+1}) − h_{i+1}(b_i, X̄_{i+1})).
D is a telescopic sum, and using part 1 of Lemma 4.2.3 and the assumptions
about {δi }i , we find that
|D| = |h_{n−1}(b_{n−1}, X̄_n) − h_{−1}(b_{−1}, X̄_0)|/n ≤ ‖h_{n−1}‖_∞/n ≤ ‖f‖_∞/(δ_{n−1} n) → 0.
As to E we note that δ_1 = δ_2, δ_3 = δ_4 = δ_5, δ_6 = ... = δ_9, etc. implies that
h_1 = h_2, h_3 = h_4 = h_5, h_6 = ... = h_9, ... . Hence there are at most √(2n + 1) non-zero
differences in E. By virtue of Lemma 4.2.3 these are bounded in absolute
value by max_i ‖h_{i+1} − h_i‖_∞ ≤ ‖f‖_∞, which yields

|E| ≤ ‖f‖_∞ √(2n + 1)/n −→ 0.
(Lemma 4.2.3, part 2). Using Lemma 4.2.3, (4.2.8), we can bound the second
expectation from below by
−2‖f‖_∞ (1/n) Σ_{i=0}^{n−1} δ_i −→ 0 (n → ∞).
(Lemma 4.2.3, part 2). Therefore, we need only show that the fourth expecta-
tion satisfies
lim inf_{n→∞} (1/n) Σ_{i=0}^{n−1} δ_i (E h_{⌊i⌋_N+1}(a*, X̄_{⌊i⌋_N+1}) − E h_i(b̃_i, X̄_{i+1})) ≥ 0. (4.2.13)
This will be done exploiting the periodicity of {b̃i }i . In order to prove (4.2.13)
we first assume that n = kN with k ∈ IN0 :
(1/(kN)) Σ_{i=0}^{kN−1} δ_i (E h_{⌊i⌋_N+1}(a*, X̄_{⌊i⌋_N+1}) − E h_i(b̃_i, X̄_{i+1}))
= (1/k) Σ_{j=0}^{k−1} (1/N) Σ_{i=jN}^{(j+1)N−1} δ_i (E h_{⌊i⌋_N+1}(a*, X̄_{⌊i⌋_N+1}) − E h_i(b̃_i, X̄_{i+1})),
where the 1st, 3rd, etc. line after the equality (the non-indented terms) can be
bounded by (4.2.7), the 2nd, 4th, etc. term (the indented terms) by Lemma
4.2.3, part 2. Consequently,
and
(1/(kN)) Σ_{i=0}^{kN−1} δ_i (E h_{⌊i⌋_N+1}(a*, X̄_{⌊i⌋_N+1}) − E h_i(b̃_i, X̄_{i+1}))
≥ −(3‖f‖_∞/k) Σ_{j=0}^{k−1} (1/N) Σ_{i=jN}^{(j+1)N−1} δ_i (i − jN) ≥ −(3‖f‖_∞/k) Σ_{j=0}^{k−1} (δ_{jN}/N) Σ_{i=jN}^{(j+1)N−1} (i − jN)
= −3‖f‖_∞ ((1/k) Σ_{j=0}^{k−1} δ_{jN}) ((1/N) Σ_{i=0}^{N−1} i) = −3‖f‖_∞ ((1/k) Σ_{j=0}^{k−1} δ_{jN}) (N − 1)/2. (4.2.15)
Set k := ⌊n/N⌋ and bound the first bracket in (4.2.16) from below by

−(⌊n⌋_N/n) · (3‖f‖_∞ (N − 1)/2) · ((1/k) Σ_{j=0}^{k−1} δ_{jN}) −→ 0 (n → ∞)
(use (4.2.15) and note that n → ∞ implies k → ∞). The absolute value of the
second bracket is bounded from above by
(1/n) Σ_{i=⌊n⌋_N}^{n−1} δ_i (‖h_{⌊i⌋_N+1}‖_∞ + ‖h_i‖_∞)
≤ (1/n) Σ_{i=⌊n⌋_N}^{n−1} (δ_{⌊i⌋_N} ‖h_{⌊i⌋_N}‖_∞ + δ_{⌊i⌋_N} ‖h_{⌊i⌋_N+1} − h_{⌊i⌋_N}‖_∞ + δ_i ‖h_i‖_∞)
≤ (1/n) Σ_{i=⌊n⌋_N}^{n−1} 3‖f‖_∞ = 3‖f‖_∞ (n − ⌊n⌋_N)/n ≤ 3‖f‖_∞ (N − 1)/n −→ 0 (n → ∞),
Proof. The argument requires some less well-known notions from analysis, especially
Clarke's generalized derivative for Lipschitz continuous functions (Clarke,
1981). Let W ⊆ IR^{d'} and Z ⊆ IR^{d''} be Banach spaces (whose supremum
norms are both denoted by ‖·‖_∞). Given a Lipschitz continuous mapping
Φ : W × Z → IR, Clarke’s generalized derivative is defined as the convex hull
of a limit set,
∂_w Φ(w, z) := conv{ lim_{i→∞} ∇_w Φ(w_i, z) : w_i → w },
where only those sequences w_i are considered for which all gradients ∇_w Φ(w_i, z)
and the limit lim_{i→∞} ∇_w Φ(w_i, z) exist (note that the gradients exist almost
everywhere due to Rademacher's Theorem). H(M_1, M_2) denotes the Hausdorff
distance between two subsets M_1 and M_2 of IR^{d'}, defined by

H(M_1, M_2) := max{ sup_{w_2∈M_2} ρ(w_2, M_1), sup_{w_1∈M_1} ρ(w_1, M_2) }.
Proposition 4.3.2. (Ledyaev, 1984, Theorem 1) Let W ⊆ IR^{d'} and Z ⊆ IR^{d''}
be Banach spaces and Φ : W × Z −→ IR a mapping such that:
for all w ∈ W, z1 , z2 ∈ Z.
3. There exists a constant ∆ such that for all (w, z) ∈ W ×Z with Φ(w, z) > 0
Finally, we need the modulus of continuity, defined for any continuous function
g : S × [A, B]^{dm} → IR as (x̄ ∈ [A, B]^{dm} fixed)

ω(g(·, x̄), ε) := sup_{s,t∈S, ‖s−t‖_∞≤ε} |g(s, x̄) − g(t, x̄)|.
Thus, having all the tools we need at hand, we can embark on the proof of
Proposition 4.3.1.
Let b = (b_{−1}, ..., b_m) ∈ S, x̄ = (x_1, ..., x_m) ∈ [A, B]^{dm}. For fixed x̄ we set
Φ : S × S → IR : (b, s) ↦ Φ(b, s) := |g_c(s, x̄, b)|; recall g_c from (4.1.3). Clearly,
Φ is Lipschitz continuous in the first argument. Moreover,
|Φ(b, s) − Φ(b, t)| ≤ |gc (s, x̄, b) − gc (t, x̄, b)| ≤ c · const(r, B, m) · ks − tk∞
128 Chapter 4. A Markov model with transaction costs: probabilistic view
(to see this, note that the self-financing condition forces b−1 ≤ c). Taking the
gradient of g_c (under Φ(b, s) > 0) yields

∂_b Φ(b, s) ⊆ {1} × {0}^m × [0, c] · ((1 + r)b_0 + Σ_{k=1}^{m} x_k b_k),
k=1
so that

inf_{v∈∂_b Φ(b,s)} ‖v‖_∞ ≥ (1 + r)b_0 + Σ_{k=1}^{m} x_k b_k ≥ min{1 + r, A}(1 − b_{−1}) ≥ const(r, A)(1 − c).
The Lipschitz property of f yields ω(f(·, x̄), ε) ≤ const · ε, and from the latter
chain of inequalities we obtain

ω(h_i(·, x̄), ε) ≤ (const/δ_i) · ε.
Thus δi · hi (·, x̄) is Lipschitz continuous with the same Lipschitz constant as f
(independent of x̄), and the proof is finished. 2
CHAPTER 5
m(b, x̄) := E[f (b, X̄i+1 )|X̄i = x̄] = E[f (b, X̄d+1)|X̄d = x̄],
f being the (logarithmic) utility function and S(s, x̄) being the set of admissible
portfolio vectors when x̄ denotes the last d observed return vectors and s the
last chosen portfolio. The Bellman equation was solved by a value iteration
type of algorithm in Chapter 4, which crucially relies on the distribution of
the stationary process {X̄i }i. This distribution, in general, is unknown to the
investor. Nonetheless he may try to estimate a solution of the Bellman equation.
A natural way to obtain an estimated solution is to replace the conditional
expectations E[hi(b, X̄d+1 )|X̄d = x̄] and m(b, x̄) in (5.1.1) by kernel estimates
Σ_{j=−d+1}^{i−1} h_i(b, X̄_{j+1}) · K((x̄ − X̄_j)/w_i) / Σ_{k=−d+1}^{i−1} K((x̄ − X̄_k)/w_i)
and
Σ_{j=−d+1}^{i−1} f(b, X̄_{j+1}) · K((x̄ − X̄_j)/w_i) / Σ_{k=−d+1}^{i−1} K((x̄ − X̄_k)/w_i),
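In code, each such estimate is a Nadaraya-Watson-type weighted average. The following is a minimal sketch, assuming a Gaussian kernel and simulated scalar data; the names `nw_estimate` and `xbars` are illustrative and not from the thesis:

```python
import numpy as np

def nw_estimate(y, xbars, query, bandwidth):
    """Nadaraya-Watson estimate of a conditional expectation:
    sum_j y_j K((query - xbars_j)/w) / sum_k K((query - xbars_k)/w).
    Here y[j] plays the role of f(b, X-bar_{j+1}) for a fixed portfolio b."""
    diffs = (xbars - query) / bandwidth
    weights = np.exp(-0.5 * np.sum(diffs ** 2, axis=1))  # Gaussian kernel
    total = weights.sum()
    if total == 0.0:                 # query too far from all observations
        return 0.0
    return float(np.dot(weights, y) / total)

# toy stationary data: Y_{j+1} = sin(3 X_j) + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(500, 1))
y = np.sin(3.0 * x[:, 0]) + 0.1 * rng.standard_normal(500)
est = nw_estimate(y, x, np.array([0.2]), bandwidth=0.1)
```

On this toy data the estimate lands near the true regression value sin(0.6) ≈ 0.56.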
M̂i operates on C(S × [A, B]dm )/{constant functions} according to M̂i [h] :=
[M̂i h]. Now, arguing similarly as in the proof of Proposition 4.2.2, we find that
M̂i is a contraction mapping in the norm kgk := max g − min g. Indeed, for
functions g, h ∈ C(S × [A, B]dm ) there exists a b∗ ∈ S(s, x̄) with
(M̂_i h)(s, x̄) = Σ_{j=−d+1}^{i−1} [f(b*, X̄_{j+1}) + (1 − δ_i) h(b*, X̄_{j+1})] K_i(X̄_j, x̄),

(M̂_i g)(s, x̄) ≥ Σ_{j=−d+1}^{i−1} [f(b*, X̄_{j+1}) + (1 − δ_i) g(b*, X̄_{j+1})] K_i(X̄_j, x̄).
132 Chapter 5. A Markov model with transaction costs: statistical view
Starting from here, one can argue exactly as in the proof of Proposition 4.2.2
to obtain kM̂i h − M̂i gk ≤ (1 − δi )kh − gk. Hence there exists a solution ĥi , λ̂i
of the empirical Bellman equation. Due to (5.1.3) this can be normalized to
ĥ_i(s, x̄) = max_{b∈S(s,x̄)} Σ_{j=−d+1}^{i−1} [f(b, X̄_{j+1}) + (1 − δ_i) ĥ_i(b, X̄_{j+1})] K_i(X̄_j, x̄). (5.1.4)
In the sequel, ĥi denotes a solution of (5.1.4).
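The contraction property used here can be checked numerically on a finite-state analogue. The sketch below is my own illustration, not the thesis's operator: it iterates an affine averaging operator T h = f + (1 − δ) P h with a row-stochastic matrix P, which contracts the span seminorm ‖g‖ = max g − min g with factor (1 − δ), so the fixed-point iteration converges:

```python
import numpy as np

def span(g):
    """Span seminorm: max g - min g (vanishes on constant functions)."""
    return float(np.max(g) - np.min(g))

rng = np.random.default_rng(1)
n_states, delta = 6, 0.1
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)        # row-stochastic: an averaging operator
f = rng.random(n_states)

def T(v):
    return f + (1.0 - delta) * (P @ v)    # affine averaging operator

# contraction with factor (1 - delta) in the span seminorm
g, h = rng.random(n_states), rng.random(n_states)
ok = span(T(g) - T(h)) <= (1.0 - delta) * span(g - h) + 1e-12

# value iteration converges to the fixed point h = T h
v = np.zeros(n_states)
for _ in range(300):
    v = T(v)
residual = float(np.max(np.abs(T(v) - v)))
```

Averaging by P never increases the span, which is why the discount factor (1 − δ) alone drives the contraction.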
We are now in a position to define the strategy which will turn out to have the
same optimality properties as the strategy in Chapter 4. On the analogy of
(4.2.3) the investor follows the strategy
with
V̂_i(b, x̄) := Σ_{j=−d+1}^{i−1} [f(b, X̄_{j+1}) + (1 − δ_i) ĥ_i(b, X̄_{j+1})] K_i(X̄_j, x̄).
Observe that, in contrast to b*_i from Chapter 4, this strategy can be constructed
using observed data only; we need not know the underlying distribution of the
return process.
The important feature is: This strategy is still optimal if the kernel estimates
work sufficiently well. A sufficient condition is given in the following theorem.
Then
lim_{i→∞} (1/δ_{i+1}²) sup_{g∈G} E sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i} g(b, X̄_{j+1}) K_{i+1}(X̄_j, x̄) − E[g(b, X̄_{i+2})|X̄_{i+1} = x̄]| = 0 (5.1.6)
implies that
lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E f(b̂_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1})) ≥ 0 (5.1.7)
satisfy
α(k) ≤ c · ρk (k ≥ 1).
The behaviour of the α-mixing coefficients α(k) mirrors how fast dependency
in the process variables decays for large time lags k. Under mild assumptions,
the class of GSM-processes comprises linear processes (Bosq, 1996, Sec. 1.3,
2.3), polynomial AR-processes (Doukhan, 1994, Sec. 2.4.1, Th. 5) such as
ARMA-processes (Doukhan, 1994, Sec. 2.4.1.2, Th. 6, Cor. 3), and Doeblin-
or Harris-recurrent Markov chains (Doukhan, 1994, Sec. 2.4, Th. 1 and 3).
Theorem 5.1.2. The assumptions of Theorem 5.1.1 are fulfilled if the following
hold:
1. {X_i}_{i=−d+1}^{∞} is a stationary [A, B]^m-valued d-stage Markov process (cf. V1-V3 in Section 4.1) and geometrically strongly mixing.
2. There exist densities fX̄0 and fX0 |X̄0 of the distributions PX̄0 and PX0 |X̄0 ,
respectively, such that
|fX̄0 (x̄) − fX̄0 (ȳ)| ≤ Ckx̄ − ȳk∞ for all x̄, ȳ ∈ [A, B]dm ,
for some k > 0, H denoting the Hausdorff distance (cf. Section 4.3),
– fX0 |X̄0 is Lipschitz continuous such that for some C > 0,
|fX0 |X̄0 (x, x̄) − fX0 |X̄0 (x, ȳ)| ≤ Ckx̄ − ȳk∞
Hence, under the (not too restrictive) conditions of Theorem 5.1.2 we are able
to construct an admissible strategy {b̂i }i that is superior to any other admissible
strategy {bi}i based on a portfolio selection function in the sense of
lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E f(b̂_i, X̄_{i+1}) − (1/n) Σ_{i=0}^{n−1} E f(b_i, X̄_{i+1})) ≥ 0.
It should be stressed again that this is a conservative, i.e. worst case analysis,
the lim inf giving the worst possible performance of our strategy {b̂i}i .
Remark. There are a number of sufficient conditions for {Xi }i to be a station-
ary GSM-process (see, e.g., Doukhan, 1994). For example, the GSM property
holds if there exists a continuous function r : [A, B]^m → IR_0^+ with
To verify this, one may use the Doeblin condition in Theorem 1 of Doukhan
(1994, Sec. 2.4): The transition probabilities of the d-stage Markov process
{X_i}_i are

P(x̄_0, C) = ∫_C f_{X_0|X̄_0}(x_0|x̄_0) dx_0
for x̄0 ∈ [A, B]dm , C ∈ B([A, B]m ). In particular, with the measure µ := r · λ (λ
denoting the Lebesgue-Borel-measure on [A, B]m ),
P(x̄_0, C) ≥ ∫_C r(x_0) dx_0 = µ(C)

and

µ([A, B]^m) = ∫_{[A,B]^m} r(x_0) dx_0 ∈ (0, 1].
The first will be estimated uniformly in b ∈ S and x ∈ IRd by the kernel estimate
Z_n(g, b, x) := (1/(n w_n^d)) Σ_{i=0}^{n−1} g(b, X_{i+1}) K((x − X_i)/w_n), (5.2.1)

the latter by

R_n(g, b, x) := Z_n(g, b, x) / Z_n(1, b, x).
K : IR^d −→ IR_0^+ is assumed to be a fixed bounded, Lipschitz continuous kernel
function with ∫_{IR^d} K(x) dx = 1 and ∫_{IR^d} K(x)‖x‖_∞ dx < ∞. w_n ∈ IR^+ is a
bandwidth.
Theorem 5.2.1. Let {X_i}_{i=0}^{∞} be a stationary GSM-process of IR^d-valued random
variables with a Lipschitz continuous density f_{X_0}. Moreover, let G be a
class of functions g : S × IR^d −→ IR (S ⊆ IR^{d'} compact) with the following
property: There exists a constant C > 0 such that for all g ∈ G and all
a, b ∈ S, x, y ∈ IRd
‖g‖_∞ ≤ C, (5.2.2)

∫_{IR^d} |g(a, x)| dx ≤ C, (5.2.3)

|g(a, x) − g(b, x)| ≤ C · ‖a − b‖_∞, (5.2.4)

|E[g(b, X_1)|X_0 = x] − E[g(b, X_1)|X_0 = y]| ≤ C · ‖x − y‖_∞. (5.2.5)
Now, if the support of fX0 , supp fX0 = {x : fX0 (x) > 0}, is a compact subset
of IRd , the theorem may be used to derive the following result concerning the
estimation of the regression function R. Recall that H denotes the Hausdorff
distance (cf. Section 4.3).
Corollary 5.2.2. Under the assumptions of Theorem 5.2.1, if fX0 has compact
support and the level sets
X_n := {x : f_{X_0}(x) ≥ 1/n} (5.2.6)

satisfy

H(supp f_{X_0}, X_n) ≤ const./n^k (0 < k ≤ ∞), (5.2.7)
then

sup_{g∈G} E sup_{x∈supp f_{X_0}, b∈S} |R_n(g, b, x) − R(g, b, x)| = O(log^β n / n^{k/((d+2)(k+1))}),

in the case k = ∞ with rate

sup_{g∈G} E sup_{x∈supp f_{X_0}, b∈S} |R_n(g, b, x) − R(g, b, x)| = O(log^β n / n^{1/(d+2)}),

for any β > 1 (for the optimality of rates see, e.g., Györfi et al., 2002).
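The behaviour of the rate exponent k/((d + 2)(k + 1)) is easy to tabulate; as k → ∞ it approaches the limiting exponent 1/(d + 2). A simple check (my own illustration, not part of the thesis):

```python
def rate_exponent(d, k):
    """Exponent of n in the convergence rate of Corollary 5.2.2."""
    return k / ((d + 2) * (k + 1))

d = 3
exponents = [rate_exponent(d, k) for k in (1, 2, 10, 100)]
limit = 1.0 / (d + 2)   # best achievable exponent for this dimension d
```

The exponents increase in k: the faster the level sets X_n fill out supp f_{X_0}, the closer the rate gets to the limiting value.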
The proofs of Theorem 5.2.1 and Corollary 5.2.2 refine arguments used in the
proofs of Theorems 2.2 and 3.2 in Bosq (1996). We first give the
Proof of Theorem 5.2.1. We write sup_{x,b} instead of sup_{x∈X_n, b∈S}. The estimation
error can be decomposed into stochastic and deterministic error,
Note that the constant only depends on K, fX0 and on C in (5.2.2) - (5.2.5),
not however on the specific g ∈ G under consideration.
2nd step: Analysis of the stochastic error sup |Z_n − EZ_n|.
We first cover X_n by ⌈ν_n⌉^d (ν_n ≥ 1) cubes

C(j, n) := {x : ‖x − x_{j,n}‖_∞ ≤ diam(X_n)/(2⌈ν_n⌉)}

with side lengths diam(X_n)/⌈ν_n⌉ and centres x_{j,n} (j = 1, ..., ⌈ν_n⌉^d). Analogously,
S is covered by ⌈ν_n⌉^{d'} cubes

S(k, n) := {b ∈ S : ‖b − b_{k,n}‖_∞ ≤ diam(S)/(2⌈ν_n⌉)}

(k = 1, ..., ⌈ν_n⌉^{d'}). From this,
sup_{x,b} |Z_n(g, b, x) − EZ_n(g, b, x)| = sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |Z_n(g, b, x) − EZ_n(g, b, x)|
≤ sup_{j,k} |Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})|
+ sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |Z_n(g, b, x) − Z_n(g, b_{k,n}, x_{j,n})|
+ sup_{j,k} sup_{x∈C(j,n), b∈S(k,n)} |EZ_n(g, b, x) − EZ_n(g, b_{k,n}, x_{j,n})|.
and hence

sup_{x,b} |Z_n(g, b, x) − EZ_n(g, b, x)| ≤ sup_{j,k} |Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})| + 2 · const · max(diam(X_n), diam(S))/(w_n^{d+1} ν_n).
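The covering step trades a supremum over a continuum for a supremum over finitely many cube centres plus a Lipschitz remainder. A one-dimensional numeric sketch of this idea (illustrative only; `nu` plays the role of ν_n):

```python
import numpy as np

# For an l-Lipschitz function, the sup over [0, 1] exceeds the sup over
# nu cell centres by at most l * (half the cell width).

l = 4.0

def f(x):
    return np.sin(l * x)          # |f'| <= l, hence f is l-Lipschitz

nu = 200
centres = (np.arange(nu) + 0.5) / nu          # interval (cube) centres
grid_sup = float(np.max(f(centres)))

dense = np.linspace(0.0, 1.0, 100_001)        # stand-in for the continuum
true_sup = float(np.max(f(dense)))

gap = true_sup - grid_sup                      # actual loss from discretizing
bound = l / (2 * nu)                           # Lipschitz worst-case loss
```

The observed gap always stays below the Lipschitz bound, which is what lets the proof pass from the grid supremum back to the full supremum.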
This result yields (0 < r_n → ∞ will represent the desired rate of convergence
at a later stage)

r_n · E sup_{x,b} |Z_n(g, b, x) − EZ_n(g, b, x)|
≤ E[r_n · sup_{j,k} |Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})|] + 2 · const · r_n · max(diam(X_n), diam(S))/(w_n^{d+1} ν_n)
= ∫_0^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} P(sup_{j,k} |Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})| > ε/r_n) dε
+ 2 · const · r_n · max(diam(X_n), diam(S))/(w_n^{d+1} ν_n)
≤ µ + Σ_{j,k} ∫_µ^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} P(|Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})| > ε/r_n) dε
+ 2 · const · r_n · max(diam(X_n), diam(S))/(w_n^{d+1} ν_n), (5.2.10)

where µ > 0 is arbitrary.
3rd step: Combining the results of step 1 and step 2.
Assume for the moment that
P(|Z_n(g, b_{k,n}, x_{j,n}) − EZ_n(g, b_{k,n}, x_{j,n})| > ε/r_n)

has an upper bound p_n^(1)(ε) + p_n^(2)(ε) independent of g, b and x with the following
four properties:
1. We have

∫_µ^∞ p_n^(1)(ε) dε < ∞. (5.2.11)
5.2 Uniformly consistent regression estimation 141
3. For all µ > 0 there exists an N(µ) ∈ IN such that for all ε ≥ µ we have that

(ν_n + 1)^{d+d'} p_n^(1)(ε) is monotonically decreasing for n ≥ N(µ). (5.2.13)
4. We have

lim_{n→∞} ∫_µ^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} (ν_n + 1)^{d+d'} p_n^(2)(ε) dε = 0. (5.2.14)
≤ µ + lim sup_{n→∞} ∫_µ^∞ (ν_n + 1)^{d+d'} p_n^(1)(ε) dε
+ lim sup_{n→∞} ∫_µ^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} (ν_n + 1)^{d+d'} p_n^(2)(ε) dε
+ lim sup_{n→∞} const · r_n · (w_n + max(diam(X_n), diam(S))/(w_n^{d+1} ν_n)).
The second term is zero using (5.2.11)-(5.2.13) and the monotone convergence
theorem, the third term is zero because of (5.2.14). µ being arbitrary we obtain
Proposition 5.2.3. (Bosq, 1996, Theorem 1.3) Let {Y_i}_{i=−∞}^{∞} be a centered
real-valued stochastic process with sup_{1≤i≤n} ‖Y_i‖_∞ ≤ D. Then for any q ∈
[1, n/2] and any ε > 0

P(|(1/n) Σ_{i=0}^{n−1} Y_i| > ε) ≤ 4 exp(−ε² q/(8v²)) + 22 (1 + 4D/ε)^{1/2} ⌈q⌉ α(⌊n/(2q)⌋),

where

p := n/(2q),

v² := (2/p²) σ(q)² + Dε/2,

σ²(q) := max_{0≤j≤2⌈q⌉−1} E[((⌊jp⌋ + 1 − jp) Y_{⌊jp⌋+1} + Y_{⌊jp⌋+2} + ... + Y_{⌊(j+1)p⌋} + ((j + 1)p − ⌊(j + 1)p⌋) Y_{⌊(j+1)p⌋+1})²].
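For orientation, the right-hand side of this inequality can be evaluated numerically. The helper below is my own sketch: v and a geometric mixing bound α(k) = c·ρ^k are supplied as assumptions rather than computed from data:

```python
import math

def bosq_bound(n, eps, q, D, v, c=1.0, rho=0.5):
    """Right-hand side of the inequality above, with alpha(k) = c * rho**k."""
    exp_term = 4.0 * math.exp(-(eps ** 2) * q / (8.0 * v ** 2))
    mix_term = (22.0 * math.sqrt(1.0 + 4.0 * D / eps)
                * math.ceil(q) * c * rho ** (n // (2 * q)))
    return exp_term + mix_term

b1 = bosq_bound(n=10_000, eps=0.1, q=100, D=1.0, v=0.5)  # moderate sample
b2 = bosq_bound(n=40_000, eps=0.1, q=100, D=1.0, v=0.5)  # mixing term shrinks
```

With q fixed, increasing n only shrinks the mixing term, geometrically in n/(2q); the exponential term is controlled by the choice of q.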
where the constant depends on nothing but K, f_{X_0} and C from (5.2.2)-(5.2.3).
Hence (set D := C · ‖K‖_∞ w_n^{−d})

P(|Z_n(g, b, x) − EZ_n(g, b, x)| > ε/r_n) ≤ p_n^(1)(ε) + p_n^(2)(ε)

with (const. being another suitable constant)

p_n^(1)(ε) := 4 exp(−ε² q_n w_n^d / (const · (1 + ε) r_n²)),

p_n^(2)(ε) := 22 (1 + const · r_n/(ε w_n^d))^{1/2} ⌈q_n⌉ α(⌊n/(2q_n)⌋ − 1).
q_n w_n^d / r_n² ∈ F and ν_n^{d+d'} ∈ F (5.2.17)

(observe that {a_n}, {b_n} ∈ F and 0 ≤ ρ < 1 implies a_n ρ^{b_n} ↘ 0). As to (5.2.14)
we use

∫_µ^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} (ν_n + 1)^{d+d'} p_n^(2)(ε) dε
≤ 22 · (ν_n + 1)^{d+d'} · ⌈q_n⌉ α(⌊n/(2q_n)⌋ − 1) · ∫_µ^{2‖g‖_∞‖K‖_∞ r_n/w_n^d} (1 + const · r_n/(ε w_n^d))^{1/2} dε
≤ const · (ν_n + 1)^{d+d'} · ⌈q_n⌉ α(⌊n/(2q_n)⌋ − 1) · r_n/w_n^d.
Thus (5.2.14) is satisfied if only

lim_{n→∞} (r_n ν_n^{d+d'} / w_n^d) · ⌈q_n⌉ α(⌊n/(2q_n)⌋ − 1) = 0. (5.2.18)
So it suffices to satisfy (5.2.17) and (5.2.18). This is done by the choice

r_n := 1/(w_n log^β n) (β > 1),
ν_n := (diam(S) + diam(X_n))/w_n^{d+2},
w_n := 1/n^{1/(d+2)},
q_n := n/log^a n (2β − 1 > a > 1).
Indeed, from q_n w_n^d/r_n² = log^{2β−a} n ∈ F and diam(X_n) ∈ F we have (5.2.17).
Moreover,

(r_n ν_n^{d+d'} / w_n^d) · ⌈q_n⌉ α(⌊n/(2q_n)⌋ − 1) ≤ const · (diam(S) + diam(X_n))^{d+d'} · (n^{1+d+d'+(1+d)/(2+d)} / log^{a+β} n) · α(⌊log^a n / 2⌋ − 1),

and the GSM-property yields (5.2.18) (observe again that {a_n}, {b_n} ∈ F and
0 ≤ ρ < 1 implies a_n ρ^{b_n} ↘ 0).
Finally, (5.2.15) now reads

lim sup_{n→∞} sup_{g∈G} r_n · E sup_{x,b} |Z_n(g, b, x) − φ(g, b, x)| ≤ lim sup_{n→∞} const · 1/log^β n = 0. 2
Using (5.2.2)-(5.2.5), we can bound the second and the third term from above by

const · sup_{x∈X} inf_{x*∈X_n} ‖x − x*‖_∞ ≤ const · n^{−kγ},
the latter equality for the balanced choice γ = 1/((d + 2)(k + 1)). 2
for any admissible portfolio strategy {bi }i. Equality holds for {b̂i }i , which yields
with
H_i^(1)(b) := Σ_{j=−d+1}^{i} f(b, X̄_{j+1}) K_{i+1}(X̄_j, X̄_{i+1}) − m(b, X̄_{i+1}),
and
H_i^(2)(b) := (1 − δ_{i+1}) (Σ_{j=−d+1}^{i} ĥ_{i+1}(b, X̄_{j+1}) K_{i+1}(X̄_j, X̄_{i+1}) − E[ĥ_{i+1}(b, X̄_{i+2})|F_{i+1}]).
Now, we investigate the asymptotics of the terms H_i^(1) and H_i^(2) in (5.3.1).
Clearly,

sup_{b∈S} |H_i^(1)(b)| ≤ sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i} f(b, X̄_{j+1}) K_{i+1}(X̄_j, x̄) − m(b, x̄)|. (5.3.2)
To analyse H_i^(2) we define an F_i-measurable random variable c_i,
and obtain

sup_{b∈S} |H_i^(2)(b)|
≤ sup_{b∈S} |Σ_{j=−d+1}^{i} h_{i+1}(b, X̄_{j+1}) K_{i+1}(X̄_j, X̄_{i+1}) − E[h_{i+1}(b, X̄_{i+2})|F_{i+1}]|
+ |Σ_{j=−d+1}^{i} (ĥ_{i+1}(...) − h_{i+1}(...) + c_{i+1}) K_{i+1}(...) − E[ĥ_{i+1}(...) − h_{i+1}(...) + c_{i+1}|F_{i+1}]|
≤ sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i} h_{i+1}(b, X̄_{j+1}) K_{i+1}(X̄_j, x̄) − E[h_{i+1}(b, X̄_{i+2})|X̄_{i+1} = x̄]| + 2‖ĥ_{i+1} − h_{i+1} + c_{i+1}‖_∞
≤ sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i} h_{i+1}(b, X̄_{j+1}) K_{i+1}(X̄_j, x̄) − E[h_{i+1}(b, X̄_{i+2})|X̄_{i+1} = x̄]| + ‖ĥ_{i+1} − h_{i+1}‖. (5.3.3)
For this, recall the norm k · k = 2 inf c∈IR k · +ck∞ on C(S × [A, B]dm ). By the
contraction property of M̂i ,
and hence
‖ĥ_i − h_i‖ ≤ (2/δ_i) ‖M̂_i h_i − M_i h_i‖_∞. (5.3.4)
It is easily established that
‖M̂_i h_i − M_i h_i‖_∞ (5.3.5)
≤ sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i−1} f(b, X̄_{j+1}) K_i(X̄_j, x̄) − m(b, x̄)|
+ (1 − δ_i) sup_{b∈S, x̄∈[A,B]^{dm}} |Σ_{j=−d+1}^{i−1} h_i(b, X̄_{j+1}) K_i(X̄_j, x̄) − E[h_i(b, X̄_{d+1})|X̄_d = x̄]|.
{b̃i }i being the periodic strategy from Lemma 4.2.6. This is the analogue of
(4.2.12) we were looking for.
Finally, we observe that
ĥi+1 (bi+1 , X̄i+2 ) − ĥi+1 (bi , X̄i+1 ) = hi+1 (bi+1 , X̄i+2 ) − hi+1 (bi, X̄i+1 )
+ ĥi+1 (bi+1 , X̄i+2 ) − hi+1 (bi+1 , X̄i+2 ) + ci+1
+ hi+1 (bi , X̄i+1 ) − ĥi+1 (bi , X̄i+1 ) − ci+1 , (5.3.7)
where the expectation of the absolute value of the sum of the last two brackets
is bounded from above by
using (5.3.4), (5.3.5) and the assumption (5.1.6) of the theorem. (5.3.7) and
(5.3.8) show that for the purpose of the asymptotical inference in (5.3.6), we
can replace ĥi+1 by hi+1 in the definition of Di . We can then argue exactly in
the same way as in the proof of Theorem 4.2.1 (starting from (4.2.12)) to obtain
the optimality relation
lim inf_{n→∞} ((1/n) Σ_{i=0}^{n−1} E m(b̂_i, X̄_i) − (1/n) Σ_{i=0}^{n−1} E m(b_i, X̄_i)) ≥ 0. 2
5.3 Proving the optimality of the strategy 149
CHAPTER 6
E (log < b(X−d , ..., X−1), X0 >) ≥ E (log < f (X−d , ..., X−1), X0 >) (6.1.1)
for all measurable f : [a, b]dm −→ S. At time n ∈ IN0 , b advises the in-
vestor to allocate his wealth to the single shares according to the portfolio
b(X_{n−d+1}, ..., X_n) = b(X̄_{n+1}), where X̄_n is a shorthand notation for the d-past
(X_{n−d}, ..., X_{n−1}) ∈ IR_+^{dm} of X_n.
– It is plausible that one should drop observations from the far-away past if
the stationarity of the market is not clear. Outdated observations (under
non-stationarity they appear to be drawn from a “wrong” distribution)
may endanger the performance of the conditional log-optimal portfolio.
We conjecture that portfolio selection functions are less sensitive to devi-
ations from stationarity of the return process. Finding empirical evidence
or counterevidence for this, however, is beyond the scope of this thesis.
for fixed x̄. Here and in the sequel, the quantities s and x̄ are to be implicitly
understood as s ∈ S and x̄ ∈ IR_+^{dm}.
Let KT (x̄) ⊆ S denote the set of Kuhn-Tucker-points (cf. Foulds, 1981), i.e.
the set of solutions of the convex maximization problem
Because R(·, x̄) is continuous and S is compact, KT(x̄) ≠ ∅, and the existence
of solutions to (6.1.2) as well as to (6.1.1) is guaranteed.
e.g. Ding et al., 1993; Peters, 1997) are not precluded from consideration as
they would have been under mixing conditions.
In this chapter, the estimator of Yakowitz, Györfi et al. (1999) is combined
with a stochastic projection algorithm (Kushner and Clark, 1978) to obtain
a strongly consistent sequential estimator for a log-optimal portfolio selection
function of the d-past of the return process. The mixing conditions in Walk
and Yakowitz (1999) and Walk (2000) are replaced by a Lipschitz condition on
the gradient of the target function.
V1: {X_i}_{i=−T}^{∞} (T ∈ IN) is an [a, b]^m-valued stationary ergodic stochastic process
on a probability space (Ω, A, P) (0 < a ≤ b < ∞ need not be known
explicitly). Some d ∈ IN (d ≤ T) is fixed.
|f_{X_0|X̄_0}(x_0, x̄) − f_{X_0|X̄_0}(x_0, ȳ)| ≤ (L a / (√(md) · b µ([a, b]^m))) |x̄ − ȳ|
(note the similarity to the Lipschitz conditions of Theorem 5.1.2). In particular,
this holds if fX0 |X̄0 is continuously differentiable. Hence, V2 is a condition on
the variability of the return vectors and as such a condition on the risk inherent
in the market.
At time n, the investor's task is to produce an estimate Z_n^L(x̄) of the value of a
log-optimal portfolio selection function given that the last d observed return vectors
are x̄ ∈ IR^{dm}. This can be done by the following projection algorithm:
using

M̂_{k,n}(s, x̄) := (Σ_{j=−M}^{n} (X_j / <s, X_j>) · 1_{A_k(x̄)}(X̄_j)) / (Σ_{j=−M}^{n} 1_{A_k(x̄)}(X̄_j)), (6.2.1)

∆̂_{k,n,L}(s, x̄) := T_{L2^{−k}}(M̂_{k,n}(s, x̄) − M̂_{k−1,n}(s, x̄)).
Here, for x ∈ IR^m, Π(x) denotes the best approximating (in the Euclidean
norm) element of x in the simplex S, i.e. the projection of x on S. To
start the iteration at time n = 0, we use an arbitrary starting estimate
Z_{−1}^L(x̄) ∈ S.
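The projection Π onto the simplex can be computed exactly by the classical sort-based algorithm. A self-contained sketch (a standard method, not specific to the thesis):

```python
import numpy as np

def project_to_simplex(x):
    """Euclidean projection of x onto the probability simplex
    {b : b_i >= 0, sum_i b_i = 1}, via the sort-based algorithm."""
    u = np.sort(x)[::-1]                    # coordinates in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, len(x) + 1)
    cond = u + (1.0 - css) / ks > 0         # candidate support sizes
    k = ks[cond][-1]                        # largest feasible support size
    tau = (css[k - 1] - 1.0) / k            # shift that renormalizes the sum
    return np.maximum(x - tau, 0.0)

p = project_to_simplex(np.array([0.6, 0.9, -0.1]))
```

Points already in the simplex are left unchanged (τ = 0), so Π is indeed a projection.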
Lemma 6.2.1. Let ρ(Z_n^L(x̄), KT(x̄)) := inf_{y∈KT(x̄)} ‖Z_n^L(x̄) − y‖ denote the
Euclidean distance of Z_n^L(x̄) from the set KT(x̄). Then, under the assumptions
1. V1 and V2,
2. α_n −→ 0 (n → ∞) and Σ_{n=0}^{∞} α_n = ∞,
To formulate this result more neatly, let ZnL∗(x̄) denote the best approximating
(in the Euclidean metric) element of ZnL(x̄) in KT (x̄) (observe that KT (x̄) is
compact). Note that ZnL∗ (x̄) is a log-optimal portfolio selection function as x̄
varies. Then Lemma 6.2.1 can be rephrased more explicitly as
and

R(Z_n^L(x̄), x̄) −→ R*(x̄) (n → ∞) P-a.s. (6.2.3)
158 Chapter 6. Portfolio selection functions in stationary return processes
2. Strong L_1-consistency:

∫ |Z_n^L(x̄) − Z_n^{L*}(x̄)| P_{X̄_0}(dx̄) −→ 0 (n → ∞) P-a.s.

and

∫ R(Z_n^L(x̄), x̄) P_{X̄_0}(dx̄) −→ R_max (n → ∞) P-a.s., (6.2.4)
The limit relations (6.2.3) and (6.2.4) are the central results for the proposed
estimation procedure. They demonstrate that in the long run ZnL (x̄) almost
surely achieves the optimal expected growth of wealth among all strategies
based on portfolio selection functions of the d-past.
Remark concerning Lemma 6.2.1 and Theorem 6.2.2. The limit relations
in Lemma 6.2.1 and Theorem 6.2.2, part 1 are true even in the stronger sense
that a fixed exceptional null set of ω ∈ Ω and a fixed exceptional null set of
x̄ ∈ [a, b]dm exist, outside which for all ω and x̄
ρ(Z_n^L(x̄), KT(x̄)) −→ 0,
Corollary 6.2.3. Suppose the support of the distribution PX0 is not confined
to a hyperplane in IRm containing the diagonal {(d, ..., d)T ∈ IRm |d ∈ IR}. For
any measurable portfolio selection function f : IR_+^{dm} −→ S with accumulated
returns

V_n := Π_{i=0}^{n−1} <f(X̄_{i+1}), X_{i+1}>

we have

lim sup_{n→∞} (1/n) log(V_n/R_n) ≤ 0 P-a.s.
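The quantities in the corollary are easy to simulate. A sketch with i.i.d. uniform returns and a constant portfolio selection function (simulated data, not the thesis's market model); (1/n) log V_n is the empirical growth rate of the accumulated return:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 20_000, 3
X = rng.uniform(0.8, 1.25, size=(n, m))        # return vectors in [a, b]^m

def growth_rate(b, X):
    """(1/n) log V_n for the constant portfolio selection function f == b."""
    return float(np.mean(np.log(X @ b)))

uniform = np.full(m, 1.0 / m)                   # equal-weight portfolio
single = np.array([1.0, 0.0, 0.0])              # everything in one asset
g_uniform = growth_rate(uniform, X)
g_single = growth_rate(single, X)
```

With these symmetric i.i.d. returns, diversification gives the higher empirical log growth rate, in line with the concavity of the logarithm.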
For the regression estimator of Yakowitz, Györfi et al. (1999) it is not yet known
whether there exists an adaptive rule for the choice of the Lipschitz constant L
generating a procedure that is strongly consistent for arbitrary stationary er-
godic processes with Lipschitz continuous regression function. Theorem 6.2.4 is
remarkable because it asserts that for the application of the regression estimate
to the portfolio optimization problem such adaptation can be achieved.
We finish this section with two remarks about extensions of the stated results.
Remark concerning (6.2.1). Lemma 6.2.1, the first part of Theorem 6.2.2
and Theorem 6.2.4 still hold if we use kernel estimates
M̂_{k,n}(s, x̄) := (Σ_{j=−M}^{n} (X_j / <s, X_j>) K((X̄_j − x̄)/h_k)) / (Σ_{j=−M}^{n} K((X̄_j − x̄)/h_k))
(for sufficiently large n), then Theorem 6.2.4 remains valid for this Zn .
6.3 Checking the properties of the estimation algorithm 161
with
M̂_{k,n}(s, x̄) := (Σ_{j=−M}^{n} (X_j / <s, X_j>) 1_{A_k(x̄)}(X̄_j)) / (Σ_{j=−M}^{n} 1_{A_k(x̄)}(X̄_j)) (6.3.3)

∆̂_{k,n,L}(s, x̄) := T_{L2^{−k}}(M̂_{k,n}(s, x̄) − M̂_{k−1,n}(s, x̄)) (6.3.4)
for PX̄0 -a.a. x̄, where ∆k (s, x̄) := Mk (s, x̄) − Mk−1 (s, x̄) (Yakowitz, Györfi et
al., 1999, eq. (4)). In addition, a truncated version of the expansion is defined
by

m_L(s, x̄) := M_1(s, x̄) + Σ_{k=2}^{∞} ∆_{k,L}(s, x̄)
The convergence of the components (6.3.3) and (6.3.4) in the definition of the
estimator to the corresponding components in expansion (6.3.5) is given by the
following
Lemma 6.3.1. Under the assumption V1 one has for P_{X̄_0}-a.a. x̄ and any fixed
s ∈ S
1. M̂_{1,n}(s, x̄) − M_1(s, x̄) −→ 0 (n → ∞) P-a.s.,
2. ∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄) −→ 0 (n → ∞) P-a.s.
in particular the first part of the lemma. Since the truncation operator is itself
Lipschitz continuous with Lipschitz constant 1, the second part of the lemma
is obtained by
|∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)|
≤ |T_{L2^{−k}}(M̂_{k,n}(s, x̄) − M̂_{k−1,n}(s, x̄)) − T_{L2^{−k}}(M_k(s, x̄) − M_{k−1}(s, x̄))|
≤ |M̂_{k,n}(s, x̄) − M_k(s, x̄)| + |M_{k−1}(s, x̄) − M̂_{k−1,n}(s, x̄)|
−→ 0 P-a.s. (n → ∞). 2
In Yakowitz, Györfi et al. (1999) a lemma analogous to the above one is used to
obtain pointwise strong consistency of m̂n,L(s, x̄). However, the proof of Lemma
6.2.1 requires convergence to hold uniformly in S, which will be derived from
the following lemma.
Lemma 6.3.2. Let S ⊆ IRd be some compact set, K > 0 and (fn )n∈IN a class
of functions fn : S −→ IRd with |fn (s) − fn (t)| ≤ K|s − t| for all s, t ∈ S.
Then limn→∞ fn (s) = 0 for all s ∈ S implies that limn→∞ sups∈S |fn (s)| = 0.
Proof. Let δ > 0 be arbitrary but fixed. Choose a finite δ-net N in S. Then
for all s ∈ S there exists some ts ∈ N with |s − ts | ≤ δ. This yields
|fn (s)| ≤ |fn (s) − fn (ts )| + |fn (ts)| ≤ K|s − ts| + |fn (ts )| ≤ K · δ + |fn (ts )|
and

sup_{s∈S} |f_n(s)| ≤ K · δ + sup_{t∈N} |f_n(t)|.

Hence

lim sup_{n→∞} sup_{s∈S} |f_n(s)| ≤ K · δ + lim sup_{n→∞} sup_{t∈N} |f_n(t)| = K · δ + 0 = K · δ.

Since δ > 0 was arbitrary, the assertion follows. 2
This is a consequence of Lemma 6.3.2 with fn (s) := M̂1,n (s, x̄) − M1 (s, x̄).
According to Lemma 6.3.1 limn→∞ fn (s) = 0 P-a.s. and |fn (s) − fn (t)| ≤
|M̂1,n (s, x̄) − M̂1,n (t, x̄)| + |M1 (s, x̄) − M1 (t, x̄)| hold for any s ∈ S and for PX̄0 -
a.a. x̄. To bound the terms in the latter expression, one uses
holds.
Here, put f_n(s) := Σ_{k=2}^{R} (∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)). For any fixed s ∈ S and
for P_{X̄_0}-a.a. x̄ one has lim_{n→∞} (∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)) = 0 P-a.s. according
to Lemma 6.3.1 and hence lim_{n→∞} f_n(s) = 0 P-a.s. for all s ∈ S. Moreover,

|f_n(s) − f_n(t)| ≤ Σ_{k=2}^{R} |∆̂_{k,n,L}(s, x̄) − ∆̂_{k,n,L}(t, x̄)| + Σ_{k=2}^{R} |∆_{k,L}(s, x̄) − ∆_{k,L}(t, x̄)|.
≤ 2 (m b²/a²) |s − t|

(with (6.3.7)), and we obtain

|f_n(s) − f_n(t)| ≤ 2(R − 1)(m b²/a²) |s − t| + Σ_{k=2}^{R} |M_k(s, x̄) − M_k(t, x̄)| + Σ_{k=2}^{R} |M_{k−1}(s, x̄) − M_{k−1}(t, x̄)|
Let R ∈ {2, 3, ...} be arbitrary. For sufficiently large n we have Nn > R and as
in Yakowitz, Györfi et al. (1999), proof of Theorem 1, we obtain
|Σ_{k=2}^{N_n} ∆̂_{k,n,L}(s, x̄) − Σ_{k=2}^{∞} ∆_{k,L}(s, x̄)|
≤ Σ_{k=2}^{R} |∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)| + Σ_{k=R+1}^{N_n} |∆̂_{k,n,L}(s, x̄)| + Σ_{k=R+1}^{∞} |∆_{k,L}(s, x̄)|
Hence

sup_{s∈S} |Σ_{k=2}^{N_n} ∆̂_{k,n,L}(s, x̄) − Σ_{k=2}^{∞} ∆_{k,L}(s, x̄)|
≤ sup_{s∈S} Σ_{k=2}^{R} |∆̂_{k,n,L}(s, x̄) − ∆_{k,L}(s, x̄)| + 2^{−(R−1)} · L.
Z_n^L(x̄) = Π(Z_{n−1}^L(x̄) + α_n m(Z_{n−1}^L, x̄) + α_n β_n(x̄))

with

β_n(x̄) := m̂_{n,L}(Z_{n−1}^L(x̄), x̄) − m(Z_{n−1}^L(x̄), x̄), (6.3.11)
the projection algorithm (6.3.10) is a special case of the projection algorithm
Wn = Π (Wn−1 + αn (m(Wn−1 ) + ξn + βn ))
in Kushner and Clark (1978, eq. 5.3.1) for ξn := 0. Their Theorem 5.3.1
adapted to the case ξn := 0 reads
1. m(·, ·) is continuous,
2. α_n > 0 with α_n −→ 0 for n → ∞ and Σ_{n=0}^{∞} α_n = ∞,
To this end, we first note that as in Yakowitz, Györfi et al. (1999, Corollary 1)
assumption V2, i.e., the existence of a constant L with
|m(s, x̄) − m(s, ȳ)| ≤ (L/√(md)) |x̄ − ȳ| for all x̄, ȳ ∈ IR_+^{dm}, s ∈ S,

implies

m(s, x̄) = m_L(s, x̄).
From part 1, the Lebesgue dominated convergence theorem yields the assertions
of the second part of the theorem. The limit relation (6.3.12), now valid P-a.s.
for PX̄0 -a.a. x̄, and
|Z_n^L(x̄) − Z_n^{L*}(x̄)| ≤ max_{s,t∈S} |s − t| ≤ √m

imply

∫ |Z_n^L(x̄) − Z_n^{L*}(x̄)| P_{X̄_0}(dx̄) −→ 0 (n → ∞)
exists (Algoet and Cover, 1988, p. 877, corrected in Österreicher and Vajda,
1993, and Vajda and Österreicher, 1994). The accumulated return using the
It follows that

lim sup_{n→∞} (1/n) log(V_n/R_n) ≤ lim sup_{n→∞} (1/n) log(V_n/R_n^*) + lim sup_{n→∞} (1/n) log(R_n^*/R_n). (6.3.13)
For the first term on the right hand side, the ergodic theorem and the optimality
of w imply

lim sup_{n→∞} (1/n) log(V_n/R_n^*)
= lim sup_{n→∞} (1/n) Σ_{i=0}^{n−1} log(<f(X̄_{i+1}), X_{i+1}> / <w(X̄_{i+1}), X_{i+1}>)
= E log(<f(X̄_0), X_0> / <w(X̄_0), X_0>)
= ∫ (E[log <f(x̄), X_0> |X̄_0 = x̄] − E[log <w(x̄), X_0> |X̄_0 = x̄]) P_{X̄_0}(dx̄)
≤ 0. (6.3.14)
Arguing along the lines of Walk (2000, Corollary 1), the second term has limiting
behaviour

lim sup_{n→∞} (1/n) log(R_n^*/R_n)
= lim sup_{n→∞} (1/n) Σ_{i=0}^{n−1} (log <w(X̄_{i+1}), X_{i+1}> − log <Z_i^L(X̄_{i+1}), X_{i+1}>)
= 0. (6.3.15)
This is seen as follows: (6.3.12) combined with Egorov's theorem shows that
for each ε > 0 we can find sets Ω̃ ⊆ Ω and Ĩ ⊆ [a, b]^{dm} such that P(Ω̃) ≥ 1 − ε,
P_{X̄_0}(Ĩ) ≥ 1 − ε and

Z_n^L → w uniformly on Ω̃ × Ĩ. (6.3.16)
Then

(1/n) Σ_{i=0}^{n−1} |log <w(X̄_{i+1}), X_{i+1}> − log <Z_i^L(X̄_{i+1}), X_{i+1}>|
≤ (1/n) Σ_{i=0}^{n−1} |log <w(X̄_{i+1}), X_{i+1}> − log <Z_i^L(X̄_{i+1}), X_{i+1}>| · 1_{Ĩ}(X̄_{i+1})
+ (1/n) Σ_{i=0}^{n−1} |log <w(X̄_{i+1}), X_{i+1}> − log <Z_i^L(X̄_{i+1}), X_{i+1}>| · 1_{Ĩ^c}(X̄_{i+1})
≤ (c/n) Σ_{i=0}^{n−1} |w(X̄_{i+1}) − Z_i^L(X̄_{i+1})| · 1_{Ĩ}(X̄_{i+1}) + (c/n) Σ_{i=0}^{n−1} 1_{Ĩ^c}(X̄_{i+1})
for some constant c > 0. The first term tends to zero by (6.3.16), the second
term to c · P_{X̄_0}(Ĩ^c) ≤ c · ε. Now, let ε go to zero.
Finally, (6.3.14) and (6.3.15) plugged into (6.3.13) finish the proof. 2
Proof of Theorem 6.2.4. It suffices to prove Lemma 6.2.1 for Zn instead of
ZnL . Without loss of generality one can set M = 0. In the following we assume
n to be sufficiently large such that γn ≥ L. Because X̄0 takes on values in
a denumerable set X , for any > 0 there exists a finite subset X̄ ⊆ X with
P(X̄0 ∈ X̄ C ) ≤ . As in the proof of Corollary 6.2.3, w(·) denotes the essentially
unique log-optimal portfolio selection function.
For ω ∈ Ω, x̄ ∈ X , we consider the sequence
and show that for PX̄0 -a.a. x̄ a P-a.s. limit relation ZnLn (x̄, ω) −→ w(x̄) holds.
To establish this, we work through the following: Consider an accumulation
point fω (x̄) of the sequence, say
Z_{n'}^{L_{n'}}(x̄, ω) −→ f_ω(x̄)
Indeed, (6.3.17) implies the existence of a set A ∈ A, P(A) = 1, such that for
any ω ∈ A
ZnLn (x̄, ω) −→ w(x̄) (6.3.18)
for P_{X̄_0}-a.a. x̄. This is seen as follows: Assume we had an x̄ with P_{X̄_0}({x̄}) > 0
and P(A(x̄)) > 0, where A(x̄) := {ω : Z_n^{L_n}(x̄, ω) ↛ w(x̄)}. P(A(x̄)) > 0 implies
A(x̄) ∩ A ≠ ∅, hence an ω ∈ A(x̄) ∩ A exists with Z_n^{L_n}(x̄, ω) ↛ w(x̄) on the one
hand (according to the construction of A(x̄)), and Z_n^{L_n}(x̄, ω) → w(x̄) on the
other hand (according to (6.3.18) and P_{X̄_0}({x̄}) > 0). This is a contradiction.
Hence, for P_{X̄_0}-a.a. x̄, we have P(A(x̄)) = 0, i.e.

Z_n^{L_n}(x̄, ω) −→ w(x̄) P-a.s.
\[
\le \Big|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega),X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i)-\mathbf{E}\big[\log\langle f_\omega(\bar X_0),X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\Big|
\]
\[
\quad+\Big|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle Z_n^{L_n}(\bar X_i,\omega),X_i\rangle\,1_{\bar{\mathcal X}^{C}}(\bar X_i)\Big|
+\Big|\mathbf{E}\big[\log\langle f_\omega(\bar X_0),X_0\rangle\,1_{\bar{\mathcal X}^{C}}(\bar X_0)\big]\Big|
\]
\[
\le \frac{1}{n+1}\sum_{i=0}^{n}\Big|\log\langle Z_n^{L_n}(\bar X_i,\omega),X_i\rangle-\log\langle f_\omega(\bar X_i),X_i\rangle\Big|\,1_{\bar{\mathcal X}}(\bar X_i)
\]
\[
\quad+\Big|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f_\omega(\bar X_i),X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i)-\mathbf{E}\big[\log\langle f_\omega(\bar X_0),X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\Big|
\]
\[
\quad+c\cdot\frac{1}{n+1}\sum_{i=0}^{n}1_{\bar{\mathcal X}^{C}}(\bar X_i)+c\cdot \mathbf{P}(\bar X_0\in\bar{\mathcal X}^{C}) \tag{6.3.20}
\]
with a constant c = c(d, m, a, b) ∈ IR+ .
Because of the uniform convergence in (6.3.19) the first term of (6.3.20) satisfies (for n sufficiently large)
\[
\Big|\log\langle Z_n^{L_n}(\bar X_i,\omega),X_i\rangle-\log\langle f_\omega(\bar X_i),X_i\rangle\Big|\,1_{\bar{\mathcal X}}(\bar X_i)\le c\cdot\varepsilon \tag{6.3.21}
\]
for all i = 0, ..., n. Without loss of generality we may use the same constant c as above.
For the second term it will be shown at the end of this proof that for P-a.a. ω
\[
\limsup_{n\to\infty}\Big|\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f_\omega(\bar X_i),X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i)-\mathbf{E}\big[\log\langle f_\omega(\bar X_0),X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big]\Big|=0. \tag{6.3.22}
\]
For the third term the ergodic theorem yields that P-a.s.
\[
\lim_{n\to\infty}\frac{1}{n+1}\sum_{i=0}^{n}1_{\bar{\mathcal X}^{C}}(\bar X_i)=\mathbf{P}(\bar X_0\in\bar{\mathcal X}^{C})\le\varepsilon. \tag{6.3.23}
\]
On the other hand, using the definition of the random variable Ln, we obtain P-a.s.
\[
\frac{1}{n+1}\sum_{i=0}^{n}\Big(\log\langle Z_n^{L_n}(\bar X_i,\omega),X_i\rangle-\log\langle w(\bar X_i),X_i\rangle\Big)
\ge \frac{1}{n+1}\sum_{i=0}^{n}\Big(\log\langle Z_n^{L}(\bar X_i,\omega),X_i\rangle-\log\langle w(\bar X_i),X_i\rangle\Big)\longrightarrow 0. \tag{6.3.25}
\]
Because of the essential uniqueness of the optimum w(·), for P-a.a. ω we infer (6.3.17), namely that f_ω(x̄) = w(x̄) PX̄0-a.s.
So it only remains to demonstrate (6.3.22). To this end, let C := C([a, b]^{dm}, S) be the space of continuous functions f : [a, b]^{dm} → S, equipped with the supremum norm sup_{x̄∈[a,b]^{dm}} |f(x̄)|. For f_ω we can find a continuous extension f̄_ω contained in C, which coincides with f_ω on X̄. Due to the separability of C (Megginson, 1998, Sec. 1.12) a denumerable set G ⊆ C can be found such that for any given ε > 0 and any f_ω there exists a function g_ω ∈ G satisfying sup_{x̄∈X̄} |f_ω(x̄) − g_ω(x̄)| ≤ ε. For given f : [a, b]^{dm} → S use the shorthand notation H_n(ω, f) for
\[
\frac{1}{n+1}\sum_{i=0}^{n}\log\langle f(\bar X_i),X_i\rangle\,1_{\bar{\mathcal X}}(\bar X_i)-\mathbf{E}\big[\log\langle f(\bar X_0),X_0\rangle\,1_{\bar{\mathcal X}}(\bar X_0)\big].
\]
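The quantity H_n(ω, f) is exactly what the ergodic theorem drives to zero. A minimal numerical illustration, under assumptions that are ours and not the thesis's (an i.i.d. two-outcome market, a constant portfolio f = (1/2, 1/2), and the indicator 1_X̄ taken identically equal to 1):

```python
import random, math

# Hedged toy illustration (not the thesis's market model): i.i.d. return
# vectors X_i drawn uniformly from two possible outcomes, a constant
# portfolio f = (1/2, 1/2), and 1_X̄ ≡ 1.  The (here: strong-law) limit
# H_n -> 0 is checked empirically.
random.seed(0)

outcomes = [(1.05, 0.95), (0.97, 1.06)]   # two equally likely gross-return vectors
f = (0.5, 0.5)                            # fixed portfolio selection

def log_return(x):
    return math.log(f[0] * x[0] + f[1] * x[1])

# exact expectation E[log<f(X̄_0), X_0>] for the two-outcome distribution
expectation = sum(log_return(x) for x in outcomes) / len(outcomes)

def H(n):
    """Empirical average of log-returns minus its expectation (H_n with 1_X̄ ≡ 1)."""
    draws = [random.choice(outcomes) for _ in range(n + 1)]
    return sum(log_return(x) for x in draws) / (n + 1) - expectation

# the gap shrinks as n grows (a.s. convergence to 0)
print(abs(H(100)), abs(H(100000)))
```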
Then, with H(ω, g) := lim sup_{n→∞} |H_n(ω, g)|,
\[
H(\omega,g)=0\quad\text{for all }g\in G
\]
for P-a.a. ω. As to this, observe that (G being denumerable) the set
\[
\{\omega\mid \exists g\in G:\ H(\omega,g)>0\}=\bigcup_{g\in G}\{\omega\mid H(\omega,g)>0\}
\]
is measurable. By the ergodic theorem each set in the union on the right-hand side is a null set, so the left-hand side, being a countable union of null sets, is a null set itself. Hence, for P-a.a. ω, H(ω, g) = 0 simultaneously for all g ∈ G.
Example 6.1: The market consists of a riskless bond with a return of 2.6% per market period and a share that follows a geometric Brownian motion (Luenberger, 1998, Sec. 11.7; Korn and Korn, 1999, Ch. 2) with a mean return µ = 3% per market period and a volatility σ = 15% per market period. Investment starts after 5 market periods and ends after 50 market periods. In this model, due to the independence of the share's log-returns, the log-optimal portfolio selection function coincides with the log-optimal portfolio, which suggests investing 67.86% of the current wealth into the share in each market period (calculated by Cover's algorithm, Theorem 1.3.2). Figures 6.1 and 6.2 show sample paths in the market together with estimation results.

[Figures 6.1 and 6.2: left axis, portfolio weight of the share (0 to 1); right axis, expected portfolio return per market period in %, ranging from 2.600 at weight 0 to a maximum of 3.116 near weight 0.7; horizontal axis, market periods 0 to 50.]

Throughout this section we use the kernel variant of the projection algorithm (6.2.2) with a cosine kernel K(x̄) = cos(min{‖x̄‖_F/100, 1}) + 1 (the Frobenius norm ‖x̄‖_F of x̄ = (x_{i,j})_{1≤i≤2, 1≤j≤d} being defined as the square root of the sum of the diagonal elements of x̄^T x̄) and L = 100.
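In code, this kernel is a one-liner. A minimal sketch (the function names and the nested-list matrix representation are our own choices, not an implementation from the thesis):

```python
import math

def frobenius_norm(xbar):
    """Square root of the sum of squared entries of the matrix xbar,
    i.e. the square root of the trace of xbar^T xbar."""
    return math.sqrt(sum(v * v for row in xbar for v in row))

def cosine_kernel(xbar, scale=100.0):
    """The kernel K(xbar) = cos(min{||xbar||_F / scale, 1}) + 1 used in the
    kernel variant of the projection algorithm (scale = 100 in the text)."""
    return math.cos(min(frobenius_norm(xbar) / scale, 1.0)) + 1.0
```

Note that K is bounded between cos(1) + 1 ≈ 1.54 and 2, so every observation keeps a strictly positive, bounded-away-from-zero weight.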
Subgraphs a) of Figures 6.1 and 6.2 show the estimated log-optimal portfolio weight for the share (solid line), i.e. the coordinate of Z_n^L(X_{n+1−d}, ..., X_n) that corresponds to the share. The results can be compared with the true log-optimal portfolio (dashed line) and the expected portfolio returns per market period given on the right vertical axis (in %).
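The 67.86% figure can be sanity-checked numerically. The sketch below is our own Monte Carlo construction, not Cover's algorithm of Theorem 1.3.2, and it assumes one particular reading of the parameters: per-period log-returns of the share distributed N(µ, σ²) with µ = 0.03 and σ = 0.15, and a bond gross return of 1.026. It grid-searches the fraction b of wealth held in the share that maximizes the estimated expected log-growth; under these assumptions the maximizer should land in the vicinity of the weight quoted above.

```python
import math, random

# ASSUMPTION: per-period log-returns ~ N(mu, sigma^2) with mu = 0.03,
# sigma = 0.15; other readings of "mean return" change the answer.
random.seed(1)
mu, sigma, bond = 0.03, 0.15, 1.026

# Monte Carlo sample of gross share returns S = exp(Z), Z ~ N(mu, sigma^2)
samples = [math.exp(random.gauss(mu, sigma)) for _ in range(50_000)]

def expected_log_growth(b):
    """Monte Carlo estimate of E[log((1-b)*bond + b*S)]."""
    return sum(math.log((1 - b) * bond + b * s) for s in samples) / len(samples)

# simple grid search over admissible fractions b in [0, 1]
grid = [i / 100 for i in range(101)]
b_star = max(grid, key=expected_log_growth)
```

The grid search exploits the concavity of b ↦ E[log((1−b)·bond + b·S)], so a resolution of 0.01 is enough to locate the optimum up to Monte Carlo noise.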
6.4 Simulations and examples
Example 6.2: In this example we run the projection algorithm for the estima-
tion of log-optimal portfolio selection functions on real market data from NYSE,
22/4/1998-6/7/1998 (daily closing price data from www.wallstreetcity.com).
We use the same stocks (YELL, JBHT, UNP) as in Example 2.2, Section 2.3.
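The value curves of Figure 6.3 follow the basic compounding rule of this thesis: starting from $1, wealth is multiplied in each period by the inner product ⟨b_n, X_{n+1}⟩ of the chosen portfolio and the gross-return vector. A minimal sketch of this bookkeeping (the toy returns and the uniform selection rule are placeholders, not the NYSE data or the projection-algorithm estimates):

```python
# Hedged sketch: how a value-of-$1 curve is produced from a sequence of
# gross-return vectors and a portfolio selection rule.
def wealth_path(returns, select):
    """returns: list of gross-return vectors X_1, X_2, ...
    select: maps the history of past returns to a portfolio (weights sum to 1).
    Returns the wealth trajectory starting from $1."""
    wealth, path = 1.0, [1.0]
    for n, x in enumerate(returns):
        b = select(returns[:n])          # portfolio fixed before X_{n+1} is revealed
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
        path.append(wealth)
    return path

# placeholder data: uniform portfolio in a two-asset toy market
toy_returns = [(1.02, 0.99), (0.98, 1.03), (1.01, 1.01)]
uniform = lambda history: (0.5, 0.5)
path = wealth_path(toy_returns, uniform)
```

Replacing `uniform` by the estimated log-optimal selection function and `toy_returns` by the daily gross returns of the stock pairs yields curves of the kind plotted in Figure 6.3.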
Figure 6.3: Value of a $1 investment in two single stocks (grey), and in the es-
timated log-optimal portfolio of the two (black, solid) at NYSE 22/4-6/7/1998.
We merely sketched and, in fact, largely skipped the huge effort that should have gone into diagnostic testing of these assumptions in Section 2.3. Much of this effort is superfluous here. Indeed, with the model of Chapter 6 we gained considerable flexibility with respect to the underlying market model, assuming not much more than stationarity and ergodicity. Since there is no such thing as absolute certainty about what the true stochastic regime in the market looks like, we come to appreciate nonparametric algorithms that work well under very weak assumptions and hence may be applied in many real markets.
L’ENVOI
Clearly, we were able to cover only a small sample of the problems the investor faces in real markets. In the course of this thesis we derived several algorithms for these selected problems, and the reader might (and hopefully will) find some of them helpful for deciding on practical investment problems. Beyond these algorithms, we hope to have conveyed the key message of this thesis: nonparametric statistical forecasting and estimation techniques are a valuable tool in portfolio selection and, in fact, in all of mathematical finance.
REFERENCES
T. Bielecki and S.R. Pliska (2000): Risk sensitive asset management with
transaction costs, Finance Stochast., 4, 1-33.
P.J. Brockwell and R.A. Davis (1991): Time Series: Theory and Meth-
ods. Springer, New York.
Th. Cover and E. Ordentlich (1996): Universal portfolios with side in-
formation, IEEE Trans. Inform. Theory, 42(2), 348-363.
Th. Cover and J.A. Thomas (1991): Elements of Information Theory, Wi-
ley, New York.
J.C. Cox, S.A. Ross and M. Rubinstein (1979): Option pricing: a sim-
plified approach, Journ. Financial Economics, 7, 229-263.
M.H. Davis and A.R. Norman (1990): Portfolio selection with transac-
tion costs, Math. Oper. Res., 15(4), 676-713.
L.D. Davisson (1965): The prediction error of stationary Gaussian time se-
ries of unknown covariance, IEEE Trans. Inform. Theory, 11(4), 527-532.
Z. Ding, C.W. Granger and R.F. Engle (1993): A long memory prop-
erty of stock market and a new model, J. Empir. Finance, 1, 83-106.
J.L. Doob (1984): Classical Potential Theory and its Probabilistic Counter-
part, Springer, New York.
I.A. Ibragimov and Y.V. Linnik (1971): Independent and Stationary Se-
quences of Random Variables, Wolters-Noordhoff, Groningen.
K. Knopp (1956): Infinite Sequences and Series, Dover Publ., New York.
H.A. Latané (1959): Criteria for choice among risky ventures, Journal of
Political Economy, 38, 145-155.
H.M. Markowitz (1976): Investment for the long run: new evidence for an
old rule, J. Finance, 31(5), 1273-1286.
G. Matheron (1975): Random Sets and Integral Geometry, Wiley, New York.
S. Mittnik and S.T. Rachev (1993): Modeling asset returns with alterna-
tive stable distributions, Econometric Review, 12, 261-330.
I.S. Molchanov (1993): Limit Theorems for Unions of Random Closed Sets,
Lecture Notes in Mathematics 1561, Springer, Berlin.
P.A. Samuelson (1967): General proof that diversification pays, J. Fin. and
Quant. Anal., 2, 1-13.
S. Serra Capizzano (2000): How bad can positive definite Toeplitz matri-
ces be? Numer. Funct. Anal. and Optimiz., 21(1-2), 255-261.
W.F. Stout (1974): Almost Sure Convergence, Academic Press, New York.