
Preface

J. Neyman, one of the pioneers in laying the foundations of modern statistical theory, stressed the importance of stochastic processes in a paper written in 1960 in the following terms: "Currently in the period of dynamic indeterminism in science, there is hardly a serious piece of research which, if treated realistically, does not involve operations on stochastic processes". Arising from the need to solve practical problems, several major advances have taken place in the theory of stochastic processes and their applications. Books by Doob (1953; J. Wiley and Sons), Feller (1957, 1966; J. Wiley and Sons) and Loève (1960; D. van Nostrand and Co., Inc.), among others, have created growing awareness and interest in the use of stochastic processes in scientific and technological studies. Journals such as Journal of Applied Probability, Advances in Applied Probability, Zeitschrift für Wahrscheinlichkeitstheorie (now called Probability Theory and Related Fields), Annals of Probability, Annals of Applied Probability, Theory of Probability and its Applications and Stochastic Processes and their Applications have all contributed to this phenomenal growth. The literature on stochastic processes is very extensive and is distributed across many books and journals. There is a need to review the different lines of research and development in stochastic processes and to present a consolidated and comprehensive account for the benefit of students, research workers, teachers and consultants. With this in view, North Holland has decided to bring out two volumes in the series Handbook of Statistics with the titles Stochastic Processes: Theory and Methods and Stochastic Processes: Modelling and Simulation. The first volume is going to press and the second volume, which is under preparation, will be published soon. The present volume comprises, among others, chapters on the following topics: Point Processes (R. K. Milne), Renewal Theory (D. R. Grey), Markov Chains with Applications (R. L.
Tweedie), Diffusion Processes (S. R. S. Varadhan), Martingales and Applications (M. M. Rao), Ito's Stochastic Calculus and its Applications (S. Watanabe), Continuous-time ARMA Processes (P. J. Brockwell), Random Walk and Fluctuation Theory (N. H. Bingham), Poisson Approximation (A. D. Barbour), Branching Processes (K. B. Athreya and A. N. Vidyashankar), Gaussian Processes (W. Li and Q.-M. Shao), Lévy Processes (J. Bertoin), Pareto Processes (B. C. Arnold), Stochastic Processes in Reliability (M. Kijima et al.), Stochastic Processes in Insurance and Finance (P. Embrechts et al.), Stochastic Networks (H. Daduna), Record Sequences
with Applications (J. A. Bunge and C. M. Goldie), Associated Sequences and Related Inference Problems (B. L. S. Prakasa Rao and I. Dewan). Additionally, the volume includes contributions that address some specific research problems; among these are the chapters of A. Klopotowski and M. G. Nadkarni, of Y. Kakihara, of A. Bobrowski et al., and of R. N. Bhattacharya and E. C. Waymire. There are also further chapters that deal with some general or specific themes: the chapter of I. V. Basawa is on inference in stochastic processes, and those of B. L. S. Prakasa Rao and of C. R. Rao and D. N. Shanbhag concentrate on characterization and identifiability of some stochastic processes and related probability distributions. An effort is made in this volume to cover as many branches of stochastic processes as possible. Also, to get the balance right, we have retained some chapters with an applied flavour in this volume. In the planned second volume, we keep the option of including one or two chapters of a theoretical nature, on the assumption that they provide avenues for future research to specialists in applied areas. We are most grateful to all the contributors and the referees for making this project successful. Some of the contributors have also reviewed other researchers' work. In particular, we are indebted to D. R. Grey, M. Manoharan and J. Ferreira for accepting the task of reviewing several chapters. Also, we would like to thank the publishing editors of Elsevier, Drs G. Wanrooy and N. van Dijk, for their patience and encouragement. Finally, we would like to thank the Department of Statistics, The Pennsylvania State University, USA, and the Department of Probability and Statistics, The University of Sheffield, UK, for providing us with facilities to edit this volume. This project is supported by the US Army Research Grant DAA H 04-96-1-0082.

D. N. Shanbhag
C. R. Rao

Contributors

O. Arino, Department of Mathematics, University of Pau, 64000 Pau, France, and IRD, LIA/GEODES, 32, Avenue Henri-Varagnat, 93243 Bondy, France, e-mail: Ovide.Arino@bondy.ird.fr (Ch. 8)
B. C. Arnold, Department of Statistics, University of California, Riverside, CA 92521, USA, e-mail: barry.arnold@ucr.edu (Ch. 1)
K. B. Athreya, Departments of Mathematics and Statistics, Iowa State University, Ames, IA 50011, USA, e-mail: athreya@orion.math.iastate.edu (Ch. 2)
A. D. Barbour, Department of Applied Mathematics, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland, e-mail: adb@amath.unizh.ch (Ch. 4)
I. V. Basawa, Department of Statistics, University of Georgia, Athens, Georgia 30602-1952, USA, e-mail: Ishwar@stat.uga.edu (Ch. 3)
J. Bertoin, Laboratoire de Probabilités, Université Pierre et Marie Curie, 175 rue du Chevaleret, F-75013 Paris, France, e-mail: jbe@ccr.jussieu.fr (Ch. 5)
N. H. Bingham, Department of Mathematical Sciences, Brunel University, Uxbridge, Middlesex UB8 3PH, UK, e-mail: n.bingham@stat.bbk.ac.uk (Ch. 7)
R. Bhattacharya, Department of Mathematics, Indiana University, Bloomington, IN 47405, USA, e-mail: bhattach@ucs.indiana.edu (Ch. 6)
A. Bobrowski, Department of Mathematics, University of Houston, 4800 Calhoun Road, Houston, TX 77204-3476, USA (on leave from Department of Mathematics, Technical University of Lublin, ul. Nadbystrzycka 38A, 20-618 Lublin, Poland), e-mail: adambob@math.uh.edu (Ch. 8)
P. J. Brockwell, Statistics Department, Colorado State University, Fort Collins, CO 80523-1877, USA, e-mail: pjbrock@rmit.edu.au (Ch. 9)
J. Bunge, Department of Statistical Science & Department of Social Statistics, Cornell University, Ithaca, New York 14853-3901, USA, e-mail: jab18@cornell.edu (Ch. 10)
R. Chakraborty, Human Genetics Center, School of Public Health, University of Texas Health Science Center, P.O. Box 20334, Houston, TX 77225, USA, e-mail: rc@hgc9.sph.uth.tmc.edu (Ch. 8)
H. Daduna, Institute of Mathematical Stochastics, Department of Mathematics, University of Hamburg, Bundesstrasse 55, D-20146 Hamburg, Germany, e-mail: daduna@math.uni-hamburg.de (Ch. 11)


I. Dewan, Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110 016, India, e-mail: isha@isid.ac.in (Ch. 20)
P. Embrechts, Department of Mathematics, ETHZ, CH-8092 Zurich, Switzerland, e-mail: embrechts@math.ethz.ch (Ch. 12)
R. Frey, Swiss Banking Institute, University of Zurich, Plattenstr. 14, CH-8032 Zurich, Switzerland, e-mail: frey@math.ethz.ch (Ch. 12)
H. Furrer, Swiss Re Life & Health, Mythenquai 50/60, CH-8002 Zurich, Switzerland, e-mail: hansjoerg_furrer@swissre.com (Ch. 12)
C. M. Goldie, School of Mathematical Sciences, University of Sussex, Brighton BN1 9QH, England, e-mail: C.M.Goldie@sussex.ac.uk (Ch. 10)
D. R. Grey, Department of Probability and Statistics, The University of Sheffield, Sheffield S3 7RH, UK, e-mail: d.grey@sheffield.ac.uk (Ch. 13)
Y. Kakihara, Department of Mathematics, University of California, Riverside, CA 92521-0135, USA, e-mail: kakihara@math.ucr.edu (Ch. 14)
M. Kijima, Faculty of Economics, Tokyo Metropolitan University, 1-1 Minami-Ohsawa, Hachiohji, Tokyo 192-0397, Japan, e-mail: kijima@bcomp.metro-u.ac.jp (Ch. 15)
M. Kimmel, Department of Statistics, Rice University, P.O. Box 1892, Houston, TX 77251, USA, e-mail: kimmel@rice.edu (Ch. 8)
A. Kłopotowski, Institut Galilée, Université Paris XIII, 93430 Villetaneuse cedex, France, e-mail: klopot@math.univ-paris13.fr (Ch. 16)
H. Li, Department of Pure and Applied Mathematics, Washington State University, Pullman, Washington 99164-3113, USA, e-mail: lih@haijun.math.wsu.edu (Ch. 15)
W. V. Li, Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA, e-mail: wli@math.udel.edu (Ch. 17)
R. K. Milne, Department of Mathematics and Statistics, The University of Western Australia, Nedlands 6907, Australia, e-mail: milne@maths.uwa.edu.au (Ch. 18)
M. G. Nadkarni, Department of Mathematics, University of Mumbai, Kalina, Mumbai 400098, India, e-mail: nadkarni@mathbu.ernet.in (Ch. 16)
B. L. S. P. Rao, Indian Statistical Institute, 7, S.J.S. Sansanwal Marg, New Delhi 110 016, India, e-mail: blsp@isid.ernet.in (Chs. 19, 20)
C. R. Rao, Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA, e-mail: crr1@psuvm.psu.edu (Ch. 21)
M. M. Rao, Department of Mathematics, University of California, Riverside, California 92521-0135, USA, e-mail: rao@math.ucr.edu (Ch. 22)
M. Shaked, Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA, e-mail: Shaked@math.arizona.edu (Ch. 15)
D. N. Shanbhag, Probability and Statistics Section, School of Mathematics and Statistics, University of Sheffield, Sheffield S3 7RH, UK, e-mail: d.shanbhag@sheffield.ac.uk (Ch. 21)
Q.-M. Shao, Department of Mathematics, University of Oregon, Eugene, OR 97403, USA, e-mail: Shao@math.uoregon.edu (Ch. 17)


R. L. Tweedie, Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0378, USA, e-mail: tweedie@space.state.colostate.edu (Ch. 23)
S. R. S. Varadhan, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012-1185, USA, e-mail: varadhan@cims.nyu.edu (Ch. 24)
A. N. Vidyashankar, Department of Statistics, University of Georgia, Athens, GA 30612, USA, e-mail: anand@stat.uga.edu (Ch. 2)
S. Watanabe, Department of Mathematics, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan, e-mail: watanabe@kusm.kyoto-u.ac.jp (Ch. 25)
E. C. Waymire, Department of Mathematics, Oregon State University, Corvallis, OR 97331, USA, e-mail: waymire@math.orst.edu (Ch. 6)

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.

1

Pareto Processes

Barry C. Arnold

1. Introduction
As Vilfredo Pareto (1897) observed, many economic variables have heavy-tailed distributions not well modelled by the normal curve. Instead, he proposed a model subsequently christened, in his honor, the Pareto distribution. The defining feature of this distribution is that its survival function P(X > x) decreases at the rate of a negative power of x as x → ∞. Thus we have

P(X > x) ∼ c x^{−α}   [x → ∞] .    (1.1)

A spectrum of generalizations of Pareto's distribution has been proposed for modelling economic variables. A convenient survey may be found in Arnold (1983). The classical Pareto distribution has a survival function of the form

F̄_X(x) = (x/σ)^{−α},   x > σ    (1.2)

where σ > 0 is a scale parameter and α > 0 is an inequality parameter. If X has distribution (1.2) we will write X ~ P(I)(σ, α). A minor modification of (1.2), obtained by introducing a location parameter, is as follows

F̄_X(x) = [1 + (x − μ)/σ]^{−α},   x > μ    (1.3)

Here μ is a location parameter, σ a scale parameter and α an inequality parameter. If X has distribution (1.3) we write X ~ P(II)(μ, σ, α). A third variant of Pareto's distribution has as its survival function

F̄_X(x) = [1 + ((x − μ)/σ)^{α}]^{−1},   x > μ    (1.4)

where μ is a location parameter, σ is a scale parameter and α is an inequality parameter. If X has distribution (1.4), we write X ~ P(III)(μ, σ, α). Clearly all three of the Pareto distributions (1.2)-(1.4) exhibit the tail behavior (1.1) postulated by Pareto. In practice, it is difficult to discriminate between models (1.3) and (1.4) and the choice may justifiably be made on the basis of which


model is mathematically more tractable. In Section 2 we will review distributional properties of these Pareto models. However, in economic applications one rarely encounters random samples from specific distributions. More commonly one encounters realizations of stochastic processes. An argument (modelled after Pareto's arguments) can be made to justify an assumption that the observed processes will have Pareto marginal distributions. The present chapter will have as its chief focus a survey of stationary stochastic processes with Pareto marginal distributions. They provide reasonable alternatives to the normal and/or log-normal processes which are frequently used to model economic time series. One-dimensional processes will generally be described, but natural extensions to multivariate settings will be pointed out along the way.

2. Distributional properties of Pareto variables

A convenient reference for distributional properties of the Pareto and various generalized Pareto distributions is Arnold (1983). In this section we will document only those properties that play salient roles in the development of the stochastic processes to be described in subsequent sections. The classical Pareto distribution is intimately related to the (shifted or translated) exponential distribution. A random variable Y has a translated exponential distribution, written Y ~ Texp(μ, σ), if it admits the representation

Y = μ + σZ    (2.1)

where μ ∈ R, σ ∈ R⁺ and Z has a standard exponential distribution; i.e.,

P(Z > z) = e^{−z},   z > 0 .    (2.2)

If X has a classical Pareto distribution, specifically X ~ P(I)(σ, α), and if we define Y = log X, it is readily verified that Y ~ Texp(log σ, 1/α). In particular, the logarithm of a classical Pareto variable with unit scale parameter, i.e. P(I)(1, α), will have an exponential distribution with scale parameter 1/α or, equivalently, intensity α. If X₁, X₂, ... are i.i.d. exponential random variables with common intensity λ and if N, independent of the X_i s, has a geometric (p) distribution, i.e., P(N = n) = p(1 − p)^{n−1}, n = 1, 2, ..., then Y = Σ_{i=1}^N X_i will again have an exponential distribution, with intensity pλ. Thus pY =d X₁ (where here and henceforth, =d indicates equality in distribution). Perhaps the simplest verification of the fact that pY =d X₁ involves evaluation of the moment generating function of pY by conditioning on N. If we recall the simple relationship between the classical Pareto distribution and the exponential distribution, we may immediately write down a distributional result involving geometric products of Pareto variables. Specifically, if X₁, X₂, ... are i.i.d. P(I)(1, α) random variables and if N, independent of the X_i s, has a geometric (p) distribution, then if we define
Y = Π_{i=1}^N X_i


we find that

Y^p =d X₁ .    (2.3)
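Property (2.3) lends itself to a quick Monte Carlo check. The following sketch is an illustration, not part of the original text; the values α = 2 and p = 0.3 are arbitrary. It simulates geometric products of P(I)(1, α) variables and verifies that log(Y^p), which should again be exponential with intensity α, has mean close to 1/α.

```python
import math
import random

random.seed(1)
alpha, p, n_rep = 2.0, 0.3, 20000

def pareto1(a):
    # Inverse-CDF draw from the classical Pareto P(I)(1, a):
    # survival function x**(-a) for x > 1.
    u = 1.0 - random.random()          # u in (0, 1]
    return u ** (-1.0 / a)

samples = []
for _ in range(n_rep):
    n = 1
    while random.random() > p:         # N ~ geometric(p) on {1, 2, ...}
        n += 1
    y = 1.0
    for _ in range(n):
        y *= pareto1(alpha)
    samples.append(y ** p)             # property (2.3): Y**p =d X1

# log of a P(I)(1, alpha) variable is exponential with mean 1/alpha
mean_log = sum(math.log(s) for s in samples) / n_rep
print(round(mean_log, 3))              # should be close to 1/alpha = 0.5
```

Running the same experiment with the rescaling power omitted, or with the wrong power of p, visibly misses the target mean, which makes this a useful sanity check for simulation code.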

Since minima of independent exponential random variables are exponentially distributed, a similar property holds for classical Pareto variables. Specifically, if X₁, ..., X_n are independent random variables with X_i ~ P(I)(σ, α_i), i = 1, 2, ..., n, then

min_{i=1,2,...,n} X_i ~ P(I)(σ, Σ_{i=1}^n α_i) .    (2.4)

As we shall see, properties (2.3) and (2.4) will be useful in constructing a variety of stochastic processes with classical Pareto marginals. The Pareto (III) family of distributions is closed under geometric minimization. Specifically, if X₁, X₂, ... are independent, identically distributed P(III)(μ, σ, α) variables and if N, independent of the X_i s, has a geometric (p) distribution, we define

Y = min_{i≤N} X_i .    (2.5)

By a simple conditioning argument we find

Y ~ P(III)(μ, σ p^{1/α}, α)

and consequently, if μ = 0,

p^{−1/α} Y =d X₁ .    (2.6)
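The geometric-minimum stability (2.6) admits the same kind of check. In this sketch (our illustration; α = 1.5 and p = 0.4 are arbitrary values) the rescaled minimum p^{−1/α} Y is compared, through its empirical survival function, with the P(III)(0, 1, α) survival function [1 + x^α]^{−1}.

```python
import random

random.seed(2)
alpha, p, n_rep = 1.5, 0.4, 20000

def pareto3(a):
    # Inverse-CDF draw from P(III)(0, 1, a): survival [1 + x**a]**(-1)
    u = 1.0 - random.random()          # u in (0, 1]
    return ((1.0 - u) / u) ** (1.0 / a)

rescaled = []
for _ in range(n_rep):
    n = 1
    while random.random() > p:         # N ~ geometric(p)
        n += 1
    y = min(pareto3(alpha) for _ in range(n))
    rescaled.append(p ** (-1.0 / alpha) * y)   # property (2.6)

# empirical vs. theoretical survival of the rescaled minimum
for x in (0.5, 1.0, 2.0):
    emp = sum(r > x for r in rescaled) / n_rep
    theo = 1.0 / (1.0 + x ** alpha)
    print(x, round(emp, 3), round(theo, 3))
```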

In fact, a parallel statement may be made with regard to geometric maxima. If we define

Z = max_{i≤N} X_i    (2.7)

then

Z ~ P(III)(μ, σ p^{−1/α}, α)

and, if μ = 0,

p^{1/α} Z =d X₁ .    (2.8)

Properties (2.6) and (2.8) essentially characterize the P(III)(0, σ, α) distribution (see Arnold et al., 1986). Repeated geometric minimization has been suggested as a model for income distributions (see Arnold and Laguna, 1976; Arnold, 1983). If Y and the X_i s in (2.5) are of the same type, their common distribution is said to be min-geometric-stable (the parallel concept of max-geometric stability is discussed in Rachev and Resnick, 1991). If the distribution of X₁ can be recovered


from that of Y = min_{i≤N} X_i by a change of scale only, then the corresponding min-geometric stable distribution is almost forced to be Pareto (III)(0, σ, α). Indeed, if this works for all p, or for two incommensurate choices of p (i.e., p₁, p₂ such that {p₁^j / p₂^k : j = 1, 2, ..., k = 1, 2, ...} is dense in (0, 1)), then (cf. Arnold and Laguna, 1976) the distribution of X₁ must be Pareto (III)(0, σ₁, α). If (2.6) holds for just one particular choice of p, then it is not difficult to verify that F̄, the common survival function of the X_i s, must satisfy

F̄(x) = [1 + φ(x)]^{−1}    (2.9)

where φ is non-decreasing, right continuous and satisfies

φ(x) = (1/p) φ(p^{1/α} x) .    (2.10)

Distributions satisfying (2.9) are called semi-Pareto distributions (Pillai, 1991). Of course the simplest example of a function satisfying (2.10) is the choice φ(x) = (x/σ)^α. This corresponds to the Pareto (III)(0, σ, α) distribution. However, a wide variety of other functions will satisfy (2.10), since it only requires that φ be a suitably periodic function of log x.
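A concrete non-Pareto example is easy to write down. In the sketch below (our illustration; the specific log-periodic φ is an assumption, with β chosen small enough that φ stays non-decreasing) the power x^α is perturbed by a cosine whose period in log x is exactly the one dictated by (2.10), so the functional equation holds for the chosen p.

```python
import math

alpha, p, beta = 2.0, 0.2, 0.1        # beta = 0.1 keeps phi increasing

def phi(x):
    # log-periodic perturbation of x**alpha; the cosine has period
    # log(1/p)/alpha in log x, which is what (2.10) requires
    return x ** alpha * (1.0 + beta * math.cos(
        2.0 * math.pi * alpha * math.log(x) / math.log(1.0 / p)))

def survival(x):
    # the resulting semi-Pareto survival function, as in (2.9)
    return 1.0 / (1.0 + phi(x))

# verify phi(x) = (1/p) * phi(p**(1/alpha) * x) on a grid
grid = [0.1 * k for k in range(1, 200)]
max_err = max(abs(phi(x) - phi(p ** (1.0 / alpha) * x) / p) for x in grid)
print(max_err)                         # numerically zero
```

For this one value of p the geometric minimum of i.i.d. variables with survival function `survival` can be rescaled back to the parent distribution, even though the distribution is not Pareto (III).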

3. Multivariate Pareto distributions

What should be the natural multivariate version of our various Pareto distributions? It is natural to require Pareto marginals. It is perhaps asking too much to insist on Pareto conditionals as well. However, there is one kind of multivariate Pareto distribution, that introduced by Mardia (1962), which has both marginals and conditionals of Pareto form. To make this statement strictly true we need to consider a slight modification of Mardia's distribution, in which we will have marginals and conditionals in the Pareto (II) family (Eq. (1.3)). We will say that a k-dimensional random vector X has a multivariate Pareto distribution of Type II, and write X ~ MP^(k)(II)(μ, σ, α), if its joint survival function is of the form:
F̄_X(x) = P(X > x) = [1 + Σ_{i=1}^k (x_i − μ_i)/σ_i]^{−α},   x_i > μ_i,  i = 1, 2, ..., k    (3.1)

For any k₁ < k, all k₁-dimensional marginals are again of the (k₁-dimensional) form (3.1) (merely set "unwanted" x_i s equal to their corresponding values μ_i in (3.1)). To indicate partitioning of the k-dimensional vector X into k₁- and (k − k₁)-dimensional subvectors, we use the notation X = (X̂, X̃). Analogously we partition μ = (μ̂, μ̃) and σ = (σ̂, σ̃). Then we have

X̂ ~ MP^(k₁)(II)(μ̂, σ̂, α)    (3.2)


and of course, univariate marginals are of the Pareto (II) form displayed in (1.3). Conditional distributions are also in the same family. One finds

X̂ | X̃ = x̃ ~ MP^(k₁)(II)(μ̂, c(x̃) σ̂, α + k − k₁)    (3.3)

where

c(x̃) = 1 + Σ_{i=k₁+1}^k (x_i − μ_i)/σ_i .

A convenient stochastic representation of the multivariate Pareto (II) distribution is available. One may construct an MP^(k)(II) random vector X by defining

X_i = μ_i + σ_i (W_i / Z),   i = 1, 2, ..., k    (3.4)

where the W_i s are independent standard exponential variables and where Z (independent of the W_i s) is a Γ(α, 1) random variable. A minor modification of the representation (3.4) yields a multivariate Pareto (III) distribution. We will have X ~ MP^(k)(III)(μ, σ, α) if
X_i = μ_i + σ_i (W_i / Z)^{1/α_i},   i = 1, 2, ..., k    (3.5)

where W₁, W₂, ..., W_k and Z are independent standard exponential variables. The corresponding joint survival function is of the form

F̄_X(x) = [1 + Σ_{i=1}^k ((x_i − μ_i)/σ_i)^{α_i}]^{−1},   x_i > μ_i,  i = 1, 2, ..., k    (3.6)
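Representations (3.4) and (3.5) make these multivariate distributions trivial to simulate. The sketch below (an illustration; the dimension, parameters, and test point are arbitrary choices) generates MP^(3)(II) vectors via (3.4) and compares the empirical joint survival probability at one point with the closed form (3.1).

```python
import random

random.seed(4)
k, alpha, n_rep = 3, 2.5, 40000
mu = [0.0, 1.0, -1.0]
sigma = [1.0, 2.0, 0.5]
x_test = [0.8, 2.5, -0.6]              # arbitrary point with x_i > mu_i

count = 0
for _ in range(n_rep):
    z = random.gammavariate(alpha, 1.0)            # Z ~ Gamma(alpha, 1)
    # representation (3.4): X_i = mu_i + sigma_i * W_i / Z
    x = [mu[i] + sigma[i] * random.expovariate(1.0) / z for i in range(k)]
    if all(x[i] > x_test[i] for i in range(k)):
        count += 1

emp = count / n_rep
s = sum((x_test[i] - mu[i]) / sigma[i] for i in range(k))
theo = (1.0 + s) ** (-alpha)                       # joint survival (3.1)
print(round(emp, 4), round(theo, 4))
```

Replacing the ratio W_i/Z by (W_i/Z)^{1/α_i} turns the same loop into a sampler for the MP^(k)(III) distribution with joint survival (3.6).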

This will have k₁-dimensional marginals of the MP^(k₁)(III) form. The conditionals are, it turns out, of a related Pareto (IV) form (see Arnold, 1983, Chapter 6 for details). Properties of the classical Pareto and the Pareto (III) distributions, in conjunction with the information provided by the representations (3.4) and (3.5), can be used to generate a wide variety of alternative multivariate Pareto distributions. For example, we know from (3.5) that certain scale mixtures of Weibulls yield Pareto (III) variables. In (3.5) the W_i s were taken to be independent standard exponential variables. Instead, the joint distribution of W could be taken to be any one of the wide variety of multivariate exponential distributions (with standard exponential marginals) available in the literature. Arnold (1975) and Block (1975) catalog a variety of such distributions. See also Hutchinson and Lai (1990). An alternative route, again relying on the available plethora of multivariate exponential distributions, involves marginal transformations. Begin with W having standard exponential marginals. Then define

X_i = μ_i + σ_i (e^{W_i/α} − 1),   i = 1, 2, ..., k    (3.7)

to get a multivariate Pareto (II) vector X (set μ_i = σ_i to get a k-variate Pareto (I) distribution). Alternatively define


Y_i = μ_i + σ_i (e^{W_i} − 1)^{1/α_i},   i = 1, 2, ..., k    (3.8)

to get a multivariate Pareto (III) distribution. For example, a suitable choice of multivariate exponential distribution for W (namely, one for which F̄_W(w) = (Σ_{i=1}^k e^{w_i} − k + 1)^{−1}, w > 0) yields, using (3.7), the MP^(k)(II)(μ, σ, α) distribution displayed in (3.1). The fact that geometric sums of independent exponential variables are again exponential can be, and has been, used extensively to generate multivariate exponential distributions. One can use dependent or independent geometric random variables in conjunction with vector random variables with dependent or independent exponential marginals. A parallel spectrum of possible constructions involves geometric minima. Suppose X₁, X₂, ... are independent, identically distributed random vectors and suppose that N, independent of the X_i s, has a geometric (p) distribution. Using a coordinatewise definition of the minimum of random vectors, we may define

Y = min_{i≤N} X_i .    (3.9)

Now suppose that for some vector c(p) we have

c(p) Y =d X₁    (3.10)

where the multiplication of vectors is assumed to be done coordinatewise. We, following Rachev and Resnick (1991), will say that the common distribution of the X_i s is min-geometric stable (max-geometric stability is obviously defined in a parallel fashion). In one dimension, for (3.10) to hold for every p, it was necessary and sufficient that the Xs have Pareto (III) distributions. In k dimensions a rich class of solutions to (3.10) is available. We know that the marginals must be of the Pareto (III) form. However, the joint distribution can have a variety of forms. Using results in Resnick (1987, Chapter 5) and in Rachev and Resnick (1991), relating min-geometric stable distributions to min-stable distributions, which are in turn related to multivariate extreme value distributions, we arrive at the following characterization of the joint distributions of the Xs such that (3.10) holds for every p ∈ (0, 1). There must exist non-negative integrable functions f_i(s), i = 1, 2, ..., k, on [0, 1] satisfying

∫₀¹ f_i(s) ds = 1,   i = 1, 2, ..., k

and

F̄_X(x) = [1 + ∫₀¹ max_{1≤i≤k} {f_i(s) ((x_i − μ_i)/σ_i)^{α_i}} ds]^{−1} .    (3.11)

An example of a k-dimensional Pareto (III) distribution obtainable using the representation (3.11) is

F̄_X(x) = [1 + Σ_{i=1}^k ((x_i − μ_i)/σ_i)^{α_i}]^{−1},   x > μ    (3.12)

(a simple transform of a family of k-dimensional logistic distributions discussed by Strauss, 1979). Semi-Pareto variants of (3.11) are available (they will generally satisfy (3.10) for just one fixed value of p). They are of the form

F̄_X(x) = [1 + ∫₀¹ max_{1≤i≤k} [f_i(s) g_i(x_i)] ds]^{−1}    (3.13)

where the f_i s are as described prior to (3.11) and the g_i s are non-decreasing, right continuous functions satisfying g_i(x_i) = (1/p) g_i(p^{1/α_i} x_i). Of course, if the X_i s in (3.9) have any common joint distribution with Pareto (III) marginals (not just those of the form (3.11)), it will still be true that a geometric minimum of the form (3.9) will again have Pareto (III) marginals. It will not in general be possible to rescale Y to have the same distribution as the X_i s (unless (3.11) holds). For example, if we begin with

F̄_X(x) = Π_{i=1}^k [1 + ((x_i − μ_i)/σ_i)^{α_i}]^{−1},   x > μ    (3.14)

we find that the geometric minimum (Y defined by (3.9)) has joint survival function of the form

F̄_Y(y) = [1 + (1/p){Π_{i=1}^k (1 + ((y_i − μ_i)/σ_i)^{α_i}) − 1}]^{−1},   y > μ .    (3.15)
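Formula (3.15) can be confirmed by the same style of simulation. The sketch below (our illustration; k = 2 with μ = 0 and σ_i = 1, all parameter values arbitrary) forms coordinatewise geometric minima of vectors with independent Pareto (III) coordinates and compares the empirical joint survival probability with (3.15).

```python
import random

random.seed(5)
p, n_rep = 0.3, 40000
alpha = [1.0, 2.0]                     # take mu_i = 0, sigma_i = 1
k = len(alpha)
y_test = [1.0, 0.7]

def pareto3(a):
    u = 1.0 - random.random()          # survival [1 + x**a]**(-1)
    return ((1.0 - u) / u) ** (1.0 / a)

count = 0
for _ in range(n_rep):
    n = 1
    while random.random() > p:         # N ~ geometric(p)
        n += 1
    # coordinatewise minimum of n i.i.d. independent-coordinate vectors
    y = [min(pareto3(alpha[i]) for _ in range(n)) for i in range(k)]
    if all(y[i] > y_test[i] for i in range(k)):
        count += 1

emp = count / n_rep
prod = 1.0
for i in range(k):
    prod *= 1.0 + y_test[i] ** alpha[i]
theo = 1.0 / (1.0 + (prod - 1.0) / p)  # survival function (3.15)
print(round(emp, 4), round(theo, 4))
```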

Equation (3.15) provides another example of a min-geometric stable Pareto (III) distribution (because a geometric sum of i.i.d. geometric random variables is again geometric). This will be true for any Y obtained via (3.9), beginning with any choice of distribution of the X_j s with Pareto (III) marginals. Thus, repeated geometric minimization will not lead to a broader class than (3.11). In fact, we have an alternative description of the class of all k-dimensional min-geometric stable Pareto (III) distributions. They are of the form (3.9) for some p ∈ (0, 1) and some arbitrary choice of F̄_X(x) with Pareto (III) marginals. A broader class of k-dimensional Pareto (III) variables can be encountered if we define Y by

Y_i = min_{j≤N_i} X_{ij}    (3.16)

where N is a random vector with geometric (p_i) marginals and (X_{1j}, X_{2j}, ..., X_{kj}) are i.i.d. k-dimensional random vectors with Pareto (III) marginals. A hierarchy of multivariate Pareto (III) distributions can be constructed by repeated application of such geometric minimization paradigms (a close parallel to the families


of multivariate exponential distributions introduced in Arnold (1975)). See Arnold (1990) for an example of a highly flexible family of multivariate Pareto (III) distributions derived using a geometric minimization model with dependent N_i s. It will be recalled that, in one dimension, geometric-product stability was a salient property of the Pareto (I)(σ, α) distribution. It is natural to seek multivariate parallels of this. A k-dimensional random vector X has a distribution that is geometric-multiplication stable if, whenever we take X₁, X₂, ... i.i.d. with the same distribution as X and N, independent of the X_i s, a geometric (p) random variable, we have a(p)[Π_{i≤N} X_i]^{b(p)} =d X₁ for every p ∈ (0, 1). Here the multiplication and exponentiation of vectors is assumed to be done coordinatewise. Such a distribution will necessarily have classical Pareto marginals. It is possible to describe the class of all such multivariate geometric multiplication stable distributions. The details are provided in Mittnik and Rachev (1991). We, at present, are only interested in multivariate multiplication stable distributions with classical Pareto marginals. Thus we wish to have X_j ~ P(I)(1, α_j), j = 1, 2, ..., k (scale transformations, i.e., the σ_j s, can be introduced later if desired). Since our X_j s are bounded below by 1, their logarithms are non-negative and indeed have exponential distributions. If we let Y = log X (taking logarithms coordinatewise) then Y will be, in the Mittnik-Rachev terminology, a multivariate geometric summation stable random variable with non-negative coordinates; in fact, with exponential (α_j) marginals. We can describe the distribution of X in terms of the transform (recall X_j > 1 for every j)

φ_X(s) = E(Π_{j=1}^k X_j^{s_j}),   s < 0 .    (3.17)

Referring to Mittnik and Rachev's representation theorem for multivariate geometric summation stable distributions (using Laplace transforms instead of characteristic functions, since our random vector Y has non-negative coordinates) we may write

φ_X(s) = [1 − log ψ(s)]^{−1}    (3.18)

where ψ(s) is the Laplace transform of a non-negative stable random vector with degenerate marginal distributions, i.e., of some stable non-negative random vector Z with Z_j = c_j w.p. 1 for each j. This last curious constraint is required to ensure that the marginals of X are of the Pareto (I) form, as desired. As we have seen, some of the multivariate Pareto distributions introduced above have, in addition to Pareto marginal distributions, Pareto conditional distributions. It is of interest to explore the class of distributions with Pareto conditionals, since they may provide plausible alternatives to the usual multivariate Pareto distributions. For notational convenience we will focus our discussion on the bivariate case. A helpful reference on distributions with specified conditionals is Arnold et al. (1992).


Suppose that we wish to identify all bivariate distributions for a random vector (X, Y) such that all of its conditionals are of the Pareto (II) form, i.e., such that for each y > 0,

X | Y = y ~ Pareto (II)(μ₁(y), σ₁(y), α₁(y))    (3.19)

and for each x > 0,

Y | X = x ~ Pareto (II)(μ₂(x), σ₂(x), α₂(x)) .    (3.20)

We saw earlier that the modified Mardia bivariate Pareto (II) distribution, (3.1), had Pareto (II) conditionals (see (3.3)). In addition, it has Pareto (II) marginals. If we wish to identify all possible distributions with Pareto (II) conditionals as in (3.19)-(3.20), we need to consider the following equation, obtained by writing the joint density of (X, Y) as a product of a marginal and a conditional density in the two possible ways:

[f_Y(y) α₁(y)/σ₁(y)] I(x > μ₁(y)) / [1 + (x − μ₁(y))/σ₁(y)]^{α₁(y)+1} = [f_X(x) α₂(x)/σ₂(x)] I(y > μ₂(x)) / [1 + (y − μ₂(x))/σ₂(x)]^{α₂(x)+1} .    (3.21)

In its most general form, (3.21) is difficult to solve. It can be readily solved in the special case in which we assume μ₁(y) = μ₁, μ₂(x) = μ₂, α₁(y) = α₂(x) = α. In this case, following arguments presented in Arnold (1987), we find that the joint density of (X, Y) must be of the form

f_{X,Y}(x, y) = c(λ)[λ₀ + λ₁(x − μ₁) + λ₂(y − μ₂) + λ₁₂(x − μ₁)(y − μ₂)]^{−(α+1)} I(x > μ₁) I(y > μ₂)    (3.22)

for suitable choices of the parameters λ₀, λ₁, λ₂, λ₁₂. Here c(λ) is a normalizing constant chosen to ensure that the density integrates to 1. The location parameters μ₁ and μ₂ can assume any real value. The λs must all be non-negative and, to ensure integrability, we must distinguish two cases: (i) α ∈ (0, 1]. In this case we require λ₁ > 0, λ₂ > 0, λ₁₂ > 0 and λ₀ ≥ 0. (ii) α ∈ (1, ∞). In this case we must have λ₀ > 0, λ₁ > 0, λ₂ > 0 and λ₁₂ ≥ 0. If the joint density is of the form (3.22), the corresponding conditional distributions are of the following simple form:

X | Y = y ~ Pareto (II)(μ₁, (λ₀ + λ₂(y − μ₂)) / (λ₁ + λ₁₂(y − μ₂)), α)    (3.23)

and

Y | X = x ~ Pareto (II)(μ₂, (λ₀ + λ₁(x − μ₁)) / (λ₂ + λ₁₂(x − μ₁)), α) .    (3.24)
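Because both conditionals (3.23) and (3.24) are available in closed form, a Gibbs-type sampler for the joint density (3.22) is immediate; the chapter later (Section 8) uses such conditionals as Markov transition laws. The sketch below is our illustration, with arbitrary parameter values chosen exchangeably (λ₁ = λ₂, μ₁ = μ₂) so that the two coordinates share one distribution.

```python
import random

random.seed(6)
alpha = 3.0
mu1 = mu2 = 0.0
lam0, lam1, lam2, lam12 = 1.0, 1.0, 1.0, 0.5   # exchangeable: lam1 == lam2

def pareto2(mu, sigma, a):
    # Inverse-CDF draw from Pareto(II)(mu, sigma, a):
    # survival [1 + (x - mu)/sigma]**(-a)
    u = 1.0 - random.random()                   # u in (0, 1]
    return mu + sigma * (u ** (-1.0 / a) - 1.0)

xs, ys = [], []
x = y = 1.0
for t in range(20000):
    # conditional scale parameters read off from (3.23) and (3.24)
    x = pareto2(mu1, (lam0 + lam2 * (y - mu2)) / (lam1 + lam12 * (y - mu2)), alpha)
    y = pareto2(mu2, (lam0 + lam1 * (x - mu1)) / (lam2 + lam12 * (x - mu1)), alpha)
    if t >= 1000:                               # discard burn-in
        xs.append(x)
        ys.append(y)

# under exchangeability the two coordinates have the same distribution
print(round(sum(xs) / len(xs), 2), round(sum(ys) / len(ys), 2))
```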

We turn next to consider joint densities for (X, Y) which have Pareto (III) conditionals, i.e., such that for every y > 0,

X | Y = y ~ Pareto (III)(μ₁(y), σ₁(y), α₁(y))    (3.25)

and for each x > 0,

Y | X = x ~ Pareto (III)(μ₂(x), σ₂(x), α₂(x)) .    (3.26)

Writing the joint density as a product of a marginal and a conditional density in the two possible ways yields the following equation:

f_Y(y) (α₁(y)/σ₁(y)) ((x − μ₁(y))/σ₁(y))^{α₁(y)−1} [1 + ((x − μ₁(y))/σ₁(y))^{α₁(y)}]^{−2} I(x > μ₁(y))
    = f_X(x) (α₂(x)/σ₂(x)) ((y − μ₂(x))/σ₂(x))^{α₂(x)−1} [1 + ((y − μ₂(x))/σ₂(x))^{α₂(x)}]^{−2} I(y > μ₂(x)) .    (3.27)

Equation (3.27), like (3.21), is difficult to solve in general. A special case, in which the solution is straightforward, occurs when μ₁(y) = μ₁, μ₂(x) = μ₂ and α₁(y) = α₁, α₂(x) = α₂. In such a case, it is evident that X̃ = (X − μ₁)^{α₁} and Ỹ = (Y − μ₂)^{α₂} will have Pareto (II) conditionals (with μ = 0 and α = 1). It follows from our earlier observations that

f_{X̃,Ỹ}(x̃, ỹ) = c(λ)[1 + λ₁ x̃ + λ₂ ỹ + λ₁₂ x̃ ỹ]^{−2} I(x̃ > 0) I(ỹ > 0)    (3.28)

and then, transforming back to (X, Y), we find

f_{X,Y}(x, y) = c(λ) α₁ α₂ (x − μ₁)^{α₁−1} (y − μ₂)^{α₂−1} I(x > μ₁) I(y > μ₂) / [1 + λ₁(x − μ₁)^{α₁} + λ₂(y − μ₂)^{α₂} + λ₁₂(x − μ₁)^{α₁}(y − μ₂)^{α₂}]²    (3.29)

where λ₁ > 0, λ₂ > 0, λ₁₂ > 0, α₁ > 0, α₂ > 0 and μ₁ ∈ R, μ₂ ∈ R. If the joint density of (X, Y) is of the form (3.29), then the corresponding conditional distributions are of a relatively simple form:

X | Y = y ~ P(III)(μ₁, [(1 + λ₂(y − μ₂)^{α₂}) / (λ₁ + λ₁₂(y − μ₂)^{α₂})]^{1/α₁}, α₁)    (3.30)

and

Y | X = x ~ P(III)(μ₂, [(1 + λ₁(x − μ₁)^{α₁}) / (λ₂ + λ₁₂(x − μ₁)^{α₁})]^{1/α₂}, α₂) .    (3.31)

In Section 8, we will describe certain Markov processes with transition probabilities governed by the conditional distributions in (3.23) and (3.30). In order for these chains to be stationary we require exchangeable versions of the joint densities (3.22) and (3.29) (i.e., λ₁ = λ₂, μ₁ = μ₂ and, in (3.29), α₁ = α₂). Note that the models (3.22) and (3.29), which have Pareto conditionals, in general do not have Pareto marginals. The exceptional case occurs in model (3.22) if λ₁₂ = 0 (and necessarily α > 1), in which situation the model reduces to Mardia's bivariate Pareto distribution (cf. (3.1); note that the α in (3.1) is one less than the α in (3.22)). Multivariate versions of the Pareto conditionals densities (3.22) and (3.29) are readily identified. They will have conditional densities of X_i given X_{(i)} = x_{(i)} (where X_{(i)} is X with the ith coordinate deleted) of, respectively, Pareto (II) or Pareto (III) form. They are:

(a) Multivariate Pareto (II) conditionals distribution.

f_X(x) = [ Σ_{s ∈ ξ_k} δ_s Π_{i=1}^k (x_i − μ_i)^{s_i} ]^{−(α+1)} I(x > μ)   (3.32)

where ξ_k is the set of all vectors of 0's and 1's of dimension k. All the δ_s's are non-negative in (3.32). Some, but not all, can be zero.

(b) Multivariate Pareto (III) conditionals distribution.

f_X(x) = [ Π_{i=1}^k α_i(x_i − μ_i)^{α_i−1} ] I(x > μ) [ Σ_{s ∈ ξ_k} δ_s Π_{i=1}^k (x_i − μ_i)^{α_i s_i} ]^{−2}   (3.33)

where ξ_k is as defined following (3.32). The δ_s's in (3.33) are non-negative and again, some but not all, can be zero.

4. Pareto processes

Any stochastic process whose marginal distributions are of the Pareto and/or generalized Pareto form could legitimately be called a Pareto process. Undoubtedly, the same could be said for any Markov process whose transition distributions are of the Pareto form. It is clearly impossible to survey all such processes. Our attention will be concentrated on processes which can be said to be autoregressive in nature and whose structure mirrors the classical normal autoregressive process, the differences of course being that (i) Pareto distributions will play the role played by normal distributions and (ii) geometric multiplication or minimization will replace addition in modeling dependence of the value of the process at time n on its values at previous times. The classical normal autoregressive processes have proved to be flexible and useful modeling tools. The Pareto processes to be introduced can be expected to


better model time series with heavy tailed marginals. As we shall see, a variety of sample path behaviors will be encountered among the different models, and selection of the appropriate model will depend on knowledge of the typical nature of the stochastic evolution of the process being modelled. For example, a series exhibiting frequent flat spots in its trajectory of random duration might be well fitted by a particular type of Pareto process, while a series exhibiting steady growth interrupted by random catastrophic decreases might be better fitted by an alternative model. We will introduce, in Sections 5 and 6, a spectrum of first order autoregressive models in which the value of the process at time n depends on its value at the immediately preceding time point, n − 1, and on an independent "innovation" variable, as with normal processes. Subsequently, in Section 7, we will discuss variant processes, including absolutely continuous versions, higher order autoregressions, analogs to moving averages and ARMA models, and multivariate processes. The discussion in Sections 5 and 6 will be quite detailed. The development of higher order models will be considerably more cursory. It is recognized that, in applications, higher order models (and models with even more bells and whistles) need to be considered; however, the basic ideas discussed in Sections 5 and 6 should be adequate to permit the interested researcher to flesh out the needed details on a case by case basis. In Section 8, a brief introduction is provided to related topics such as semi-Pareto processes, general minification processes, processes with Pareto conditionals and processes involving Markovian minimization.

5. Autoregressive classical Pareto processes


The two basic processes to be presented are direct transformations of well known exponential processes, those introduced by Lawrence and Lewis (1981) and by Gaver and Lewis (1980). It thus seems appropriate to label the processes with the names of these researchers.

5.1. The Lawrence-Lewis classical Pareto process


Begin with a sequence, the innovation sequence, {ε_n}_{n=1}^∞, of independent identically distributed P(I)(1, α) random variables (recall (1.2)). We will use these to construct an autoregressive stationary stochastic process with the Pareto (I)(σ, α) distribution as its stationary distribution. For n = 1, 2, ... define

X_n = σ ε_n^p   with probability p
    = X_{n−1} ε_n^p   with probability 1 − p   (5.1)

where p ∈ (0, 1). Such a process will be called a first order Lawrence-Lewis Pareto (I) process or, more briefly, an LLP(I)(1) process (a first order autoregressive Pareto (I) process). The notation is clearly designed to extend to higher order processes. If we introduce a sequence {U_n}_{n=1}^∞ of i.i.d. Bernoulli (p) random variables, the process can be described by a more compact formulation:


X_n = σ^{U_n} X_{n−1}^{1−U_n} ε_n^p .   (5.2)

If we take logarithms in this expression, we arrive at the Lawrence and Lewis (1981) NEAR process with exponential marginals. If a process {X_n} is defined using (5.1) (equivalently (5.2)), then, provided that X₀ ~ P(I)(σ, α), it is readily verified that, for every n, X_n ~ P(I)(σ, α). It is then, as advertised, a stationary process with the desired common distribution of the X_n's. The influence of the parameters α and p on the sample path behavior of the process can be appreciated upon viewing the simulated sample paths displayed in Figure 1. Note that if the initial distribution (that of X₀) is not of the Pareto (I)(σ, α) form, the process will not be stationary, but it will be the case that X_n converges in distribution to the Pareto (I)(σ, α) form as n → ∞.
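The stationarity claim is easy to check numerically. The sketch below simulates the recursion (5.2); the parameter values (α = 3, σ = 1, p = 0.5), the seed, and the run length are illustrative choices, not taken from the text, and the empirical tail, mean, and fraction of upward moves are compared with the Pareto (I)(σ, α) values P(X > 2σ) = 2^{−α}, E X = ασ/(α − 1), and the fluctuation probability 1/(1 + p) of (5.6).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, sigma, p, n = 3.0, 1.0, 0.5, 200_000   # illustrative choices

# Pareto(I)(1, alpha) innovations: NumPy's pareto() draws from the Lomax
# (Pareto II) law on (0, inf), so adding 1 shifts the support to (1, inf).
eps = rng.pareto(alpha, size=n) + 1.0
U = rng.random(n) < p                         # Bernoulli(p) switches

x = np.empty(n)
x[0] = sigma * (rng.pareto(alpha) + 1.0)      # start in the stationary law
for i in range(1, n):
    # (5.2): X_n = sigma^{U_n} X_{n-1}^{1-U_n} eps_n^p
    x[i] = (sigma if U[i] else x[i - 1]) * eps[i] ** p

tail = np.mean(x > 2.0 * sigma)    # Pareto(I) value: 2**(-alpha) = 0.125
mean = x.mean()                    # Pareto(I) value: alpha*sigma/(alpha-1) = 1.5
frac_up = np.mean(x[1:] > x[:-1])  # fluctuation probability (5.6): 1/(1+p)
```

Any other admissible (α, σ, p) can be substituted; the same three checks apply with the corresponding theoretical values.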
[Figure 1 about here: simulated sample paths over 100 time points for α = 1, 2, 4 and p = .3, .5, .7, .9.]

Fig. 1. Simulated sample paths of Lawrence-Lewis classical Pareto processes.


Provided that α > 2 (to ensure the existence of second moments) we can compute the autocovariance cov(X_{n−1}, X_n) by conditioning on U_n in (5.2), recalling that X_{n−1}, ε_n and U_n are independent. We find

cov(X_{n−1}, X_n) = (1 − p)α²σ² / [(α − 1)²(α − 2)(α − p)]   (5.3)

and consequently the autocorrelation is given by

ρ(X_{n−1}, X_n) = (1 − p)α / (α − p) .   (5.4)

The autocorrelation (5.4) is positive for all α > 2 and p ∈ (0, 1); its magnitude may be used as a diagnostic key in deciding whether a particular time series might appropriately be modelled by a Lawrence-Lewis Pareto (I) process. It is possible to write down expressions for autocorrelations corresponding to lags bigger than 1, but they do not have a simple form. The following expression for E(X_n X_{n+k}) gives the flavor of the complications encountered in such computations:

E(X_n X_{n+k}) = [pα²σ² / ((α − 1)(α − p))] · [1 − (1 − p)^k (α/(α − p))^k] / [1 − (1 − p)(α/(α − p))] + (1 − p)^k (α/(α − p))^k ασ²/(α − 2) .   (5.5)

Fluctuation probabilities (i.e., P(X_{n−1} < X_n)) are potentially useful diagnostic tools for modelling purposes. For the Lawrence-Lewis Pareto (I) process we find

P(X_{n−1} < X_n) = P(X_{n−1} < σ^{U_n} X_{n−1}^{1−U_n} ε_n^p)
  = p P(X_{n−1} < σ ε_n^p) + (1 − p) P(ε_n^p > 1)
  = 1/(1 + p) .   (5.6)

Equations (5.4) and (5.6) provide quick consistent method of moments estimates of p and α based on an observed sample path realization from an LLP(I)(1) process.
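A sketch of this method of moments recipe (all numerical values below are illustrative, not from the text): inverting (5.6) gives p̂ = 1/f − 1, where f is the observed fraction of upward moves, and inverting (5.4) gives α̂ = ρ̂₁ p̂/(ρ̂₁ + p̂ − 1), where ρ̂₁ is the sample lag-1 autocorrelation. A fairly large α (here 5, so that fourth moments exist) keeps ρ̂₁ well behaved.

```python
import numpy as np

rng = np.random.default_rng(42)
alpha, sigma, p, n = 5.0, 1.0, 0.5, 400_000   # true values, illustrative

eps = rng.pareto(alpha, size=n) + 1.0          # Pareto(I)(1, alpha) innovations
U = rng.random(n) < p
x = np.empty(n)
x[0] = sigma * (rng.pareto(alpha) + 1.0)
for i in range(1, n):
    x[i] = (sigma if U[i] else x[i - 1]) * eps[i] ** p

# Invert (5.6): f = 1/(1+p)  =>  p = 1/f - 1
f = np.mean(x[1:] > x[:-1])
p_hat = 1.0 / f - 1.0

# Invert (5.4): rho = (1-p)*alpha/(alpha-p)  =>  alpha = rho*p/(rho + p - 1)
xc = x - x.mean()
rho1 = np.dot(xc[1:], xc[:-1]) / np.dot(xc, xc)
alpha_hat = rho1 * p_hat / (rho1 + p_hat - 1.0)
```

With the true values above, ρ₁ = (1 − p)α/(α − p) = 5/9, and the inversion recovers α exactly at the population level; the sample estimates are consistent but, as the denominator ρ̂₁ + p̂ − 1 shows, α̂ is sensitive to noise in ρ̂₁.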

5.2. The Gaver-Lewis classical Pareto process


Again we begin with an innovation sequence {ε_n}_{n=1}^∞ of independent identically distributed P(I)(1, α) random variables. This time, for n = 1, 2, ... we define

X_n = σ^p X_{n−1}^{1−p} ε_n   with probability p
    = σ^p X_{n−1}^{1−p}   with probability 1 − p   (5.7)

where p ∈ (0, 1). Such a process will be called a first order Gaver-Lewis Pareto (I) process (GLP(I)(1)). Higher order extensions are possible. The logarithm of the


process defined by (5.7) is recognizable as the Gaver-Lewis (1980) exponential process, hence the name of the corresponding Pareto process. If {U_n}_{n=1}^∞ is a sequence of i.i.d. Bernoulli (p) random variables, independent of the ε_n's, then we can describe the process in a slightly more succinct form

X_n = σ^p X_{n−1}^{1−p} ε_n^{U_n} .   (5.8)

Such a process has a stationary distribution of the classical Pareto (I)(σ, α) form. Consequently, if X₀ ~ P(I)(σ, α), the process is completely stationary. The influence of the parameters α and p on the sample path behavior of the process can be understood by referring to Figure 2, which displays simulated sample paths for various values of the parameters.
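A numerical sketch of the recursion (5.8), with illustrative parameter choices: the empirical mean and median of a long simulated path are compared with the stationary Pareto (I)(σ, α) values ασ/(α − 1) and σ2^{1/α}, and the fraction of upward moves with the fluctuation probability p/(1 + p) of (5.11) below.

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, sigma, p, n = 5.0, 2.0, 0.5, 300_000   # illustrative choices

eps = rng.pareto(alpha, size=n) + 1.0          # Pareto(I)(1, alpha) innovations
U = rng.random(n) < p

x = np.empty(n)
x[0] = sigma * (rng.pareto(alpha) + 1.0)       # start in the stationary law
for i in range(1, n):
    # (5.8): X_n = sigma^p X_{n-1}^{1-p} eps_n^{U_n}
    x[i] = sigma**p * x[i - 1] ** (1 - p) * (eps[i] if U[i] else 1.0)

mean = x.mean()                    # stationary value alpha*sigma/(alpha-1) = 2.5
median = np.median(x)              # stationary value sigma*2**(1/alpha)
frac_up = np.mean(x[1:] > x[:-1])  # the process moves up only when an innovation arrives
```

Note that, in contrast to the Lawrence-Lewis construction, the no-innovation branch always shrinks the path geometrically toward σ, which is visible as decay runs in simulated trajectories.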

[Figure 2 about here: simulated sample paths over 100 time points for α = 1, 2, 4 and p = .3, .5, .7, .9.]

Fig. 2. Simulated sample paths of Gaver-Lewis classical Pareto processes.


Assuming α > 2, the autocovariance structure of the process can be investigated. We find

cov(X_{n−1}, X_n) = (1 − p)ασ² / [(α − 1)²(α + p − 2)]   (5.9)

and consequently the lag 1 autocorrelation is given by

ρ(X_{n−1}, X_n) = (1 − p)(α − 2)/(α + p − 2) .   (5.10)

Observe that this autocorrelation is positive, as is that of the Lawrence-Lewis Pareto process (cf. Eq. (5.4)), though the two expressions depend on α quite differently. It may be remarked that the corresponding exponential processes (the Lawrence-Lewis and Gaver-Lewis processes) are not distinguishable by their autocorrelation structure. The fluctuation probabilities of the Gaver-Lewis Pareto process are also potentially useful for modelling diagnostics. We find, using (5.8), and assuming σ = 1 without loss of generality,

P(X_{n−1} < X_n) = P(X_{n−1} < X_{n−1}^{1−p} ε_n^{U_n})
  = p P(X_{n−1}^p < ε_n)
  = p/(1 + p) .   (5.11)

Equations (5.10) and (5.11) provide quick consistent method of moments estimates of p and α based on an observed sample path realization from the Gaver-Lewis Pareto process. The structure of the Gaver-Lewis process leads to a curious free lunch for estimating p. It is possible, and indeed probable, that the process will generate runs of values which are such that their logarithms form a geometric progression. When this happens, we can determine p exactly from the sample path realization! A similar situation will be encountered with the Yeh-Arnold-Robertson process to be introduced in the next section. One way to avoid this anomalous behavior is to consider absolutely continuous variant processes (see Section 7).

6. Autoregressive Pareto (III) processes


The role played by multiplication in Section 5 will, in this section, be played by minimization. Again two basic processes will be discussed. The first was introduced in Yeh et al. (1988). The second, in a slightly disguised and in fact time reversed form, was introduced in Arnold (1989). The present formulations are designed to highlight the close parallels between these two Pareto (III) processes and the two classical Pareto processes introduced in the last section.


6.1. The Yeh-Arnold-Robertson Pareto (III) process

Begin with an innovation sequence {ε_n}_{n=1}^∞ of independent identically distributed Pareto (III)(0, σ, α) random variables. For n = 1, 2, ..., following Yeh et al. (1988), define

X_n = p^{−1/α} X_{n−1}   with probability p
    = min{p^{−1/α} X_{n−1}, ε_n}   with probability 1 − p   (6.1)

where p ∈ (0, 1). Such a process will be called a first order Yeh-Arnold-Robertson Pareto (III) process (YARP(III)(1)). If we define {U_n}_{n=1}^∞ to be a sequence of i.i.d. Bernoulli (p) random variables (independent of the ε_n's) then we can describe the process in more succinct form

X_n = min{ p^{−1/α} X_{n−1}, ε_n/(1 − U_n) }   (6.2)

where 1/0 is interpreted as +∞. By conditioning on U_n, it is readily verified that the YARP(III)(1) process has a Pareto (III)(0, σ, α) stationary distribution and will be a completely stationary process if X₀ ~ P(III)(0, σ, α). Representative simulated sample paths for a variety of values of p and α are displayed in Figure 3. The lag one autocovariances of this process involve evaluation and integration of incomplete beta functions. They are most easily obtained by simulation or by numerical integration. See Yeh et al. (1988), where a brief table of approximate autocorrelations may be found. Fluctuation probabilities are not difficult to evaluate for the YARP(III)(1) process. Referring to (6.1) or (6.2), and noting that p^{−1/α} > 1, we have

P(X_{n−1} < X_n) = 1 − P(X_{n−1} ≥ X_n)
  = 1 − (1 − p)P(X_{n−1} > ε_n)
  = (1 + p)/2   (6.3)

(since ε_n and X_{n−1} are independent and identically distributed). This simple expression for P(X_{n−1} < X_n) can be used to develop a simple consistent estimate of p based on an observed sample path from the process. Estimation of the other parameters of the process (α and σ) can be accomplished via the method of moments. Yeh, Arnold and Robertson (1988) recommend that a logarithmic transformation be used to avoid moment assumptions. This stochastic process has some remarkable features. One is that, with positive probability, it can generate runs of values in exact geometric progression. An analogous phenomenon was noted in the study of the Gaver-Lewis Pareto process. In the present case, perusal of a long sample path from a YARP(III)(1) process would allow us to know exactly the value of p^{−1/α}. A minor modification of the process, described below in Section 7, will be free of this defect. It can be observed that runs of values in exact geometric progression (due to a constant inflation factor) might be quite appropriate in certain economic scenarios.
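A simulation sketch of the recursion (6.2), with illustrative parameter values: the fraction of upward moves estimates (1 + p)/2 by (6.3), so p̂ = 2f − 1, and the stationary Pareto (III)(0, σ, α) marginal has median σ (its survival function equals 1/2 at x = σ). The sampler `rpareto3` is a helper introduced here, not a name from the text.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, sigma, p, n = 2.0, 1.0, 0.5, 300_000   # illustrative choices

def rpareto3(size):
    # Pareto(III)(0, sigma, alpha) by inverting the survival
    # function 1/(1 + (x/sigma)**alpha).
    u = rng.random(size)
    return sigma * (u / (1.0 - u)) ** (1.0 / alpha)

eps = rpareto3(n)
U = rng.random(n) < p
c = p ** (-1.0 / alpha)                        # inflation factor, > 1

x = np.empty(n)
x[0] = rpareto3(1)[0]                          # start in the stationary law
for i in range(1, n):
    # (6.2): X_n = min{ p^{-1/alpha} X_{n-1}, eps_n / (1 - U_n) }
    x[i] = c * x[i - 1] if U[i] else min(c * x[i - 1], eps[i])

f = np.mean(x[1:] > x[:-1])                    # estimates (1+p)/2, cf. (6.3)
p_hat = 2.0 * f - 1.0
median = np.median(x)                          # stationary median is sigma
```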

[Figure 3 about here: simulated sample paths over 100 time points for α = 1, 2, 4 and p = .3, .5, .7, .9.]

Fig. 3. Simulated sample paths of Yeh-Arnold-Robertson Pareto (III) processes.

A second unusual property of the process involves its remarkably well behaved extreme values. To see this define

T_n = min_{0≤i≤n} X_i   (6.4)

and

M_n = max_{0≤i≤n} X_i .   (6.5)

First consider the minimum, T_n. Referring to (6.1) we see that T_n will exceed a level t if and only if X₀ exceeds t and each of the observed innovations exceeds t (an innovation is observed with probability 1 − p). Thus


T_n ~ min_{i≤N} ε_i   (6.6)

where the ε_i's are i.i.d. Pareto (III)(0, σ, α) and N, independent of the ε_i's, is such that N − 1 has a binomial (n, 1 − p) distribution. It follows, by conditioning on N, that

F̄_{T_n}(t) = P(T_n > t) = [1 + (t/σ)^α]^{−1} {[1 + p(t/σ)^α]/[1 + (t/σ)^α]}^n ,  t ≥ 0 .   (6.7)

From (6.7), the asymptotic behavior of T_n is readily determined, viz.: [n(1 − p)]^{1/α} T_n/σ converges in distribution to a Weibull (α) limit. To determine the distribution of M_n = max_{0≤i≤n} X_i, it is convenient to introduce a family of level crossing processes {Z_n(t)}, indexed by t ∈ ℝ⁺ and defined by

Z_n(t) = 1   if X_n > t
       = 0   if X_n ≤ t .   (6.8)

It is remarkable that such two state processes are themselves Markov chains, with corresponding transition matrices given by

P(t) = [1 + (t/σ)^α]^{−1} ( p + (t/σ)^α        1 − p
                            (1 − p)(t/σ)^α     1 + p(t/σ)^α )   (6.9)

(rows and columns indexed by the states 0 and 1).

Indeed, the fact that these level crossing processes are Markovian for every t can be used to characterize the YARP(III)(1) process among stochastic processes of the form (6.1) in which the common distribution of the ε_n's is allowed to initially be arbitrary (see Arnold and Hallett, 1989). The observation that the level crossing processes are Markovian permits a simple derivation of the distribution of M_n. Thus

F_{M_n}(t) = P(M_n ≤ t)
  = P(Z₀(t) = 0, Z₁(t) = 0, ..., Z_n(t) = 0)
  = [(t/σ)^α/(1 + (t/σ)^α)] {[p + (t/σ)^α]/[1 + (t/σ)^α]}^n ,  t > 0 .   (6.10)

From this it is readily verified that

lim_{n→∞} P( M_n/(σ n^{1/α}) ≤ t ) = exp{−(1 − p)t^{−α}} ,  t > 0 .   (6.11)
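The closed forms (6.7) and (6.10) can be checked by simulating many independent short trajectories of the process; all numerical choices below (α, σ, p, the path length, the number of paths, and the evaluation points) are illustrative, and `rpareto3` is a helper sampler introduced for the sketch.

```python
import numpy as np

rng = np.random.default_rng(11)
alpha, sigma, p = 2.0, 1.0, 0.5
n_steps, m = 25, 20_000             # m independent paths X_0, ..., X_{n_steps}

def rpareto3(size):                 # Pareto(III)(0, sigma, alpha) sampler
    u = rng.random(size)
    return sigma * (u / (1.0 - u)) ** (1.0 / alpha)

c = p ** (-1.0 / alpha)
x = rpareto3(m)                     # X_0 in the stationary law
t_min, t_max = x.copy(), x.copy()   # running T_n and M_n per path
for _ in range(n_steps):
    keep = rng.random(m) < p
    prop = c * x
    x = np.where(keep, prop, np.minimum(prop, rpareto3(m)))
    t_min = np.minimum(t_min, x)
    t_max = np.maximum(t_max, x)

def u(t):
    return (t / sigma) ** alpha

t1, t2 = 0.2, 10.0
emp_T = np.mean(t_min > t1)
theo_T = (1 + p * u(t1)) ** n_steps / (1 + u(t1)) ** (n_steps + 1)      # (6.7)
emp_M = np.mean(t_max <= t2)
theo_M = u(t2) * (p + u(t2)) ** n_steps / (1 + u(t2)) ** (n_steps + 1)  # (6.10)
```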

Another distinctive property of this Pareto (III) process is the behavior of geometric minima and maxima. Suppose that N* ~ geometric (p), where P(N* = k) = p(1 − p)^k, k = 0, 1, 2, ...; then both

T = min_{0≤i≤N*} X_i   (6.12)


and

M = max_{0≤i≤N*} X_i   (6.13)

can be shown to have Pareto (III) distributions. This curious feature of the process is discussed in more detail in Yeh et al. (1988). The reader is also referred to that paper for discussion of some alternative estimation strategies.

6.2. The Arnold Pareto (III) process


The time-reversed form of this process was introduced first in Arnold (1989) as a process based on geometric minimization. The version presented here is designed to more closely parallel the autoregressive structure exhibited by the Yeh-Arnold-Robertson process. In a sense, the Gaver-Lewis process and the Lawrence-Lewis process could be obtained one from the other by interchanging the roles of ε_n and X_{n−1} in the definition of X_n. The same kind of relationship will exist between the Yeh-Arnold-Robertson process and the Arnold process: one can be obtained from the other by, basically, interchanging the roles of ε_n and X_{n−1} in the definition of X_n. For the Arnold process we begin with an innovation sequence {ε_n} of i.i.d. Pareto (III)(0, σ, α) random variables and define

X_n = min{X_{n−1}, (1 − p)^{−1/α} ε_n}   with probability p
    = (1 − p)^{−1/α} ε_n   with probability 1 − p .   (6.14)

Introducing a sequence {U_n}_{n=1}^∞ of i.i.d. Bernoulli (p) random variables we may write

X_n = min{ X_{n−1}/(1 − U_n), (1 − p)^{−1/α} ε_n }   (6.15)

where, as before, 1/0 = ∞. It is not difficult to verify that such an Arnold Pareto (III) process (AP(III)(1)) has a Pareto (III)(0, σ, α) stationary distribution and will be a completely stationary process if X₀ ~ P(III)(0, σ, α). Representative simulated sample paths for a variety of values of p and α are displayed in Figure 4. The autocorrelation structure of this process cannot be determined analytically; numerical integration or simulation is required. The fluctuation probabilities, however, are not difficult to evaluate. It is possible for successive values in the process to be equal. Elementary computations, recalling that X_{n−1} and ε_n are independent, yield

P(X_{n−1} < ε_n/(1 − p)^{1/α}) = (1/p)[1 + ((1 − p)/p) log(1 − p)] .

Pareto processes

21

[Figure 4 about here: simulated sample paths over 100 time points for α = 1, 2, 4 and p = .3, .5, .7, .9.]

Fig. 4. Simulated sample paths of Arnold Pareto (III) processes.

Consequently, by conditioning on U_n, we find

P(X_{n−1} = X_n) = 1 + ((1 − p)/p) log(1 − p) ,   (6.16)

P(X_{n−1} < X_n) = ((1 − p)/p)[1 + ((1 − p)/p) log(1 − p)]   (6.17)

and

P(X_{n−1} > X_n) = −((1 − p)/p²)[p + log(1 − p)] .   (6.18)


In fact, for any k,

P(X_{n−1} = X_n = X_{n+1} = ⋯ = X_{n+k−1}) = [1 + ((1 − p)/p) log(1 − p)]^k .   (6.19)

Thus flat spots in the process can occur. Indeed, when X_{n−1} = x_{n−1} is small, flat spots are quite likely, since

P(X_n = X_{n+1} = ⋯ = X_{n+k−1} = x_{n−1} | X_{n−1} = x_{n−1}) = p^k [1 + (1 − p)(x_{n−1}/σ)^α]^{−k} .   (6.20)

In addition, from (6.19) and (6.20), it is evident that values of p close to 1 are conducive to more frequent occurrences of flat spots in the process. The simple fluctuation probabilities permit straightforward consistent estimation of the parameter p. A variety of methods are available for consistent estimation of σ and α, taking advantage of the stationarity of the process. Any of the techniques described, for example, in Arnold (1983) for estimation based on i.i.d. Pareto (III)(0, σ, α) samples can be used to consistently estimate the parameters σ, α based on the stationary sequence X₀, X₁, X₂, ....
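The tie probability (6.16) gives a handy check on a simulated AP(III)(1) path. In the sketch below (illustrative parameter values; `rpareto3` is a helper sampler introduced here), exact ties X_{n−1} = X_n are detected by floating-point equality, which is legitimate because a flat spot carries the identical stored value forward.

```python
import numpy as np

rng = np.random.default_rng(5)
alpha, sigma, p, n = 2.0, 1.0, 0.5, 300_000   # illustrative choices

def rpareto3(size):                 # Pareto(III)(0, sigma, alpha) sampler
    u = rng.random(size)
    return sigma * (u / (1.0 - u)) ** (1.0 / alpha)

eps = rpareto3(n)
U = rng.random(n) < p
c = (1.0 - p) ** (-1.0 / alpha)

x = np.empty(n)
x[0] = rpareto3(1)[0]               # start in the stationary law
for i in range(1, n):
    # (6.14): keep the old value, or accept a rescaled innovation
    x[i] = min(x[i - 1], c * eps[i]) if U[i] else c * eps[i]

tie_frac = np.mean(x[1:] == x[:-1])            # compare with (6.16)
tie_theory = 1.0 + (1.0 - p) / p * np.log(1.0 - p)
median = np.median(x)                          # stationary median is sigma
```

For p = 0.5 the theoretical tie probability is 1 + log(0.5) ≈ 0.307, so roughly a third of all transitions are flat.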

7. Extensions and modifications

7.1. Higher order processes

The four basic processes introduced in Sections 5 and 6 were labelled first order processes. This indicates that the conditional distribution of X_n, given the past, depended only on the value of the process one time unit before, i.e., on X_{n−1}. Paralleling the development of normal autoregressive processes of higher orders, we can consider processes analogous to our Pareto processes in which X_n depends on the k previous values X_{n−1}, X_{n−2}, ..., X_{n−k}. A kth order version of the Lawrence-Lewis Pareto (I) process would be of the form

X_n = σ ε̃_n   with probability p₀
    = X_{n−1} ε̃_n   with probability p₁
    ⋮
    = X_{n−k} ε̃_n   with probability p_k   (7.1)

where Σ_{j=0}^k p_j = 1 and the innovation sequence, {ε̃_n}, is chosen to have a common distribution that is selected to ensure that the process is a stationary Pareto (I)(σ, α) process. If we consider Y_n = log X_n, then (7.1) will correspond to a stationary Pareto (I)(σ, α) process if {Y_n} is a stationary kth order exponential process. A good survey of the theory associated with such processes may be found in Lawrence and Lewis (1985). It turns out that the assumption that ε̃_n ~ P(I)(1, α/p₀) is appropriate to ensure that the process is a stationary one with X_n ~ P(I)(σ, α), n = 0, 1, 2, .... More general models of the form

X_n = σ ε̃_n   with probability p₀
    = X_{n−1}^{c₁} ε̃_n   with probability p₁
    ⋮
    = X_{n−k}^{c_k} ε̃_n   with probability p_k   (7.2)

where c₁, c₂, ..., c_k > 0, would be more flexible. However, for k > 2, it is difficult to determine the appropriate distribution for the innovations {ε̃_n}, or indeed to determine whether there exists a distribution for the ε̃'s which will guarantee that (7.2) describes a Pareto (I)(σ, α) process. How should we define a kth order version of the Gaver-Lewis Pareto (I) process? Note, from (5.8), that the first order process can be written in the form

X_n = X_{n−1}^{1−p} ε̃_n   (7.3)

where ε̃_n = σ^p ε_n^{U_n}. A natural kth order version of such a process would be defined by

X_n = ( Π_{j=1}^k X_{n−j}^{p_j} ) ε̃_n   (7.4)

where Σ_{j=1}^k p_j < 1 and the common distribution of the ε̃_n's is suitably selected. If we write Y_n = log X_n, where {X_n} satisfies (7.4), then the process {Y_n} will have a standard linear autoregressive structure of order k with an exponential stationary distribution for Y_n. Details for the analysis of such processes may be found in (for example) Brockwell and Davis (1991). Higher order versions of the Yeh et al. process were described in Yeh-Shu (1983). They will be of the form (cf. Eq. (6.2))

X_n = min{c₁ X_{n−1}, ε̃_n}   with probability p₁
    = min{c₂ X_{n−2}, ε̃_n}   with probability p₂
    ⋮
    = min{c_k X_{n−k}, ε̃_n}   with probability p_k   (7.5)

where Σ_{j=1}^k p_j = 1 and the innovation sequence {ε̃_n} is chosen to ensure a stationary Pareto (III)(0, σ, α) distribution for X_n. As Yeh-Shu (1983) points out, in many cases when k = 2, it is possible to give an explicit description of the form of the required innovation distribution, but when k > 2 it seems difficult to determine the appropriate distribution for ε̃_n.


A kth order version of the Arnold process (cf. (6.14)) would be of the form

X_n = ε̃_n   with probability p₀
    = min{X_{n−1}, ε̃_n}   with probability p₁
    ⋮
    = min{X_{n−k}, ε̃_n}   with probability p_k   (7.6)

where Σ_{j=0}^k p_j = 1 and the innovation sequence {ε̃_n} is chosen to ensure that X_n ~ P(III)(0, σ, α). Here, too, difficulties are encountered in identifying the needed distribution of ε̃_n when k > 2. Analogous moving average and autoregressive-moving average models can also be defined. Yeh-Shu discusses such extensions of the Yeh-Arnold-Robertson process. Davis and Resnick (1989) provide material relevant to the study of min-ARMA processes; actually they describe max-ARMA processes of the form

X_n = max{φ₁ X_{n−1}, φ₂ X_{n−2}, ..., φ_p X_{n−p}, ε_n, θ₁ ε_{n−1}, ..., θ_q ε_{n−q}} .   (7.7)

7.2. Absolutely continuous modifications


Both the Gaver-Lewis process and the Yeh-Arnold-Robertson process have a singular joint distribution for X_n and X_{n−1}. It was observed that this feature of the two processes allowed for exact estimation of some of the parameters of the models. Realistically, this seems implausible in practice. In any event, it seems appropriate to investigate modifications of the processes which will avoid this pitfall. There are many possible approaches to this problem. Perhaps the simplest one is that suggested at the end of Yeh et al. (1988). They propose that, instead of using one fixed value of p (see Eqs. (5.7) and (6.1)) at each stage of the process, we use a randomly selected value of p. Specifically, the absolutely continuous version of the Gaver-Lewis Pareto (I) process begins with {ε_n}_{n=0}^∞, a sequence of i.i.d. P(I)(1, α) random variables, and {B_n}_{n=1}^∞, a sequence of i.i.d. random variables, independent of the ε_n's, whose common distribution function G has support (0, 1). We then take X₀ = σε₀ and, for n ≥ 1, given X_{n−1} = x_{n−1} and B_n = b_n, define
X_n = σ^{b_n} x_{n−1}^{1−b_n} ε_n   with probability b_n
    = σ^{b_n} x_{n−1}^{1−b_n}   with probability 1 − b_n .   (7.8)
It follows that, given B_n = b_n, X_n will have a P(I)(σ, α) distribution, and so unconditionally X_n ~ P(I)(σ, α) and our process is stationary. Provided that G, the common distribution of the B_n's, is absolutely continuous, the joint distribution of (X_{n−1}, X_n) will be absolutely continuous. A convenient one parameter family of absolutely continuous distributions which might be used for G, the common distribution of the B_n's above, is the power function distribution family; i.e., distributions of the form

G_δ(x) = x^δ ,  0 < x < 1   (7.9)

where δ > 0. A completely analogous modification of the Yeh-Arnold-Robertson Pareto (III) process is available (see Yeh et al., 1988; Arnold and Robertson, 1989). For it we begin with {ε_n}_{n=0}^∞, i.i.d. Pareto (III)(0, σ, α) random variables, and {B_n}_{n=1}^∞, an i.i.d. sequence (independent of the ε_n's) with common distribution function G with support (0, 1). Then set X₀ = ε₀ and, for n ≥ 1, given X_{n−1} = x_{n−1} and B_n = b_n, define

X_n = b_n^{−1/α} X_{n−1}   with probability b_n
    = min{b_n^{−1/α} X_{n−1}, ε_n}   with probability 1 − b_n .   (7.10)

It follows that, given B_n = b_n, X_n will have a Pareto (III)(0, σ, α) distribution and, consequently, unconditionally X_n ~ P(III)(0, σ, α). The resulting stationary process will be such that the joint distribution of (X_{n−1}, X_n) is absolutely continuous provided that G, the common distribution of the B_n's, is absolutely continuous. Arnold and Robertson (1989) discuss a closely related logistic process in some detail. It should be noted that the fluctuation probabilities for the absolutely continuous versions of the Gaver-Lewis Pareto (I) process and the Yeh-Arnold-Robertson Pareto (III) process remain relatively simple. We have, for the absolutely continuous GLP(I) process,
P(X_{n−1} < X_n) = E(P(X_{n−1} < X_n | B_n)) = E(B_n/(1 + B_n))   (7.11)

and for the absolutely continuous YARP(III) process

P(X_{n−1} < X_n) = E(P(X_{n−1} < X_n | B_n)) = [1 + E(B_n)]/2 .   (7.12)
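To illustrate (7.10) and (7.12) numerically (all parameter and distribution choices below are illustrative): with G the power function distribution G_δ(x) = x^δ of (7.9), E(B_n) = δ/(δ + 1), so the predicted probability of an upward move is [1 + δ/(δ + 1)]/2. The helper `rpareto3` is introduced for the sketch.

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, sigma, delta, n = 2.0, 1.0, 2.0, 300_000   # illustrative choices

def rpareto3(size):                 # Pareto(III)(0, sigma, alpha) sampler
    u = rng.random(size)
    return sigma * (u / (1.0 - u)) ** (1.0 / alpha)

eps = rpareto3(n)
b = rng.random(n) ** (1.0 / delta)  # B_n with cdf G(x) = x**delta on (0, 1)
keep = rng.random(n) < b            # the "with probability b_n" branch

x = np.empty(n)
x[0] = rpareto3(1)[0]               # start in the stationary law
for i in range(1, n):
    # (7.10) with the fixed p replaced by the random B_n
    prop = b[i] ** (-1.0 / alpha) * x[i - 1]
    x[i] = prop if keep[i] else min(prop, eps[i])

frac_up = np.mean(x[1:] > x[:-1])   # (7.12): (1 + E B_n)/2 = (1 + 2/3)/2 = 5/6
median = np.median(x)               # marginal stays Pareto(III): median sigma
```

Because B_n is continuous, no two successive values are tied to an exact common ratio, which is precisely the point of the modification.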

Of course, one can make the same modification (replacing p by a random variable B_n) in the Lawrence-Lewis Pareto (I) process and the Arnold Pareto (III) process. The corresponding fluctuation probabilities for these modified processes are obtainable by considering Eqs. (5.6) and (6.16)-(6.18). We need to treat the symbol p appearing in these expressions as a random variable with distribution G and compute the expected values of the right hand sides of the equations; e.g., for the absolutely continuous version of the Lawrence-Lewis process

P(X_{n−1} < X_n) = E(1/(1 + B_n)) .   (7.13)


7.3. Multivariate processes


All four of the Pareto processes introduced in Sections 5 and 6 admit simple multivariate extensions. The innovation variables become innovation vectors and will be constrained to have suitable stability properties. The k-dimensional Lawrence-Lewis Pareto process begins with a sequence of innovation vectors {ε_n}_{n=0}^∞ which are independent identically distributed with a common k-variate geometric multiplication stable distribution with P(I)(1, α_j), j = 1, 2, ..., k, marginal distributions (refer to (3.18), which gives the general form of a generating function for such variables). Now for n = 1, 2, ... define, for some p ∈ (0, 1),

X_n = σ ε_n^p   with probability p
    = X_{n−1} ε_n^p   with probability 1 − p .   (7.14)

In this defining equation, multiplication of vectors and raising of a vector to a power are understood to be done coordinatewise. Then, provided we take X₀ = σ ε₀, we will have a completely stationary process with a k-variate multiplication stable stationary distribution with P(I)(σ_j, α_j) marginals. The marginal processes {X_n(j)}_{n=0}^∞ for j = 1, 2, ..., k are of course Lawrence-Lewis processes of the kind introduced in Section 5. In analogous fashion the k-dimensional Gaver-Lewis process can be defined in terms of the same k-variate geometric multiplication stable innovation sequence {ε_n}_{n=0}^∞. Define

X_n = σ^p X_{n−1}^{1−p} ε_n   with probability p
    = σ^p X_{n−1}^{1−p}   with probability 1 − p   (7.15)

(recall again that operations in (7.15) are performed coordinatewise) and set X₀ = σ ε₀. This process has marginal processes of the Gaver-Lewis form. To construct k-dimensional analogs of the Yeh-Arnold-Robertson and Arnold processes we need an innovation sequence {ε_n} which has, for the common distribution of the ε_n's, one which is k-variate geometric-minimum stable (with P(III)(0, σ_j, α_j) marginals). Refer to (3.11) for the general form of such distributions. Now for n = 1, 2, ... define, for some p ∈ (0, 1),

X_n = p^{−1/α} X_{n−1}   with probability p
    = min{p^{−1/α} X_{n−1}, ε_n}   with probability 1 − p   (7.16)

(where all operations are performed coordinatewise; thus, for example, with probability p, X_n(j) = p^{−1/α_j} X_{n−1}(j)). Then, provided we set X₀ = ε₀, we will have a k-variate process whose stationary distribution is k-variate geometric minimum stable with Pareto (III)(0, σ_j, α_j) marginals. Of course the marginal processes are univariate Yeh-Arnold-Robertson processes.


The k-variate Arnold process begins with the same innovation sequence {ε_n}_{n=0}^∞ as that used in the Yeh-Arnold-Robertson process, but now we define

X_n = min{X_{n−1}, (1 − p)^{−1/α} ε_n}   with probability p
    = (1 − p)^{−1/α} ε_n   with probability 1 − p .   (7.17)

If X₀ = ε₀ then we have a stationary process whose marginal processes are of the Arnold type. In practice, it would be necessary to make some assumptions about the structure of the distribution chosen for ε_n in (7.14) and (7.15), or of ε_n in (7.16) and (7.17). Some assumption specifying that the distribution of ε_n is known except for a few parameters would seem to be required since, for example, estimation of the structure functions f₁(s), f₂(s), ..., f_k(s) appearing in (3.11) would appear to be infeasible.

8. Related processes

There are numerous possible variations beginning with our basic Pareto processes. In this section, attention will generally be focussed on one-dimensional processes, unless the extension to k dimensions involves no more than notational adjustment via underlining.

8.1. Semi-Pareto processes


In the definitions of the Yeh-Arnold-Robertson and Arnold processes the innovation sequence {ε_n} was taken to have a Pareto (III)(0, σ, α) distribution. Such a choice of innovation distribution led to a stationary process because the Pareto (III) distribution is geometric minimum stable. However, for a fixed choice of p, the process would still be completely stationary if the ε_n s had a semi-Pareto distribution as defined in (2.9). Such semi-Pareto versions of the Yeh-Arnold-Robertson process were introduced by Pillai (1991). Analogous semi-Pareto versions of the Arnold process can be constructed and, moreover, k-dimensional versions of these processes can be readily envisioned. Fix a particular value of p ∈ (0, 1). Then begin with a k-dimensional innovation sequence {ε_n}_{n=0}^∞ having a common semi-Pareto (p) distribution as described in Eq. (3.13). Now for a Yeh-Arnold-Robertson semi-Pareto process define, exactly as in (7.16),

X_n = p^{-1/α} X_{n-1}                   with probability p
    = min{ p^{-1/α} X_{n-1}, ε_n }       with probability 1 - p .        (8.1)

Provided we set X_0 = ε_0, we will have a stationary k-variate semi-Pareto process with marginals given by (3.13).

28

B. C. Arnold

Analogously a k-variate semi-Pareto process of the Arnold type is obtainable using the same innovation sequence as used in (8.1) (i.e., with density (3.13)) and with successive X_n s defined as in (7.17). The material in Section 6 dealing with extremes of the YARP(III) process continues to hold true if the distribution of the ε_i s is semi-Pareto instead of Pareto (III). Thus if {X_n} is as defined in (6.1) but with the ε_n s being i.i.d. with common survival function

P(ε_n > x) = [1 + φ_p(x)]^{-1}

where φ_p(x) = (1/p) φ_p(p^{1/α} x) for the specific choice of p used in the definition (6.1), then (Pillai, 1991) the level crossing processes (6.8) are Markovian with transition matrices now given by

P = [1 + φ_p(t)]^{-1} ( p + φ_p(t)        1 - p
                        (1 - p) φ_p(t)    1 + p φ_p(t) )        (8.2)

This can be used to obtain the asymptotic distribution of the maximum, M_n (as defined in (6.5)). More generally, Chrapek et al. (1996) discuss the asymptotic distribution of M_n^{(k)}, the kth largest observation in X_0, X_1, X_2, ..., X_n.

8.2. A Markovian variant of the geometric minimization scheme


Suppose that {U_n}_{n=-∞}^∞ is a doubly infinite sequence of i.i.d. Bernoulli (1 - p) random variables. Two natural sequences of geometric random variables can be constructed from these Bernoulli variables. For t = 0, 1, 2, ... define

N_t^+ = 1 if and only if U_t = 1

and, for i = 2, 3, ...,

N_t^+ = i if and only if U_t = 0, U_{t+1} = 0, ..., U_{t+i-2} = 0, U_{t+i-1} = 1 .        (8.3)

These N_t^+ s are geometric (p) random variables with possible values 1, 2, .... N_t^+ is basically the waiting time until a success in the U_n sequence beginning at time t and going forward. If instead we chose to go backward we may define

N_t^- = 1 if and only if U_t = 1

and, for i = 2, 3, ...,

N_t^- = i if and only if U_t = 0, ..., U_{t-i+2} = 0, U_{t-i+1} = 1 .        (8.4)

Now, consider a sequence {ε_n}_{n=0}^∞ of i.i.d. random vectors with common distribution of the form (3.11) (i.e., k-variate geometric minimum stable with Pareto (III)(0, σ_j, α_j) marginals). Assume that the {ε_n}_{n=0}^∞ and the {U_n}_{n=-∞}^∞ processes are independent. We can now define a forward innovation process by


X_t = (1 - p)^{-1/α} min_{i=1,2,...,N_t^+} ε_{t+i-1}        (8.5)

and a backward innovation process by

X_t = (1 - p)^{-1/α} min_{i=1,2,...,N_t^-} ε_{t-i+1} .        (8.6)

The two processes (8.5) and (8.6) are time reversals of each other. The backward innovation model (8.6) is essentially identical to the k-variate Arnold process described in (7.17) (they would be identical if the index set of the process (7.17) had been chosen to be n = 0, ±1, ±2, ... instead of 0, 1, 2, ...). One advantage of the representations (8.5) and (8.6) is that they readily admit simple extensions in which the i.i.d. sequence of Bernoulli variables used in their definition is replaced by a Markovian sequence. The idea was first described in Arnold (1993) in the context of one-dimensional logistic processes but it is readily adapted to the current situation. We will focus on the forward innovation process (8.5), though of course a parallel development could be described for the backward innovation process (8.6). Instead of having {U_n} be a sequence of i.i.d. Bernoulli (1 - p) random variables, we assume that {U_n} denotes a stationary first order Markov chain with state space {0, 1} and transition matrix

P = ( p_0        1 - p_0
      1 - p_1    p_1 )        (8.7)

where 0 < p_0, p_1 < 1. The long run distribution of this chain is

P(U_i = 0) = (1 - p_1)/(2 - p_0 - p_1) ,
P(U_i = 1) = (1 - p_0)/(2 - p_0 - p_1) .        (8.8)

Now consider a sequence {ε_n}_{n≥1} of i.i.d. innovations assumed to be independent of the Markov process U_0, U_1, .... For each integer i, let N(i) denote the number of U s, beginning with the ith U, that must be observed until the first instance in which a U is equal to 1 (i.e., until state 1 is visited). Then define

X_n = min_{1 ≤ j ≤ N(n)} ε_{n+j-1}        (8.9)

(parallel to (8.5) in the i.i.d. case). This will define a stationary process but, of course, the common distribution of the ε_n s must be judiciously chosen to ensure that X_n has the desired multivariate Pareto (III) stationary distribution. Note that the distribution of the N(n) s is known:

P(N(n) = 1) = (1 - p_0)/(2 - p_0 - p_1)

and

P(N(n) = k) = (1 - p_0)(1 - p_1) p_0^{k-2}/(2 - p_0 - p_1),   k = 2, 3, ... .        (8.10)
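The waiting-time law (8.10) can be checked by Monte Carlo against a simulated stationary chain with transition matrix (8.7). The helper functions and parameter values below are our own illustrative choices:

```python
import random

def stationary_chain(n, p0, p1, seed=7):
    """Stationary {0,1} Markov chain as in (8.7): it stays in state 0
    with probability p0 and stays in state 1 with probability p1."""
    rng = random.Random(seed)
    pi1 = (1.0 - p0) / (2.0 - p0 - p1)    # long-run P(U = 1)
    u = 1 if rng.random() < pi1 else 0
    chain = [u]
    for _ in range(n - 1):
        stay = p0 if u == 0 else p1
        if rng.random() >= stay:
            u = 1 - u
        chain.append(u)
    return chain

def waiting_times(chain):
    """N(i) = number of U's, starting at position i, up to and including
    the first U equal to 1; positions with no later 1 are dropped."""
    dist, d = [], None
    for u in reversed(chain):
        if u == 1:
            d = 1
        elif d is not None:
            d += 1
        dist.append(d)
    dist.reverse()
    return [x for x in dist if x is not None]

p0, p1 = 0.6, 0.3
ns = waiting_times(stationary_chain(200_000, p0, p1))
phat1 = sum(k == 1 for k in ns) / len(ns)
phat2 = sum(k == 2 for k in ns) / len(ns)
theory1 = (1 - p0) / (2 - p0 - p1)                 # P(N(n) = 1)
theory2 = (1 - p0) * (1 - p1) / (2 - p0 - p1)      # P(N(n) = 2)
```

The k = 1 probability is just the stationary probability of state 1, and the k = 2 probability is (stationary mass of 0) × (probability of leaving 0), which is what the two theory lines encode.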


By conditioning on N(n) we have

F̄_X(x) = Σ_{k=1}^∞ P(X > x | N(n) = k) P(N(n) = k)
       = Σ_{k=1}^∞ [P(ε_0 > x)]^k P(N(n) = k)
       = [(1 - p_0)/(2 - p_0 - p_1)] F̄_ε(x) + [(1 - p_0)(1 - p_1)/(2 - p_0 - p_1)] F̄_ε²(x)/(1 - p_0 F̄_ε(x))        (8.11)

where F̄_ε(x) = P(ε_0 > x).

In order to have F̄_X(x) of the form (3.11), we need to solve the quadratic equation (8.11) for F̄_ε(x) (only one of the solutions to (8.11) will be a legitimate survival function). In the one-dimensional case Arnold (1993) shows that the common distribution of the X_n s will be Pareto (III)(μ, σ, α) if we take the common distribution of the ε_n s to be such that

ε_n = μ + σ (Z_n/(1 - Z_n))^{1/α}        (8.12)

where

Z_n = 1 - [(1 - p_0)/(2 - p_0 - p_1)] U_n - [(1 - p_0)(1 - p_1) U_n²]/[(2 - p_0 - p_1)(1 - p_0 U_n)]        (8.13)

in which the U_n s are i.i.d. Uniform(0, 1) random variables. In that paper, suggestions are given regarding parameter estimation for the process. Extensions to higher order Markov processes are of course possible. Arnold (1993) describes a one-dimensional process involving second order Markovian minimization in some detail.

8.3. Pareto conditionals processes


The arguments usually given to justify the use of processes with Pareto marginal distributions admit relatively minor modifications which lead to consideration of processes with Pareto conditionals instead of Pareto marginals. In the realm of normal processes an assumption of normal marginals and one of normal conditionals are essentially the same. This is of course not true for other distributions. We will briefly describe Pareto conditionals processes in this sub-section. For simplicity we will focus on one-dimensional processes; higher dimensional processes can of course be analogously treated. Our interest is in stochastic processes {X_n}_{n=0}^∞ which have the property that the conditional distribution of X_n given that X_{n-1} = x_{n-1} is of the Pareto (II) form with parameters which are functions of x_{n-1}. In addition, we will require that the processes be stationary. For example, if we require that

X_n | X_{n-1} = x_{n-1} ~ Pareto (II)(μ(x_{n-1}), σ(x_{n-1}), α(x_{n-1}))


for given functions μ(·), σ(·) and α(·), then we have a first order Markov process which, under quite general conditions, will have a well defined stationary distribution. In general, however, it will not be easy to identify this stationary distribution. One case in which the long run distribution can be identified is related to the class of bivariate distributions with Pareto (II) conditionals discussed in Section 3. It may be observed that a joint density of the form

[Figure 5 comprises twelve panels of simulated sample paths plotted against time 0-50, one for each alpha ∈ {1, 2, 4} crossed with (delta, gamma) ∈ {(1, 1), (2, 1), (1, 2), (5, 2)}, as indicated in the panel titles.]

Fig. 5. Simulated sample paths of Pareto conditionals processes.

f_{X,Y}(x, y) ∝ (1 + γx + γy + δxy)^{-(α+1)} I(x > 0, y > 0)        (8.14)

where γ > 0 and δ > 0, has all of its conditionals of X given Y = y and of Y given X = x of the Pareto (II) form (cf. Eq. (3.22)). In addition, it has identical marginal distributions for X and Y, namely

f_X(x) ∝ (γ + δx)^{-1} (1 + γx)^{-α} .        (8.15)

The corresponding conditional distributions are

Y | X = x ~ Pareto (II)(0, (1 + γx)/(γ + δx), α) .        (8.16)

Using these observations we may readily describe a stationary process with Pareto (II) conditionals as follows. Assume that X_0 has density (8.15) and that, for each n,

X_n | X_{n-1} = x_{n-1} ~ Pareto (II)(0, (1 + γx_{n-1})/(γ + δx_{n-1}), α) .        (8.17)

Estimation of the parameters of this stationary process can be accomplished using the method of moments (see Arnold et al. (1995) for discussion of the moments of the densities (8.14) and (8.15)). Some simulated sample paths for realizations of the process (8.17) are displayed in Figure 5.
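The chain (8.17) is easy to simulate because each conditional law is a scaled Pareto (II). The sketch below is our own code: X_0 is started at an arbitrary point with a burn-in discarded, rather than drawn exactly from (8.15). The choice delta = gamma² makes the joint density (8.14) factor, so the draws become i.i.d. Pareto (II) with median (2^{1/alpha} - 1)/gamma, which gives a simple check:

```python
import random

def rpareto2(sigma, alpha, rng):
    """Pareto(II)(0, sigma, alpha): survival S(y) = (1 + y/sigma)**(-alpha)."""
    u = rng.random()
    return sigma * (u ** (-1.0 / alpha) - 1.0)

def conditionals_path(n, alpha, delta, gamma, burn=500, seed=3):
    """Simulate (8.17):
    X_n | X_{n-1}=x ~ Pareto(II)(0, (1 + gamma*x)/(gamma + delta*x), alpha)."""
    rng = random.Random(seed)
    x, path = 1.0, []
    for _ in range(n + burn):
        sigma = (1.0 + gamma * x) / (gamma + delta * x)
        x = rpareto2(sigma, alpha, rng)
        path.append(x)
    return path[burn:]   # burn-in in lieu of sampling X_0 from (8.15)

# With delta = gamma**2 = 1 the conditional scale is constant (= 1/gamma),
# so the chain is i.i.d. and its median should be 2**(1/alpha) - 1.
path = conditionals_path(5_000, alpha=2.0, delta=1.0, gamma=1.0)
med = sorted(path)[len(path) // 2]
```

For general (delta, gamma) the same function produces dependent paths of the kind displayed in Figure 5.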

References

Arnold, B. C. (1975). Multivariate exponential distributions based on hierarchical successive damage. J. Appl. Probab. 12, 142-147.
Arnold, B. C. (1983). Pareto Distributions. International Co-operative Publishing House, Fairland, Maryland.
Arnold, B. C. (1987). Bivariate distributions with Pareto conditionals. Statist. Probab. Lett. 5, 263-266.
Arnold, B. C. (1989). A logistic process constructed using geometric minimization. Statist. Probab. Lett. 7, 253-257.
Arnold, B. C. (1990). A flexible family of multivariate Pareto distributions. J. Statist. Plann. Inf. 24, 249-258.
Arnold, B. C. (1993). Logistic processes involving Markovian minimization. Comm. Statist. Theory Methods 22, 1699-1707.
Arnold, B. C., E. Castillo and J. M. Sarabia (1992). Conditionally Specified Distributions. Lecture Notes in Statistics, Vol. 73. Springer Verlag, Berlin.
Arnold, B. C. and J. T. Hallett (1989). A characterization of the Pareto process among stationary stochastic processes of the form X_n = c min(X_{n-1}, Y_n). Statist. Probab. Lett. 377-380.
Arnold, B. C. and L. Laguna (1976). A stochastic mechanism leading to asymptotically Paretian distributions. Business Econ. Statist. Sect. Proc. Amer. Statist. Assoc. 208-210.
Arnold, B. C. and C. A. Robertson (1989). Autoregressive logistic processes. J. Appl. Probab. 26, 524-531.
Arnold, B. C., C. A. Robertson and H. C. Yeh (1986). Some properties of a Pareto type distribution. Sankhya Ser. A 48, 404-408.
Arnold, B. C., J. M. Sarabia and E. Castillo (1995). Distributions with conditionals in the Pickands-DeHaan generalized Pareto family. Journal of the Indian Association for Productivity, Quality and Reliability 20, 28-35.
Block, H. W. (1975). Physical models leading to multivariate exponential and negative binomial distributions. Modeling and Simulation 6, 445-450.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods, 2nd Edition. Springer Verlag, New York.
Chrapek, M., J. Dudkiewicz and W. Dziubdziela (1996). On the limit distributions of kth order statistics for semi-Pareto processes. Appl. Math. 24, 189-193.
Davis, R. A. and S. I. Resnick (1989). Basic properties and prediction of Max-ARMA processes. Adv. Appl. Prob. 21, 781-803.
Gaver, D. P. and P. A. W. Lewis (1980). First order autoregressive gamma sequences and point processes. Adv. Appl. Prob. 12, 727-745.
Hutchinson, T. P. and C. D. Lai (1990). Continuous Bivariate Distributions Emphasizing Applications. Rumsby Scientific Publishing, Adelaide, Australia.
Lawrance, A. J. and P. A. W. Lewis (1981). A new autoregressive time series model in exponential variables (NEAR(1)). Adv. Appl. Prob. 13, 826-845.
Lawrance, A. J. and P. A. W. Lewis (1985). Modelling and residual analysis of nonlinear autoregressive time series in exponential variables. J. R. Statist. Soc. B 47, 162-202.
Mardia, K. V. (1962). Multivariate Pareto distributions. Ann. Math. Statist. 33, 1008-1015.
Mittnik, S. and S. T. Rachev (1991). Alternative multivariate stable distributions and their applications to financial modeling. In Stable Processes and Related Topics (Eds. S. Cambanis, G. Samorodnitsky and M. S. Taqqu), pp. 107-119. Birkhauser, Boston.
Pareto, V. (1897). Cours d'Economie Politique, Vol. II. F. Rouge, Lausanne.
Pillai, R. N. (1991). Semi-Pareto processes. J. Appl. Prob. 28, 461-465.
Rachev, S. T. and S. Resnick (1991). Max-geometric infinite divisibility and stability. Technical Report No. 108, University of California, Santa Barbara.
Resnick, S. I. (1987). Extreme Values, Regular Variation and Point Processes. Springer, New York.
Strauss, D. J. (1979). Some results on random utility. J. Math. Psychology 20, 35-52.
Yeh, H. C., B. C. Arnold and C. A. Robertson (1988). Pareto processes. J. Appl. Probab. 25, 291-301.
Yeh-Shu, H. C. (1983). Pareto processes. Ph.D. dissertation, University of California, Riverside.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Branching Processes

K. B. Athreya and A. N. Vidyashankar

In this survey we give a concise account of the theory of branching processes. We describe the branching process of a single type in discrete time, followed by the multitype case. Continuous time branching processes of a single type are discussed next, followed by branching processes in random environments in discrete time. Finally we deal with branching random walks.

1. Introduction

The subject of branching processes is now over half a century old. The problem of survival of family names in the British peerage had already been attempted in the last century by Rev. Watson, although the correct solution by Steffensen appeared only in the 1930s. The subject took off in the late 1940s and 50s with the work of Kolmogorov, Yaglom, and Sevastyanov and their students in Russia, and Harris and Bellman in the United States. Harris's authoritative book [38] appeared in 1960 and stimulated much research on the subject. The book by Mode (1971) on multitype branching processes came out next. Then in 1972 the book by Athreya and Ney (1972) was published. Jagers (1975) wrote a book on branching processes with biological applications in mind. On a more abstract level, the book by Asmussen and Hering (1983) came out in the early 80s. Seneta and Heyde (1977) wrote a scholarly book in the 70s on the early history of branching processes. The subject of branching processes has had obvious implications for population dynamics, but with the development of computer science it has found new applications in areas such as algorithms, data structures, combinatorics, and molecular biology, especially in molecular DNA sequencing. This led to a conference titled 'Classical and Modern Branching Processes' at the IMA, Minneapolis, where new developments were surveyed and open problems identified. The proceedings of the conference were published in 1997 (Athreya). Also in the mid 1980s Dynkin, building on the earlier work of Fisher and Feller on population genetics and that of the Japanese school of Watanabe, Ikeda and Nagasawa on branching Markov processes, introduced the notion of superprocesses (with deep connections to the theory of partial differential equations), which arose as scaled limits of branching processes that allowed random movement of particles. This has become a major area of contemporary research in probability theory (see Dawson (1991)). Thus the area of branching processes is alive and well. New applications continue to be discovered and, in turn, inspire new questions for the subject. The goal of the present article is to give a quick and succinct introduction to this exciting area of research. The literature is vast and one has had to make a selection of topics. What is presented here does reflect the authors' interests and preferences. Apart from the books mentioned earlier, we must refer to the work of the Swedish school, led by Jagers, on general branching processes with greater levels of dependencies. For an account of this, see Jagers (1991) and the references therein. We also have not dealt with the problems of statistical inference in branching processes. Apart from the book of Guttorp (1991), the work of Dion and Essebbar (1993) with its extensive bibliography is very helpful. We end this introduction with an outline of the rest of the paper. The next section deals with the so-called simple branching process of single type. This is followed by the multitype case. Continuous time branching processes of single type are discussed next, followed by branching processes in random environments. The final section deals with branching random walks.

2. Branching processes: Single type case


Let {p_j : j = 0, 1, 2, ...} be a probability distribution and {ξ_{n,i} : i = 1, 2, ..., n = 0, 1, 2, ...} be independent random variables with a common distribution P(ξ_{1,1} = j) = p_j, j = 0, 1, 2, .... For any nonnegative integer k, let Z_0 = k,

Z_1 = Σ_{i=1}^k ξ_{0,i}

and recursively define

Z_{n+1} = Σ_{i=1}^{Z_n} ξ_{n,i}   for n = 0, 1, 2, ... .        (2.1)

The sequence {Z_n}_0^∞ is called a branching process with initial population size k and offspring distribution {p_j}. The random variable Z_n is to be thought of as the population size of the nth generation, and the recursive relation (2.1) says that Z_{n+1} is the sum of the offspring of all the individuals of the nth generation. The independence of the offspring sizes ξ_{n,i} among themselves and of the past history of the process renders {Z_n}_0^∞ a Markov chain with state space N^+ = {0, 1, 2, ...} and transition probability matrix

p_{kj} = P( Σ_{i=1}^k ξ_{1,i} = j )        (2.2)

Branching processes

37

where for k = 0 the sum is defined as zero. From now on we rule out the deterministic case when p_j = 1 for some j. It is clear that if Z_0 = 0 then Z_n = 0 for all n ≥ 1, i.e., 0 is an absorbing barrier. The event {Z_n = 0 for some n > 0} is called the extinction event. By considering the cases p_0 = 0 and p_0 > 0 it is easy to conclude that for any initial condition Z_0, the events of extinction or explosion, i.e., the event {Z_n → ∞ as n → ∞}, are the only two possibilities with positive probability. That is, the population size Z_n does not fluctuate for ever. Two natural questions are: what is the probability of extinction, and at what rate does Z_n go to ∞ on the event of explosion? Since the only data for the problem is the offspring distribution {p_j}, we seek answers to the above two questions in terms of {p_j}. Let

q = P(Z_n = 0 for some n ≥ 1 | Z_0 = 1) .        (2.3)

It is easy to see that (2.1) implies the key property that the lines of descent of distinct individuals are independent. Thus, for k ≥ 0,

P(Z_n = 0 for some n ≥ 1 | Z_0 = k) = q^k        (2.4)

and hence

q = Σ_{k=0}^∞ P(Z_n = 0 for some n ≥ 1, Z_1 = k | Z_0 = 1)
  = Σ_{k=0}^∞ P(Z_n = 0 for some n ≥ 1 | Z_1 = k) P(Z_1 = k | Z_0 = 1)
  = Σ_{k=0}^∞ q^k p_k = f(q)

where

f(s) = Σ_{j=0}^∞ p_j s^j ,   0 ≤ s ≤ 1 ,        (2.5)

is the probability generating function (p.g.f.) of {p_j}, also called the offspring p.g.f. Thus q is a solution of

q = f(q)        (2.6)

in [0, 1]. It can be shown that q is also the smallest solution of (2.6) in [0, 1]. Since f(·) is convex on [0, 1] we have

THEOREM 1. q < 1 if and only if m = f'(1-) > 1. □
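The smallest root in (2.6) is easy to compute numerically, since the iterates f(0), f(f(0)), ... increase monotonically to it. The offspring distributions below are our own illustrative choices, not from the text:

```python
def extinction_prob(p, tol=1e-12, max_iter=100_000):
    """Smallest root of q = f(q) in [0,1], with f the offspring p.g.f. of
    {p_j}; the iterates f(0), f(f(0)), ... increase to this root."""
    f = lambda s: sum(pj * s ** j for j, pj in enumerate(p))
    q = 0.0
    for _ in range(max_iter):
        q_new = f(q)
        if abs(q_new - q) < tol:
            break
        q = q_new
    return q

# Supercritical: (p_0, p_1, p_2) = (1/4, 1/4, 1/2) gives m = 1.25 > 1, and
# q = f(q) reduces to 2q^2 - 3q + 1 = 0, whose smallest root is q = 1/2.
q_super = extinction_prob([0.25, 0.25, 0.50])
# Subcritical: m = 0.3 + 2 * 0.2 = 0.7 < 1, so q = 1 by Theorem 1.
q_sub = extinction_prob([0.5, 0.3, 0.2])
```

Both outcomes agree with Theorem 1: q < 1 exactly when m > 1.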

The parameter m = f'(1-) = Σ_j j p_j is the mean of the offspring distribution {p_j}. The cases m < 1, m = 1, m > 1 are referred to respectively as subcritical, critical and supercritical. Thus, in the first two cases the population dies out with probability one, and in the last case there is positive probability that the


population size goes off to infinity as n increases. One qualitative difference between the subcritical and critical cases is that if T = min{n : n ≥ 1, Z_n = 0} is the extinction time, then the mean value of T, i.e., ET, is always finite for m < 1 and can be infinite for m = 1, even though P(T < ∞) = 1 in both cases (see Seneta (1967) for details). Other differences will become clear later (see for instance Theorem 5 and Theorem 7). We now describe the fundamental limit theorems associated with supercritical branching processes. The first limit theorem describes the behavior of the branching process in the supercritical case (Kesten and Stigum (1966) and Athreya (1971)).

THEOREM 2. The sequence

W_n = Z_n/m^n        (2.7)

is a nonnegative martingale and hence converges with probability 1 (w.p.1) to a limit W. Further,

(i) P(W = 0 | Z_0 = 1) is one or q according as

Σ_{j=0}^∞ j (log j) p_j = ∞ or < ∞ .        (2.8)

(ii) If Σ_j j (log j) p_j < ∞ then E(W | Z_0 = 1) = 1 and W has an absolutely continuous distribution with a continuous density on (0, ∞). □

For a classical proof of the above theorem see Athreya and Ney (1972); for a more recent conceptual proof see Lyons et al. (1995) and Kurtz et al. (1997). Since E(Z_{n+1} | Z_n) = Z_n m, iteration of which yields E(Z_n | Z_0 = 1) = m^n, Theorem 2 says that Z_n grows at the rate of m^n if Σ_j j log j p_j < ∞, thus confirming the prediction by Malthus of exponential growth of a supercritical population without competition. It turns out that even if Σ_j j log j p_j = ∞ there always exists a deterministic sequence {c_n} of constants such that c_n^{-1} c_{n+1} → m and c_n^{-1} Z_n converges with probability 1 (w.p.1) to a finite random variable W that is nontrivial, that is, P(W = 0 | Z_0 = 1) = q < 1. The constants c_n are called Seneta-Heyde constants and the result was first established by Seneta (1968) and strengthened by Heyde (1970). Athreya [15] showed that if {Z_n : n ≥ 1} and {Z'_n : n ≥ 1} are two independent copies of branching processes, then Z_n^{-1} Z'_n converges to a random variable W, and if m < ∞ then P(0 < W < ∞) = 1; however, if m = ∞ then P(W = 0) and P(W = ∞) are both positive. This says that in the infinite mean case, a branching process initiated by distinct ancestors could have different growth rates with positive probability. For more complete results consult the works of Grey (1979, 1980), and Schuh and Barbour (1977). It follows from the above theorem that, under a finite mean assumption on the offspring distribution, Z_n^{-1} Z_{n+1} converges to m (< ∞) as n → ∞ w.p.1 on the set of explosion. A central limit theorem for the sequence {Z_n^{-1/2}(Z_{n+1} - m Z_n) : n ≥ 1} is given below (see Athreya (1968) and Heyde (1971)).
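Since W_n = Z_n/m^n in Theorem 2 is a martingale with W_0 = 1, E(W_n | Z_0 = 1) = 1 for every n; a small Monte Carlo check of this is sketched below. The offspring law (p_0, p_1, p_2) = (1/4, 1/4, 1/2), with m = 1.25, is our own choice:

```python
import random

def offspring(rng):
    # children: 0 w.p. 1/4, 1 w.p. 1/4, 2 w.p. 1/2 (mean m = 1.25)
    u = rng.random()
    return 0 if u < 0.25 else (1 if u < 0.5 else 2)

def z_n(n, rng):
    """One realization of Z_n from Z_0 = 1, via the recursion (2.1)."""
    z = 1
    for _ in range(n):
        if z == 0:
            break          # 0 is absorbing
        z = sum(offspring(rng) for _ in range(z))
    return z

rng = random.Random(11)
n, m, paths = 10, 1.25, 4_000
w_bar = sum(z_n(n, rng) / m ** n for _ in range(paths)) / paths
# sample mean of W_n over many paths; E(W_n | Z_0 = 1) = 1 exactly
```

The sample average fluctuates around 1 with standard error of a few percent at this sample size.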

THEOREM 3. Assume p_0 = 0 and E(Z_1^{2+δ}) < ∞ for some δ > 0. Set σ² = E(Z_1²) - m². Then

Z_n^{-1/2} (Z_{n+1} - m Z_n) → N(0, σ²) in distribution as n → ∞,

where N(0, σ²) is a normal random variable with mean 0 and variance σ². □

A law of iterated logarithm associated with the above convergence has been established by Heyde (1971), and some large deviation results are contained in Athreya and Vidyashankar (1993), and Athreya (1994).

THEOREM 4. Assume p_0 = 0, p_1 ≠ 0, E(Z_1^{2r+δ}) < ∞ for some r ≥ 1 and δ > 0, and m^r p_1 > 1. Then

lim_{n→∞} p_1^{-n} P( |Z_{n+1}/Z_n - m| > ε ) = C(ε) = Σ_{k≥1} P( |X̄_k - m| > ε ) q_k

where X̄_k = k^{-1} Σ_{j=1}^k X_j, the X_j s are i.i.d. as Z_1, and q_k = lim_{n→∞} p_1^{-n} P(Z_n = k). Furthermore, the limit C(ε) is a finite positive constant. □

A number of related large deviation results concerning the rates of convergence of the martingale W_n to W and other refinements of Theorem 4 can be found in Athreya (1994). We now move on to describe the critical branching processes. The first result in this direction describes the behavior of the process conditioned on non-extinction.

THEOREM 5. Let m = 1 and σ² = Σ_{j=0}^∞ j² p_j - 1 be finite. Then for any initial Z_0 ≠ 0 and 0 < x < ∞,

lim_{n→∞} P( Z_n/n ≤ (σ²/2) x | Z_n > 0 ) = 1 - e^{-x} .        (2.9)

□

Thus, given that the population is not extinct at time n, its size Z_n behaves like n times an exponential random variable with mean σ²/2. It follows from the above theorem that the sequence {Z_{n+1}/Z_n : n ≥ 1}, conditioned on non-extinction, converges to 1 in probability as n → ∞. The large deviations associated with this convergence have been considered by Athreya and Vidyashankar (1997) and are the content of our next theorem concerning critical branching processes.

THEOREM 6. Let m = 1 and assume that E(Z_1^{2r+δ}) < ∞ for some r > 1 and δ > 0. Then

lim_{n→∞} n P( |Z_{n+1}/Z_n - 1| > ε | Z_n > 0 ) = q(ε)        (2.10)

where 0 < q(ε) < ∞. □
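Theorem 5's conditional exponential limit can be probed by simulation. With offspring 0 or 2, each with probability 1/2 (our own choice, giving m = 1 and σ² = 1), the survival probability at generation n is roughly 2/(nσ²) (Kolmogorov's estimate) and, among surviving paths, Z_n/n should average near σ²/2; the tolerances below are generous because n = 30 is still far from the limit:

```python
import random

def critical_z(n, rng):
    """Critical GW path: each individual has 0 or 2 children, w.p. 1/2 each."""
    z = 1
    for _ in range(n):
        if z == 0:
            break
        z = sum(2 * (rng.random() < 0.5) for _ in range(z))
    return z

rng = random.Random(5)
n, trials = 30, 20_000
survivors = [z for z in (critical_z(n, rng) for _ in range(trials)) if z > 0]
frac_surviving = len(survivors) / trials                      # ~ 2/(n*sigma^2)
mean_scaled = sum(z / n for z in survivors) / len(survivors)  # ~ sigma^2/2
```

At n = 30 the conditional mean of Z_n/n is still somewhat above the limiting value 0.5, which the wide test band accommodates.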

Theorems 4 and 6 show that the large deviation decay rate in the supercritical case is geometric while in the critical case it is only algebraic. This difference in the rates can be attributed to the behavior of the generating functions [see Athreya and Vidyashankar (1997)]. We now move on to the subcritical case. In this case we know from Theorem 1 that Z_n → 0 as n → ∞. The next theorem describes the behavior of Z_n when one conditions on Z_n > 0 and brings out yet another distinction between the subcritical and critical cases.

THEOREM 7. Let m < 1. Then for any initial Z_0 ≠ 0, lim_{n→∞} P(Z_n = j | Z_n > 0) = π_j exists for all j ≥ 1 and Σ_j π_j = 1; furthermore, Σ_{j≥1} j π_j < ∞ if and only if E(Z_1 log Z_1) < ∞.        (2.11) □

The above results describe the probabilistic behavior of the process. The statistical problem of estimating the mean m and other parameters of a supercritical branching process has received considerable attention in the literature; for a classical treatment of the problem refer to the book of Guttorp (1991). More recently, Wei and Winnicki (1990), Winnicki (1991), Sriram et al. (1991), and Datta and Sriram (1995) have investigated the estimation of m from a branching process with immigration. Applications in the context of polymerase chain reaction have been investigated by Sun and Waterman (1997).

3. Branching process: Multitype case

Consider now a population with k types of individuals. Assume a type i individual produces children of all types according to a probability distribution {p_i(j) : j = (j_1, j_2, ..., j_k), j_r = 0, 1, 2, ..., r = 1, 2, ..., k}. Assume as before that all individuals produce offspring independently of each other and of the past history of the process. Let Z_{ni} be the number of type i individuals in the nth generation. Then the vector Z_n = (Z_{n1}, Z_{n2}, ..., Z_{nk}) of population sizes in the nth generation evolves by the recursive relation
Z_{n+1} = Σ_{i=1}^k Σ_{r=1}^{Z_{ni}} ξ^{(i)}_{n,r}        (3.1)

where {ξ^{(i)}_{n,r} : r = 1, 2, ..., n = 0, 1, 2, ..., i = 1, 2, ..., k} are independent random vectors with ξ^{(i)}_{n,r} having distribution p_i(·). This renders {Z_n}_0^∞ a Markov chain with state space (N^+)^k, the k-dimensional nonnegative integer lattice. The state (0, 0, ..., 0) is an absorbing state, and so if the distributions p_i(·) are not degenerate then the population either dies out or explodes as n → ∞. As before, (3.1) yields the independence of lines of descent. The key role of the mean m in the single type case is played by the maximal eigenvalue of the mean matrix M = ((m_{ij})) where
m_{ij} = E(Z_{1j} | Z_0 = e_i) ,   1 ≤ i, j ≤ k ,        (3.2)


e_i being the unit vector in the ith direction, i.e. e_i = (0, 0, ..., 1, ..., 0), with 1 at the ith place and 0 elsewhere. We assume that the process {Z_n}_0^∞ is irreducible in the sense that for each (i, j) there exists n_{ij} such that m_{ij}^{(n_{ij})} > 0, where m_{ij}^{(r)} is the (i, j)th element of the rth power of M. By the Perron-Frobenius theorem it follows that M has a maximal eigenvalue ρ that a) is strictly positive, b) is of algebraic and geometric multiplicity one, c) admits strictly positive left and right eigenvectors u and v normalized such that

u · 1 = 1 ,   u · v = 1 ,   Mu = ρu ,   v'M = ρv'        (3.3)

where ' stands for transpose, · stands for the dot product and 1 is the vector with all components equal to one, and d) is such that all other eigenvalues λ of M satisfy |λ| < ρ. Let
q_i = P(Z_n = 0 for some n ≥ 1 | Z_0 = e_i)        (3.4)

be the extinction probability starting with one ancestor of type i, and for s = (s_1, s_2, ..., s_k) let

f_i(s) = E(s^{Z_1} | Z_0 = e_i)        (3.5)

where 0 ≤ s_j ≤ 1 for j = 1, 2, ..., k and s^{Z_1} = Π_{j=1}^k s_j^{Z_{1j}}, and

f(s) = (f_1(s), f_2(s), ..., f_k(s)) .        (3.6)

Then, as in the one type case, it can be shown that q = (q_1, q_2, ..., q_k) is the solution of the equation

q = f(q)        (3.7)

that is smallest in the sense that if q' = (q'_1, q'_2, ..., q'_k) is another solution to (3.7) with 0 ≤ q'_i ≤ 1 then q_i ≤ q'_i for all 1 ≤ i ≤ k. The next result is an analogue of Theorem 1 for the multitype case.

THEOREM 8. Let {Z_n}_0^∞ be a k type irreducible Galton-Watson branching process with mean matrix M = ((m_{ij})). Then q_i < 1 for all i = 1, 2, ..., k if and only if ρ > 1, where ρ is the Perron-Frobenius eigenvalue of M. □

The cases ρ < 1, ρ = 1, ρ > 1 are referred to respectively as subcritical, critical and supercritical. The next theorem describes the behavior of the process in the supercritical case (Kesten and Stigum (1966), Athreya (1970), Athreya and Ney (1970), Athreya and Ney (1972)).

THEOREM 9. Let u and v be as in (3.3) and ρ > 1. Then

W_n = u · Z_n / ρ^n        (3.8)

is a nonnegative martingale and hence converges w.p.1 to a limit W. Further,

P(W = 0 | Z_0 = e_i) = q_i for all i

if and only if

E(Z_{1j} log Z_{1j} | Z_0 = e_i) < ∞ for all i, j        (3.9)

in which case

E(W | Z_0 = e_i) = v_i for all i

and W has an absolutely continuous distribution with a continuous strictly positive density on (0, ∞). □

As in the single type case, even if (3.9) fails to hold there always exist Seneta constants c_n such that u · Z_n / c_n converges to a nontrivial limit W, c_{n+1}/c_n converges to ρ, and Z_n/(u · Z_n) converges to v on the set of non-extinction. This result was established by Hoppe (1976). Thus the relative proportions of the various types stabilize to a deterministic distribution and the growth rates of all types are identical to the exponential rate ρ^n. In the multitype supercritical case the population vector Z_n is such that u · Z_n grows at a geometric rate and Z_n/(u · Z_n) stabilizes to v on the set of non-extinction. Thus for any vector l, l · Z_n/(u · Z_n) stabilizes to l · v. Hence if l · v ≠ 0 then l · Z_n grows at the same rate as u · Z_n. However, if l · v = 0 then the growth rate could be less. One has the following theorem under second moment hypotheses.

THEOREM 10. Let E(Z_{1j}² | Z_0 = e_i) < ∞ for all i, j. Let l be a right eigenvector of M with eigenvalue λ. Then
(i) If |λ|² > ρ then for any Z_0 ≠ 0, V_n = l · Z_n / λ^n converges to a r.v. V w.p.1 and in mean square.
(ii) If |λ|² < ρ then for any Z_0 ≠ 0, l · Z_n / √(u · Z_n) → N(0, σ²) in distribution for some 0 < σ² < ∞ independent of Z_0.
(iii) If |λ|² = ρ then for any Z_0 ≠ 0, l · Z_n / √(n (u · Z_n)) → N(0, σ²) in distribution for some 0 < σ² < ∞ independent of Z_0. □

There are extensions of this result to arbitrary vectors l (see Athreya (1969) and Athreya and Ney (1972)). Further limit results for functionals of branching processes have been established in Asmussen (1976, 1977, 1978), Asmussen and Keiding (1978), and Asmussen and Kurtz (1980). The large deviation results for multitype branching processes have been considered by Athreya and Vidyashankar (1995, 1997). This leads to interesting questions about the rates of decay of iterates of the multi-dimensional generating function given in (3.6). This is given in the next Theorem.
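Theorem 8 and the smallest-root characterization (3.7) can be checked numerically for k = 2. The offspring laws below are an illustrative choice of ours (not from the text): a type 1 parent has no children w.p. 0.2, one type 1 child w.p. 0.3, and one child of each type w.p. 0.5; a type 2 parent has no children w.p. 0.25, one type 2 child w.p. 0.25, and one of each w.p. 0.5. The mean matrix then has Perron-Frobenius eigenvalue ρ ≈ 1.276 > 1, and iterating q ← f(q) upward from (0, 0) gives q = (3/7, 7/15):

```python
def f(q):
    """Offspring p.g.f. vector (f_1, f_2) of the two-type example above."""
    q1, q2 = q
    return (0.20 + 0.30 * q1 + 0.50 * q1 * q2,
            0.25 + 0.25 * q2 + 0.50 * q1 * q2)

# Mean matrix m_ij = E(# type-j children of a type-i parent) = (df_i/ds_j)(1).
M = [[0.80, 0.50],
     [0.50, 0.75]]

# Perron-Frobenius eigenvalue rho by power iteration (M has positive entries).
v = [1.0, 1.0]
rho = 0.0
for _ in range(500):
    w = [M[0][0] * v[0] + M[0][1] * v[1],
         M[1][0] * v[0] + M[1][1] * v[1]]
    rho = max(w)
    v = [x / rho for x in w]

# Extinction probabilities: iterate q <- f(q) from (0, 0), as in (3.7).
q = (0.0, 0.0)
for _ in range(5_000):
    q = f(q)
```

Both components of q come out strictly below 1, in agreement with Theorem 8 since ρ > 1.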
We first define C_p = [0, 1] × ··· × [0, 1], the p-dimensional unit cube, and R^p_+, the p-dimensional Euclidean space with non-negative components.

THEOREM 11. Let the mean matrix M be positively regular, ρ > 1, and f(0) = 0. Let A ≡ ((a_ij)) be the derivative of f evaluated at 0, i.e.,

a_ij = (∂f_i/∂s_j)(0) .

Branching processes

43

Assume that there exists 0 < γ < 1 such that γ^{-n} A^n converges to a matrix P_0 which is non-zero and has finite entries. Then there exists a map Q : C_p → R^p_+ such that

lim_{n→∞} γ^{-n} f_n(s) = Q(s)  (3.10)

where Q is the unique solution to the vector functional equation

Q(f(s)) = γ Q(s)  (3.11)

subject to

Q(0) = 0, Q'(0) = P_0, 0 < Q(s) < ∞, s ∈ C_p⁰ ,  (3.12)

where C_p⁰ is the interior of C_p.
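Theorem 11 can be checked numerically. The sketch below (Python) iterates a hypothetical two-type generating function with f(0) = 0; for this example A = f'(0) = (1/2)I, so γ = 1/2 and P_0 = I, and the normalized iterates γ^{-n} f_n(s) settle down to a finite positive vector, the limit in (3.10). The offspring law is an assumption made purely for illustration.

```python
import numpy as np

def f(s):
    # Assumed offspring p.g.f. with f(0) = 0: a type-1 parent always has one
    # type-1 child, plus an extra type-2 child with probability 1/2
    # (and symmetrically for type-2 parents).
    s1, s2 = s
    return np.array([s1 * (1.0 + s2) / 2.0, s2 * (1.0 + s1) / 2.0])

gamma = 0.5          # here A = f'(0) = (1/2) I, so gamma = 1/2 and P_0 = I

s = np.array([0.5, 0.5])
fn = s.copy()
ratios = []
for n in range(1, 41):
    fn = f(fn)                        # f_n(s) = f(f_{n-1}(s))
    ratios.append(fn / gamma**n)      # gamma^{-n} f_n(s), should converge

Q_est = ratios[-1]
print("gamma^{-n} f_n(s) stabilizes at:", Q_est)
```

The successive ratios change by ever smaller amounts, consistent with a geometric decay rate γ for the iterates f_n(s).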

Our next theorem describes a large deviation result for the convergence of (1 · Z_n)^{-1}(l · Z_n) to (1 · v)^{-1}(l · v).

THEOREM 12. Let ρ > 1 and E(Z_{1j}^{r_0 + δ} | Z_0 = e_i) < ∞ for all i and j and some r_0 > 1 and δ > 0 such that ρ^{r_0} γ > 1. Then, under the conditions of Theorem 11, for any l the limit in (3.13) exists and is positive and finite. []

As in the single type case, further refinements and other large deviation results concerning multitype branching processes have been developed and can be found in Athreya and Vidyashankar (1995, 1997) and Vidyashankar (1994).

We now describe the critical case. This result was obtained by Joffe and Spitzer (1967).

THEOREM 13. Let ρ = 1 and u and v be as in (3.3). Let

σ²_{ij} ≡ E(Z²_{1j} | Z_0 = e_i) − m²_{ij} < ∞ for all i, j .

Then, for any initial Z_0 ≠ 0, any w such that w · v > 0, and 0 < x < ∞,

lim_{n→∞} P(n^{-1}(w · Z_n) ≤ x | Z_n ≠ 0) = 1 − e^{−x/γ}  (3.14)

where γ depends on the variance-covariance matrices and on w, u and v. []

Once again, conditioned on non-extinction, the population proportions of the various types stabilize to v in probability and each of them grows at the algebraic rate n. The next result describes the behavior of the process in the subcritical case and again is due to Joffe and Spitzer (1967).

THEOREM 14. For any initial Z_0 ≠ 0,

lim_{n→∞} P(Z_n = j | Z_n ≠ 0) = π_j  (3.15)

44

K. B. Athreya and A. N. Vidyashankar

exists for all Z_0 ≠ 0, where j = (j_1, j_2, ..., j_p), |j| ≥ 1, Σ_j π_j = 1, and {π_j} is independent of Z_0. []
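The stabilization of type proportions is easy to see in a quick simulation. The sketch below (Python) uses an assumed two-type supercritical example with Poisson offspring and a made-up mean matrix M, with the convention m_ij = expected number of type-j children of a type-i parent (all numbers are illustrative, not from the text): the empirical proportions approach the normalized left eigenvector of M associated with ρ, and the total population grows at rate ρ per generation.

```python
import numpy as np

rng = np.random.default_rng(12345)

# Illustrative (assumed) mean matrix: m_ij = E(# type-j children | type-i parent).
M = np.array([[1.2, 0.3],
              [0.4, 1.1]])

# Perron-Frobenius eigenvalue rho, and the proportion limit
# (normalized left eigenvector of M, i.e. right eigenvector of M^T).
eigvals, eigvecs = np.linalg.eig(M.T)
k = np.argmax(eigvals.real)
rho = eigvals[k].real
v = np.abs(eigvecs[:, k].real)
v = v / v.sum()

# Simulate: with Poisson offspring, the total number of type-j children of
# Z_i type-i parents is exactly Poisson(Z_i * m_ij), so one draw per type suffices.
Z = np.array([50, 50])
for n in range(20):
    Z = rng.poisson(Z @ M)

proportions = Z / Z.sum()
growth = (Z.sum() / 100.0) ** (1 / 20)   # crude per-generation growth estimate

print("rho =", rho, " v =", v)
print("empirical proportions =", proportions, " growth =", growth)
```

Starting from a sizeable initial population keeps the extinction event negligible, so a single path already exhibits the deterministic proportions and the geometric growth rate.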

4. Continuous time (age dependent) branching processes


The models discussed above treat discrete time branching processes in which an individual lives one unit of time and is replaced by a random number of offspring. There are models in which individual life times are not constant but randomly varying. The key property of independence of lines of descent holds if the life times and offspring sizes of different individuals are independent. We now describe such a model for the single type case and some results for that model as well.

Let {p_j}_0^∞ be a probability distribution on N^+ = {0, 1, 2, ...} and G(·) be a probability distribution function on R such that G(0) = 0. Assume that each individual lives a random length of time L with distribution function G(·) and on death creates a random number ξ of children, with ξ and L independent. Assume also that the pairs (ξ, L) of different individuals are independent and identically distributed. At time t ≥ 0, let Z(t) be the total number of individuals alive and Y_t ≡ {x_1, x_2, ..., x_{Z(t)}} be the set of the ages of those alive. The process {Z(t) : t ≥ 0} is called an age dependent branching process (also called a Bellman-Harris process). It is integer valued and not Markov unless G(·) is exponential. The process {Y_t : t ≥ 0}, however, is a Markov process with the set of finite subsets of [0, ∞) as state space. A key tool in the study of the process {Z(t) : t ≥ 0} is the integral equation satisfied by the probability generating function of Z(t),

F(s, t) ≡ E(s^{Z(t)} | Y_0 = {0}) ,  (4.1)

starting with a just born ancestor. Let f(s) = Σ_{j=0}^∞ p_j s^j be the probability generating function of ξ. Then it follows by averaging over the distribution of (ξ, L) of the ancestor that

F(s, t) = s(1 − G(t)) + ∫_{(0,t]} f(F(s, t − y)) dG(y) .  (4.2)

(For the discrete time case the above reduces to f_n(s), the nth iterate of f, for the choice t = n.) Since m(t) ≡ E Z(t) = F'(1, t), where ' denotes differentiation with respect to s, we get from (4.2)

m(t) = (1 − G(t)) + m ∫_{(0,t]} m(t − u) dG(u)  (4.3)

where m = f'(1) = Σ_j j p_j is the offspring mean. As in the Galton-Watson case the key parameter is m. By looking at the sequence {ζ_n} of generation sizes one arrives at


THEOREM 15. The extinction probability

q ≡ P(Z(t) = 0 for some t ≥ 0 | Y_0 = {0})

is = 1 if m ≤ 1 and < 1 if m > 1. []
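The integral equation (4.3) lends itself to direct numerical solution, which can be checked against simulation. The sketch below (Python) assumes exponential(1) lifetimes and Poisson offspring with mean 1.5, purely for illustration; for this G the exact solution is m(t) = e^{(m−1)t}, which both the discretized renewal equation and the Monte Carlo average reproduce.

```python
import numpy as np

rng = np.random.default_rng(7)

m_off, T = 1.5, 3.0                      # offspring mean, time horizon (assumed)
G = lambda t: 1.0 - np.exp(-t)           # lifetime d.f., exponential(1)

# --- numerical solution of m(t) = (1 - G(t)) + m * int_0^t m(t-u) dG(u) ---
h = 0.001
t = np.arange(0.0, T + h, h)
g_vals = np.exp(-t)                      # lifetime density on the grid
m_num = np.empty_like(t)
m_num[0] = 1.0
for k in range(1, len(t)):
    # right-endpoint rectangle rule; avoids the implicit m_num[k] term
    conv = h * np.dot(m_num[k - 1::-1], g_vals[1:k + 1])
    m_num[k] = (1.0 - G(t[k])) + m_off * conv

# --- Monte Carlo estimate of E Z(T) ---
def simulate_Z(T):
    """One Bellman-Harris tree started from a single newborn; returns Z(T)."""
    alive, stack = 0, [0.0]              # birth times yet to be processed
    while stack:
        birth = stack.pop()
        death = birth + rng.exponential(1.0)
        if death > T:
            alive += 1                   # still alive at time T
        else:                            # dies before T, leaves Poisson children
            stack.extend([death] * rng.poisson(m_off))
    return alive

mc = np.mean([simulate_Z(T) for _ in range(2000)])
print(m_num[-1], mc, np.exp((m_off - 1.0) * T))
```

The same discretization works for any lifetime law with a density; only g_vals and G change.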

The limit theorems for Z(t) as t → ∞ in the three cases m > 1, m = 1 and m < 1, analogous to Theorems 2, 3 and 4 of the discrete case, will be presented below. In the case m > 1 the effect of random life times is expressed through the Malthusian parameter α defined by

m ∫_{(0,∞)} e^{−αt} dG(t) = 1 ;  (4.4)

since 1 < m < ∞, such an α exists and is unique in (0, ∞). The reproductive age value V(x) is

V(x) = m e^{αx} (1 − G(x))^{−1} ∫_{(x,∞)} e^{−αt} dG(t)

for 0 ≤ x < T ≡ inf{x : G(x) = 1}. It can be shown that

W(t) ≡ e^{−αt} Σ_{i=1}^{Z(t)} V(x_i) ,  (4.5)

where Y_t = {x_1, x_2, ..., x_{Z(t)}} is the age distribution at time t, is a nonnegative martingale. Also the empirical age distribution

A(x, t) = #{i : x_i ≤ x} / Z(t)  (4.6)

converges w.p.1, as t → ∞, to the stable age distribution [Athreya and Kaplan (1976)]

A(x) = ∫_0^x e^{−αu}(1 − G(u)) du / ∫_0^∞ e^{−αu}(1 − G(u)) du .  (4.7)
THEOREM 16. Let m > 1 and W(·) be as in (4.5). Then {W(t) : t ≥ 0} is a nonnegative martingale and converges w.p.1 to a limit W, and

P(W = 0 | Z_0 = 1)

is one or q according as Σ_j j(log j) p_j = ∞ or < ∞. Further, Σ_j j(log j) p_j < ∞ implies that

E(W | Y_0 = {x}) = V(x)


and

Z(t) e^{−αt} converges w.p.1 to (∫_0^∞ V(x) dA(x))^{−1} W . []

The existence of Seneta constants for the age-dependent branching process has been obtained by Schuh (1982). Some large deviation results for the convergence of Z_t^{−1} Z_{t+r} for r > 0 have been established in Vidyashankar (1994). Turning now to the critical case, the following analogue of the discrete time result was established by Goldstein (1971).

THEOREM 17. Let m = 1 and σ² = Σ_j (j − 1)² p_j < ∞. Assume that lim_{t→∞} t²(1 − G(t)) = 0 and ∫_0^∞ t dG(t) = μ. Then, for any initial Z_0 ≠ 0,

lim_{t→∞} P(Z(t)/t ≤ x | Z(t) > 0) = 1 − e^{−2μx/σ²} , 0 < x < ∞ .  (4.8) []

In the critical case, the empirical age distribution A(x, t) conditioned on non-extinction converges to A(x), i.e., for every ε > 0,

lim_{t→∞} P(sup_x |A(x, t) − A(x)| > ε | Z(t) > 0) = 0

where A(·, t) is as in (4.6) and A(·) is as in (4.7) with α = 0. For the subcritical case (m < 1), Ryan (1976) established the following result.

THEOREM 18. Let m < 1 and G(·) be non-lattice. Then for any initial Z_0 ≠ 0,

lim_{t→∞} P(Z(t) = j | Z(t) > 0) = π_j

exists for all j ≥ 1, and Σ_j π_j = 1. []

5. Branching processes in random environments

The models discussed so far assumed that the offspring probability distribution does not change with time. Now we consider a model in which the offspring probability distribution is randomly chosen in each generation. Let M be the collection of probability distributions on the nonnegative integers. Let ζ ≡ {ζ_i}_0^∞ be a sequence of M-valued random variables. Let {Z_n}_0^∞ be a sequence of random variables generated by the recursive relation

Z_{n+1} = Σ_{j=1}^{Z_n} ξ_{n,j}  (5.1)

where, conditioned on Z_0, Z_1, ..., Z_n and {ζ_i}_0^∞, the random variables {ξ_{n,j} : j = 1, 2, ...} are i.i.d. according to ζ_n. The sequence {Z_n}_0^∞ is called a


branching process with a random environmental sequence ζ = {ζ_i}_0^∞, or simply a B.P.R.E. Let

q(ζ) = P(Z_n = 0 for some n ≥ 1 | Z_0 = 1, ζ)

be the extinction probability of the process {Z_n} conditioned on the environmental sequence ζ. It follows by conditioning on Z_1 that

q(ζ) = φ_{ζ_0}(q(Tζ))  (5.2)

where φ_{ζ_0}(·) is the probability generating function of the probability distribution ζ_0 and Tζ is the shifted environmental sequence {ζ_{i+1}}_0^∞. If the sequence ζ is stationary and ergodic then it can be shown that P(q(ζ) = 1) is zero or one [see Athreya and Karlin (1971a, b) and Tanny (1977)]. The criterion for almost sure extinction can be expressed in terms of the behavior of the conditional mean of Z_n given Z_0 = 1 and ζ. From (5.1) it follows that

E(Z_{n+1} | Z_n, ζ) = Z_n φ'_{ζ_n}(1) ,  (5.3)

where φ'_ζ(·) is the derivative of φ_ζ(·). This renders

W_n ≡ (Π_{i=0}^{n−1} φ'_{ζ_i}(1))^{−1} Z_n ≡ P_n^{−1} Z_n (say)  (5.4)

a nonnegative martingale, and hence it converges w.p.1. The product P_n = Π_{i=0}^{n−1} φ'_{ζ_i}(1) has an asymptotic behavior determined by E(ln φ'_{ζ_0}(1)), since by the ergodic theorem

n^{−1} Σ_{i=0}^{n−1} ln φ'_{ζ_i}(1)

converges w.p.1 to E(ln φ'_{ζ_0}(1)). It is not very surprising that the following holds.

THEOREM 19.

P(q(ζ) = 1) = 1 if E(ln φ'_{ζ_0}(1))^+ < ∞ and E(ln φ'_{ζ_0}(1)) ≤ 0. []

The B.P.R.E. {Z_n}_0^∞ is called supercritical, critical or subcritical according as E ln φ'_{ζ_0}(1) > 0, = 0 or < 0. Limit theorems for supercritical branching processes in random environments have been investigated by Athreya and Karlin (1971a, b) and Tanny (1977). Supercritical multitype B.P.R.E. has been investigated by Cohn (1989).
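The extinction criterion of Theorem 19 can be probed by simulation. In the sketch below (Python; the two-environment setup with Poisson offspring laws is an assumed toy example), environments are i.i.d. and equally likely, so E ln φ'_{ζ_0}(1) = (ln m_1 + ln m_2)/2. With means (0.6, 2.0) this is positive and a positive fraction of paths survive; with means (0.6, 1.2) it is negative and extinction is, numerically, certain.

```python
import numpy as np

rng = np.random.default_rng(3)

def survival_fraction(means, n_gen=60, n_paths=2000):
    """Fraction of B.P.R.E. paths (Z_0 = 1) alive after n_gen generations.

    Each generation's environment is drawn i.i.d. uniformly from `means`,
    independently for each path; offspring are Poisson, so
    Z_{n+1} | Z_n, env  ~  Poisson(env * Z_n)."""
    Z = np.ones(n_paths, dtype=np.int64)
    for _ in range(n_gen):
        env = rng.choice(means, size=n_paths)
        Z = rng.poisson(env * Z)
    return np.mean(Z > 0)

frac_super = survival_fraction([0.6, 2.0])   # E ln phi'(1) = 0.5 ln 1.2 > 0
frac_sub   = survival_fraction([0.6, 1.2])   # E ln phi'(1) = 0.5 ln 0.72 < 0
print(frac_super, frac_sub)
```

Note that in the subcritical case E Z_n = 0.9^n, so almost every path has died out well before generation 60.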


6. Branching random walks

The models described so far do not include any spatial component in the process. We will now describe a process that does. The process starts with a single initial ancestor at the origin. After one unit of time, the ancestor produces offspring whose positions form a point process Z^{(1)} on R. Each of these children in turn lives one unit of time and produces offspring whose positions relative to their parent are given by an independent copy of Z^{(1)}; i.e., the positions of the children of a parent living at x are given by x + Y where Y is an independent copy of Z^{(1)}. The positions of the second generation individuals form a point process Z^{(2)}. Subsequent generations are formed similarly, yielding Z^{(n)} as the nth generation point process. The total number of offspring in the nth generation will be denoted by |Z^{(n)}|. The process {Z^{(n)} : n ≥ 0} is called a branching random walk (BRW). The population size process {|Z^{(n)}| : n ≥ 0} forms a single-type branching process. Let {z_{r,n} : 1 ≤ r ≤ |Z^{(n)}|} denote an enumeration of the positions of the nth generation population. The description of the process given above implies that

Z^{(n+1)}(A) = Σ_{r=1}^{|Z^{(n)}|} Z_r^{(n)}(A − z_{r,n})  (6.1)

where the Z_r^{(n)} are independent copies of Z^{(1)}, with Z_r^{(n)} giving the relative positions of the offspring of z_{r,n}. Let m = E(|Z^{(1)}|). We will assume that m > 1, so that |Z^{(n)}| → ∞ with positive probability. Let q be the probability of extinction of the branching process {|Z^{(n)}| : n ≥ 0}. A key role in the analysis of the branching random walk is played by the Laplace transform of the intensity measure μ, i.e., μ(A) = E(Z^{(1)}(A) | Z^{(0)} = δ_0), of the point process Z^{(1)}. Using the independence of lines of descent one can show that

E(Z^{(n)}(A) | Z^{(0)} = δ_0) = μ^{n*}(A)  (6.2)

where μ^{n*} is the n-fold convolution of μ with itself. Let

m(θ) = ∫_R e^{θx} μ(dx) .

Note that m(θ) = E(∫ e^{θx} Z^{(1)}(dx) | Z^{(0)} = δ_0). It follows that


if we let

W_n(θ) = (m(θ))^{−n} ∫ e^{θx} Z^{(n)}(dx)  (6.3)

then for each θ, {W_n(θ) : n ≥ 0} is a martingale sequence, which for θ = 0 reduces to the martingale W_n of Section 1. Next, let

J(x, θ) = θx − log m(θ)

and

Λ(a) = {θ : J(x(θ), θ) < a}, where x(θ) = m'(θ)/m(θ) .

THEOREM 20. Assume that m > 1 and m(θ) < ∞ for all θ. The sequence {W_n(θ) : n ≥ 1} is a nonnegative martingale and hence converges w.p.1 to a random variable W(θ). Furthermore, for θ ∈ Λ(0),

P(W(θ) = 0 | Z^{(0)} = δ_0) = 1 or q

according as

E(W_1(θ) log W_1(θ) | Z^{(0)} = δ_0) = ∞ or < ∞ . []

The details of the proof can be found in Biggins (1977), where the assumption m(θ) < ∞ for all θ is considerably weakened. A conceptual proof of the above result along the lines of the one in Lyons et al. (1995) has been established by Lyons (1997). Biggins (1992) also considers uniform convergence of W_n(θ) to W(θ) for θ in some compact set. Biggins and Kyprianou (1997) consider the Seneta constants for BRW. A particular case of BRW, namely the mixed sample case (a point process with i.i.d. components), has been studied by Athreya (1985), Asmussen and Kaplan (1976a, b) and Ney (1966). In the mixed sample case, if one were to denote by

a(θ) = E(e^{θ z_{1,1}})

then one can see that W_n(θ) reduces to

W_n(θ) = (m^n (a(θ))^n)^{−1} Σ_{r=1}^{|Z^{(n)}|} e^{θ z_{r,n}} .

Recently, Joffe (1993) noticed that if one replaces m^n by |Z^{(n)}| in the above, viz., if

V_n(θ) = (|Z^{(n)}| (a(θ))^n)^{−1} Σ_{r=1}^{|Z^{(n)}|} e^{θ z_{r,n}} ,  (6.4)


then V_n(θ) continues to be a martingale with respect to the same filtration F_n = σ(Z^{(0)}, ..., Z^{(n)}). He further investigated the non-triviality of the limit V(θ) of the non-negative martingale sequence {V_n(θ) : n ≥ 1} under a second moment assumption on the offspring distribution function. Several other functionals of BRW have been investigated. Two important functionals that have received much attention are

L_n = inf{z_{r,n} : 1 ≤ r ≤ |Z^{(n)}|} and U_n = sup{z_{r,n} : 1 ≤ r ≤ |Z^{(n)}|} .

The behavior of L_n and U_n has been investigated in Biggins (1977). Another functional of interest is

Z^{(n)}(nA) = #{z_{r,n} : z_{r,n} ∈ nA}

as n → ∞. Biggins (1979) has shown that Z^{(n)}(nA), scaled by its expectation, converges to a non-trivial limit as n → ∞. Extensions of the above result to multitype BRW have been carried out by Bramson et al. (1992), and to BRW in random environments by Vidyashankar (1994). Large deviations for branching Markov chains with discrete and general state spaces have been investigated in Athreya and Kang (1998a, b).
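The additive martingale W_n(θ) of (6.3) is easy to simulate. The sketch below (Python) uses an assumed binary branching random walk: every individual has exactly two children, each displaced by an independent N(0, 1) step, so m(θ) = 2e^{θ²/2}. For θ = 0.5 the x log x condition of Theorem 20 can be checked to hold, and averaging W_n(θ) over independent trees recovers its mean, E W_n(θ) = 1.

```python
import numpy as np

rng = np.random.default_rng(11)

theta, n_gen, n_trees = 0.5, 10, 200
m_theta = 2.0 * np.exp(theta**2 / 2.0)   # m(theta) = 2 * E exp(theta * N(0,1))

def W_n(theta):
    """One realization of W_n(theta) for the assumed binary Gaussian BRW."""
    pos = np.zeros(1)                    # ancestor at the origin
    for _ in range(n_gen):
        # each individual gets two children, each with an independent N(0,1) step
        pos = np.repeat(pos, 2) + rng.normal(size=2 * pos.size)
    return np.sum(np.exp(theta * pos)) / m_theta**n_gen

samples = np.array([W_n(theta) for _ in range(n_trees)])
print("mean of W_n(theta) over trees:", samples.mean())
```

With deterministic binary offspring there is no extinction, so every tree contributes 2^n positions at generation n; for θ = 0 the statistic is identically 1.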

Acknowledgements
The authors would like to thank the anonymous referee for a thorough reading of the manuscript and for several useful suggestions that led to the improvement of the paper.

References
Asmussen, S. (1976). Convergence rates for branching processes. Ann. Probab. 4, 139-146.
Asmussen, S. (1977). Almost sure behavior of linear functionals of supercritical branching processes. Trans. Amer. Math. Soc. 1, 233-248.
Asmussen, S. (1978). Some martingale methods in the limit theory of supercritical branching processes. In Branching Processes, Advances in Probability and Related Topics, 5, 1-26, Marcel Dekker, New York.
Asmussen, S. and H. Hering (1983). Branching Processes. Birkhäuser, Boston.
Asmussen, S. and N. Kaplan (1976a). Branching random walks I. Stoch. Proc. Appl. 4(1), 1-13.
Asmussen, S. and N. Kaplan (1976b). Branching random walks II. Stoch. Proc. Appl. 4(1), 15-31.
Asmussen, S. and N. Keiding (1978). Martingale central limit theorems and asymptotic estimation theory for multitype branching processes. Adv. Appl. Probab. 10(1), 109-129.
Asmussen, S. and T. Kurtz (1980). Necessary and sufficient conditions for complete convergence in the law of large numbers. Ann. Probab. 8(1), 176-182.


Athreya, K. B. (1968). Some results on multitype continuous time Markov branching processes. Ann. Math. Statist. 20, 649-651.
Athreya, K. B. (1969). Limit theorems for multitype continuous time Markov branching processes II: the case of an arbitrary linear functional. Z. Wahrsch. Verw. Gebiete 13, 204-214.
Athreya, K. B. (1970). A simple proof of a result of Kesten and Stigum on supercritical multitype Galton-Watson branching process. Ann. Math. Statist. 41, 195-202.
Athreya, K. B. (1971). On the absolute continuity of the limit random variable in the supercritical Galton-Watson branching processes. Proc. Amer. Math. Soc. 30, 563-565.
Athreya, K. B. (1985). Discounted branching random walks. Adv. Appl. Probab. 17(1), 53-66.
Athreya, K. B. (1994). Large deviation rates for branching processes I: the single type case. Ann. Appl. Probab. 4(3), 779-790.
Athreya, K. B. (1999). Growth rates of lines of descent in branching processes may differ. Preprint M9912, Department of Mathematics, Iowa State University.
Athreya, K. B. and P. Jagers (1997). Classical and Modern Branching Processes. Papers from the IMA workshop held at the University of Minnesota, Minneapolis, MN, June 13-17, 1994. IMA Volumes in Mathematics and its Applications, 84, Springer-Verlag, New York.
Athreya, K. B. and H. Kang (1998a). Some limit theorems for positive recurrent branching Markov chains, I. Adv. Appl. Probab. 30, 693-710.
Athreya, K. B. and H. Kang (1998b). Some limit theorems for positive recurrent branching Markov chains, II. Adv. Appl. Probab. 30, 711-722.
Athreya, K. B. and N. Kaplan (1976). Convergence of the age distribution in the one-dimensional supercritical age dependent branching process. Ann. Probab. 4(1), 38-50.
Athreya, K. B. and S. Karlin (1971a). On branching processes with random environments, I: Extinction probabilities. Ann. Math. Statist. 42, 1499-1520.
Athreya, K. B. and S. Karlin (1971b). On branching processes with random environments, II: Limit theorems. Ann. Math. Statist. 42, 1843-1858.
Athreya, K. B. and P. Ney (1970). The local limit theorem and some related aspects of supercritical branching processes. Trans. Amer. Math. Soc. 152, 233-251.
Athreya, K. B. and P. Ney (1972). Branching Processes. Springer-Verlag, New York.
Athreya, K. B. and A. N. Vidyashankar (1993). Large deviation results for branching processes. In Stochastic Processes: A Festschrift in Honour of G. Kallianpur, 7-12, Springer, New York.
Athreya, K. B. and A. N. Vidyashankar (1995). Large deviation rates for branching processes II: the multitype case. Ann. Appl. Probab. 5(2), 566-576.
Athreya, K. B. and A. N. Vidyashankar (1997). Large deviation rates for critical and supercritical branching processes. In Classical and Modern Branching Processes, 1-18, IMA Volumes in Mathematics and its Applications, 84, Springer, New York.
Biggins, J. D. (1977). Chernoff's theorem in the branching random walk. J. Appl. Probab. 14(3), 630-636.
Biggins, J. D. (1977). Martingale convergence in the branching random walk. J. Appl. Probab. 14(1), 25-37.
Biggins, J. D. (1979). Growth rates in the branching random walk. Z. Wahrsch. Verw. Gebiete 48(1), 17-34.

Biggins, J. D. (1992). Uniform convergence of martingales in the branching random walk. Ann. Probab. 20(1), 137-151.
Biggins, J. D. and A. E. Kyprianou (1997). A Seneta-Heyde norming in the branching random walk. Ann. Probab. 25(1), 337-360.
Bramson, M., P. Ney and J. Tao (1992). The population composition of a multitype branching random walk. Ann. Appl. Probab. 3, 575-596.
Cohn, H. (1989). On the growth of multitype supercritical branching processes in a random environment. Ann. Probab. 17(3), 1118-1123.
Dawson, D. A. (1991). Measure-valued Markov processes. École d'Été de Probabilités de Saint-Flour XXI, Lecture Notes in Mathematics, 1541, Springer-Verlag, Berlin.
Dion, J.-P. and B. Essebbar (1993). On the statistics of controlled branching processes. In Branching Processes, 14-21, Lecture Notes in Statistics, 99, Springer, New York.


Datta, S. and T. N. Sriram (1995). A modified bootstrap for branching processes with immigration. Stoch. Proc. Appl. 56(2), 275-294.
Goldstein, M. I. (1971). Critical age-dependent branching processes: single and multitype. Z. Wahrsch. Verw. Gebiete 17, 74-88.
Grey, D. R. (1979). On regular branching processes with infinite mean. Stoch. Proc. Appl. 8, 257-267.
Grey, D. R. (1980). A new look at convergence of branching processes. Ann. Probab. 8, 377-380.
Guttorp, P. (1991). Statistical Inference for Branching Processes. John Wiley and Sons, New York.
Harris, T. E. (1963). The Theory of Branching Processes. Springer, Berlin.
Heyde, C. C. (1970). Extension of a result of Seneta for supercritical Galton-Watson processes. Ann. Math. Statist. 41, 739-742.
Heyde, C. C. (1971). Some central-limit analogues for supercritical branching processes. J. Appl. Probab. 8, 52-59.
Heyde, C. C. (1971). Some almost sure convergence theorems for supercritical branching processes. Z. Wahrsch. Verw. Gebiete 20, 189-192.
Hoppe, F. M. (1976). Supercritical multitype branching processes. Ann. Probab. 4(3), 393-401.
Jagers, P. (1975). Branching Processes with Biological Applications. Wiley-Interscience, New York.
Jagers, P. (1991). The growth and stabilization of populations. Statist. Sci. 6(3), 269-283.
Joffe, A. (1993). A new martingale in the branching random walk. Ann. Appl. Probab. 3(4), 1145-1150.
Joffe, A. and F. Spitzer (1967). On multitype branching processes with ρ ≤ 1. J. Math. Anal. Appl. 19, 409-430.
Kesten, H. and B. P. Stigum (1966). A limit theorem for multidimensional Galton-Watson processes. Ann. Math. Statist. 37, 1211-1223.
Kurtz, T., R. Lyons, R. Pemantle and Y. Peres (1997). A conceptual proof of the Kesten-Stigum theorem for multi-type branching processes. In Classical and Modern Branching Processes, 181-186, IMA Volumes in Mathematics and its Applications, 84, Springer, New York.
Lyons, R., R. Pemantle and Y. Peres (1995). Conceptual proofs of L log L criteria for mean behavior of branching processes. Ann. Probab. 23, 1125-1138.
Lyons, R. (1997). A simple path to Biggins' martingale convergence for branching random walk. In Classical and Modern Branching Processes, 217-222, IMA Volumes in Mathematics and its Applications, 84, Springer, New York.
Mode, C. J. (1971). Multitype Branching Processes: Theory and Applications. American Elsevier Publishing Co., New York.
Ney, P. (1966). The convergence of a random distribution associated with branching process. J. Math. Anal. Appl. 12, 316-327.
Ryan, T. A., Jr. (1976). A multidimensional renewal theorem. Ann. Probab. 4, 661-665.
Schuh, H.-J. and A. D. Barbour (1977). On asymptotic behaviour of branching processes with infinite mean. Adv. Appl. Probab. 9, 681-723.
Schuh, H.-J. (1982). Seneta constants for the supercritical Bellman-Harris process. Adv. Appl. Probab. 14, 732-751.
Seneta, E. (1967). The Galton-Watson process with mean one. J. Appl. Probab. 4, 489-495.
Seneta, E. (1968). On recent theorems concerning supercritical Galton-Watson processes. Ann. Math. Statist. 39, 2098-2102.
Seneta, E. and C. C. Heyde (1977). I. J. Bienaymé: Statistical Theory Anticipated. Springer-Verlag, New York.
Sriram, T. N., I. V. Basawa and R. M. Huggins (1991). Sequential estimation for branching processes with immigration. Ann. Statist. 19(4), 2232-2243.
Sun, F. and M. S. Waterman (1997). Whole genome amplification and branching processes. Adv. Appl. Probab. 29(3), 629-688.
Tanny, D. (1977). Limit theorems for branching processes in random environment. Ann. Probab. 5(1), 100-116.


Vidyashankar, A. N. (1994). Large deviations for branching processes in fixed and random environments. Ph.D. Thesis, Departments of Mathematics and Statistics, Iowa State University, Ames, IA 50011.
Wei, C. Z. and J. Winnicki (1990). Estimation of means in the branching process with immigration. Ann. Statist. 18(4), 1757-1773.
Winnicki, J. (1991). Estimation of variances in the branching process with immigration. Probab. Theory Relat. Fields 88(1), 77-106.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19
© 2001 Elsevier Science B.V. All rights reserved.


Inference in Stochastic Processes

I. V. Basawa

Large sample properties of estimators and test statistics based on observations from stochastic processes are reviewed. The local asymptotic normality (LAN) is used as a unifying framework. Optimum estimating functions, adaptive estimation for semiparametric models and Bayesian methods are also discussed briefly. Several examples from stochastic processes are presented to illustrate the theory.

1. Introduction

This paper reviews large sample properties of estimators and test statistics based on observations from discrete time stochastic processes. Even though similar results can be obtained for continuous time processes, we limit ourselves to the discrete time for the ease of presentation. The local asymptotic normality (and mixed normality) is used to unify diverse asymptotic results and efficiency properties. Billingsley (1961), Basawa and Prakasa Rao (1980a) and Basawa and Scott (1983) may be consulted for background material for inference in stochastic processes. LeCam (1986) and LeCam and Yang (1990) discuss local asymptotic normality and its applications in a general setting. If the likelihood function is not known, one can use the theory of optimal estimating functions instead of the likelihood based methods. See Godambe (1991) and Heyde (1997) for the method of estimating functions and its applications. Section 2 is concerned with likelihood based methods. These include the local asymptotic normality, asymptotic efficiency of the maximum likelihood estimator, efficient tests of both simple and composite hypotheses, and extensions to local asymptotic mixed normality. Optimal estimating functions are introduced in Section 3. Semiparametric models and adaptive estimation are discussed in Section 4. Section 5 reviews Bayes and empirical Bayes methods. Specific applications are discussed in Section 6. See also Basawa and Prakasa Rao (1980b) and Basawa (1983, 1990) for previous reviews on the topic.

2. Likelihood methods

2.1. The basic framework


Let {X_t}, t = 0, ±1, ±2, ..., denote a discrete time stochastic process defined on a probability space (X, F, P_θ), where X is the sample space not depending on θ, F the corresponding Borel σ-field and P_θ a probability measure indexed by a (k × 1) vector parameter θ taking values in an open set Ω ⊂ R^k. Suppose X(n) = (X_1, ..., X_n) is a vector of n observations defined on the space (X^n, F_n, P_{n,θ}). Let p_n(x(n); θ) denote the probability density corresponding to P_{n,θ}, defined with respect to an appropriate measure μ_n. It is assumed that the probability measures {P_{n,θ}, θ ∈ Ω} are restrictions of P_θ to F_n, and that they are mutually absolutely continuous. Consider the log-likelihood ratio of θ to θ_0,

Λ_n(θ, θ_0) = log{p_n(X(n); θ)/p_n(X(n); θ_0)} ,  (2.1)

where θ_0 is a fixed parameter value and θ ranges over Ω. The asymptotic properties of likelihood based estimators and tests are related crucially to the limiting behaviour of Λ_n(θ, θ_0) for values of θ close to θ_0. Define a neighborhood N_n(θ_0) of θ_0 by N_n(θ_0) = {θ : |C_n^T(θ_0)(θ − θ_0)| ≤ δ}, δ > 0, where |a|, for any column vector a, denotes the vector norm (a^T a)^{1/2}, and C_n(θ_0) is a (k × k) positive definite symmetric non-random matrix such that tr{C_n^T(θ_0) C_n(θ_0)} → ∞ as n → ∞. We shall first consider a quadratic approximation for Λ_n(θ, θ_0) for θ ∈ N_n(θ_0). See the end of Section 2.3 for the various choices of C_n(θ_0).

2.2. A quadratic approximation for the log-likelihood ratio

Suppose that for any θ ∈ N_n(θ_0) there exist a random (k × 1) vector S_n(θ_0) and a random (k × k) (almost surely) positive definite symmetric matrix Γ_n(θ_0) such that, under P_{θ_0}-probability,

Λ_n(θ, θ_0) = (θ − θ_0)^T S_n(θ_0) − (1/2)(θ − θ_0)^T Γ_n(θ_0)(θ − θ_0) + o_p(1) ,  (2.2)

where o_p(1) denotes terms that converge to zero as n → ∞ in P_{θ_0}-probability. The quadratic approximation in (2.2) will be used repeatedly in what follows. Denote

Q_n(θ, θ_0) = (θ − θ_0)^T S_n(θ_0) − (1/2)(θ − θ_0)^T Γ_n(θ_0)(θ − θ_0) .  (2.3)

Taking vector derivatives with respect to θ, we have

Q'_n(θ, θ_0) = S_n(θ_0) − Γ_n(θ_0)(θ − θ_0) ,  (2.4)

and

Q''_n(θ, θ_0) = −Γ_n(θ_0) .  (2.5)

Consequently, the value of θ that maximizes Q_n(θ, θ_0) is given by

θ̄_n = θ_0 + Γ_n^{-1}(θ_0) S_n(θ_0) .  (2.6)

Substituting (2.6) in (2.3) we have

Q_n(θ̄_n, θ_0) = (1/2) S_n^T(θ_0) Γ_n^{-1}(θ_0) S_n(θ_0) .  (2.7)

The maximizer θ̄_n and the maximum value Q_n(θ̄_n, θ_0) of Q_n(θ, θ_0) play an important role in obtaining efficient estimators and efficient tests respectively. It may be noted that in most applications the approximation in (2.2) can be verified via a Taylor expansion with

S_n(θ) = d log p_n(X(n); θ)/dθ and Γ_n(θ) = −d² log p_n(X(n); θ)/(dθ dθ^T) ,  (2.8)

the score vector and the sample Fisher information matrix, respectively. From now on, unless otherwise stated, S_n(θ) and Γ_n(θ) are chosen as in (2.8). See Basawa and Koul (1988) for further results on quadratic approximation in a more general setting.
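For a concrete instance of (2.2) and (2.8), consider a Gaussian AR(1) model X_t = θX_{t−1} + ε_t with standard normal errors and the likelihood taken conditional on X_0 (an illustrative choice, not from the text). The conditional log-likelihood is exactly quadratic in θ, so (2.2) holds with zero remainder, with S_n(θ_0) = Σ X_{t−1}(X_t − θ_0 X_{t−1}) and Γ_n(θ_0) = Σ X²_{t−1}. The sketch below (Python) verifies the identity numerically.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulate a Gaussian AR(1) path (theta0 = 0.4 is an assumed true value).
theta0, n = 0.4, 500
x = np.zeros(n + 1)
for t in range(1, n + 1):
    x[t] = theta0 * x[t - 1] + rng.normal()

def cond_loglik(theta):
    """Log-likelihood conditional on X_0 (up to a constant free of theta)."""
    resid = x[1:] - theta * x[:-1]
    return -0.5 * np.sum(resid**2)

S_n = np.sum(x[:-1] * (x[1:] - theta0 * x[:-1]))   # score at theta0
Gamma_n = np.sum(x[:-1] ** 2)                      # observed information

theta = 0.47                                       # any point near theta0
lhs = cond_loglik(theta) - cond_loglik(theta0)     # Lambda_n(theta, theta0)
rhs = (theta - theta0) * S_n - 0.5 * (theta - theta0) ** 2 * Gamma_n
print(lhs, rhs)                                    # identical up to rounding
```

For models whose log-likelihood is not quadratic in θ, the same comparison exhibits the o_p(1) remainder shrinking as θ approaches θ_0.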

2.3. The Local Asymptotic Normality (LAN)

Suppose first that the quadratic approximation for Λ_n(θ, θ_0) in (2.2) is valid. Further, assume that the following conditions, (2.9) and (2.10), are satisfied: there exists a positive definite symmetric non-random matrix Γ(θ_0) such that

C_n^{-1}(θ_0) Γ_n(θ_0) C_n^{-1}(θ_0) = Γ(θ_0) + o_p(1) ,  (2.9)

and

C_n^{-1}(θ_0) S_n(θ_0) →_d N_k(0, Γ(θ_0)) ,  (2.10)

both under P_{θ_0}-probability. The family of probability measures {P_{n,θ}} is said to satisfy the local asymptotic normality (LAN) property in N_n(θ_0) provided (2.2), (2.9) and (2.10) are satisfied. Under the LAN assumption we have, with θ_n = θ_0 + C_n^{-1}(θ_0)h, where h is a (k × 1) vector of real numbers,

Λ_n(θ_n, θ_0) →_d N(−(1/2) h^T Γ(θ_0) h, h^T Γ(θ_0) h) ,  (2.11)

as n → ∞, under P_{θ_0}-probability. The LAN property provides a powerful tool in obtaining efficient estimators and tests, as will be seen in the next two subsections. The normalizing sequence of matrices {C_n(θ_0)} can be chosen in a number of ways. The more common choices are as follows.

(i) Take C_n(θ_0) = n^{1/2} I, where I is the identity matrix. This choice is typically appropriate when the process {X_t} is stationary and ergodic. In particular, when {X_t} is a sequence of independent and identically distributed random variables, the above choice leads to the classical large sample theory.

(ii) For nonstationary processes, it is often convenient to take

C_n(θ_0) = [diag{E_{θ_0}(−∂² ln p_n/∂θ_1²), ..., E_{θ_0}(−∂² ln p_n/∂θ_k²)}]^{1/2} .

Typically, this choice is quite common in the literature on problems related to regression models.

(iii) A more general choice is C_n(θ_0) = (E_{θ_0} Γ_n(θ_0))^{1/2}. With this choice, for ergodic type models, Γ(θ_0) in (2.9) to (2.11) is typically replaced by the identity matrix.

2.4. Efficient estimation

A sequence of estimators {T_n} of θ is said to be regular asymptotically normal if

C_n(θ_0)(T_n − θ_n) →_d N_k(0, V_T(θ_0)), under P_{n,θ_n}-probability ,  (2.12)

where θ_n = θ_0 + C_n^{-1}(θ_0)h, and V_T(θ_0) is a positive definite matrix. The LAN property enables one to construct efficient estimators in the class of regular asymptotically normal estimators satisfying (2.12). Several estimators, such as moment, least squares and conditional least squares estimators, satisfy (2.12). It can be shown (see, for instance, Hall and Mathiason (1990)) that when the LAN property is satisfied, then for any T_n satisfying (2.12) we have

V_T(θ_0) ≥ Γ^{-1}(θ_0) ,  (2.13)

in the sense that the difference is a non-negative definite matrix. The inequality in (2.13) is the asymptotic analogue of the usual Cramér-Rao inequality for the variance of an unbiased estimator based on a finite sample. Note that the regularity assumption in (2.12) requires the asymptotic normality of T_n under P_{n,θ_n}-probability. The verification of (2.12) can be simplified as follows. Suppose first that {T_n} satisfies

(Z_n, Δ_n) →_d N_{2k}(0, (V_T, δ_{T,S}; δ_{T,S}^T, Γ)) (blocks listed row-wise), under P_{θ_0}-probability ,  (2.14)

where Z_n = C_n(θ_0)(T_n − θ_0) and Δ_n = C_n^{-1}(θ_0) S_n(θ_0). It can then be shown (see Hall and Mathiason (1990)) that, under LAN, any T_n satisfying (2.14) satisfies (2.12) if and only if δ_{T,S} = I, the (k × k) identity matrix. Consequently, in order to verify (2.12) it suffices to show that

(Z_n, Δ_n) →_d N_{2k}(0, (V_T, I; I, Γ)), under P_{θ_0}-probability .  (2.15)

Note that under appropriate regularity conditions regarding differentiation under the integral sign, the requirement δ_{T,S} = I implies that T_n is asymptotically

unbiased for θ. Suppose that T_n satisfies (2.15). Then (2.13) follows readily. To see this, consider Y_n = Z_n − Γ^{-1} Δ_n. From (2.15) it follows that, under P_{θ_0}-probability,

Y_n →_d N_k(0, V_T − Γ^{-1}) .  (2.16)

Since Vr - F -1 is a covariance matrix, the result in (2.13) follows. A regular estimator is asymptotically efficient if the equality in (2.13) holds. We now consider the problem of constructing efficient estimators which attain the equality in (2.13). First, consider the following assumption on the score function S~(0). Suppose
$$S_n(\theta) = S_n(\theta_0) - C_n(\theta_0)\,\Gamma\,C_n(\theta_0)(\theta - \theta_0) + o_p(1), \quad\text{under } P_{\theta_0}\text{-probability}, \qquad (2.17)$$

uniformly in $\theta \in N_n(\theta_0)$. The requirement in (2.17) can usually be verified by a Taylor expansion of $S_n(\theta)$ at $\theta_0$, together with a strengthened version of (2.9), viz.,

$$C_n^{-1}(\theta_0)\,\Gamma_n(\theta)\,C_n^{-1}(\theta_0) = \Gamma(\theta_0) + o_p(1), \quad\text{under } P_{\theta_0}\text{-probability}, \qquad (2.18)$$

uniformly in $\theta \in N_n(\theta_0)$. Now consider the estimator $\hat\theta_n$ defined by

$$\hat\theta_n = \tilde\theta_0 + \Gamma_n^{-1}(\tilde\theta_0)\,S_n(\tilde\theta_0) , \qquad (2.19)$$

where $\tilde\theta_0$ is any preliminary estimator of $\theta$ such that

$$C_n(\theta_0)(\tilde\theta_0 - \theta_0) = O_p(1), \quad\text{under } P_{\theta_0}\text{-probability}, \qquad (2.20)$$

where $O_p(1)$ denotes terms bounded in probability. Note that (2.19) is obtained from (2.6) by replacing $\theta_0$ by $\tilde\theta_0$. Under LAN and (2.17), one can verify that $\hat\theta_n$ is asymptotically efficient. First, we shall show that

$$C_n(\theta_0)(\hat\theta_n - \theta_0) \xrightarrow{d} N_k(0, \Gamma^{-1}(\theta_0)), \quad\text{under } P_{\theta_0}\text{-probability}. \qquad (2.21)$$

We have

$$\begin{aligned}
C_n(\theta_0)(\hat\theta_n - \theta_0) &= C_n(\theta_0)(\tilde\theta_0 - \theta_0) + C_n(\theta_0)\Gamma_n^{-1}(\tilde\theta_0)S_n(\tilde\theta_0), \quad\text{by (2.19)} \\
&= C_n(\theta_0)(\tilde\theta_0 - \theta_0) + C_n(\theta_0)\Gamma_n^{-1}(\tilde\theta_0)\{S_n(\theta_0) - C_n(\theta_0)\,\Gamma\,C_n(\theta_0)(\tilde\theta_0 - \theta_0) + o_p(1)\}, \quad\text{by (2.17)} \\
&= \Gamma^{-1}(\theta_0)\,C_n^{-1}(\theta_0)\,S_n(\theta_0) + o_p(1) , \qquad (2.22)
\end{aligned}$$

using (2.18). The result in (2.21) then follows from (2.22) and (2.10). It can further be shown (see, for instance, Hall and Mathiason (1990)) that under LAN, $\hat\theta_n$ satisfies (2.12) with $V_T(\theta_0) = \Gamma^{-1}(\theta_0)$. Consequently, $\hat\theta_n$ given by (2.19) is asymptotically efficient. Note that $\hat\theta_n$ is the usual one-step solution of the likelihood equation $S_n(\theta) = 0$, and it requires a preliminary estimator. Typically, moment and least squares estimators can be chosen as preliminary estimators.
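The one-step construction in (2.19) can be illustrated numerically. The sketch below is our own illustration (the Gaussian AR(1) model, the preliminary estimator and all variable names are assumptions, not taken from the text): for $m_t(\theta) = \theta X_{t-1}$ with unit error variance, the score is $S_n(\theta) = \sum_t (X_t - \theta X_{t-1})X_{t-1}$ and $\Gamma_n = \sum_t X_{t-1}^2$, so the update $\hat\theta_n = \tilde\theta_0 + \Gamma_n^{-1}S_n(\tilde\theta_0)$ can be computed directly:

```python
import random

random.seed(1)

# Simulate a stationary Gaussian AR(1): X_t = theta*X_{t-1} + eps_t, eps_t ~ N(0,1).
theta_true, n = 0.5, 2000
x = [0.0]
for _ in range(n):
    x.append(theta_true * x[-1] + random.gauss(0.0, 1.0))

sxx = sum(x[t - 1] ** 2 for t in range(1, len(x)))   # Gamma_n (sigma^2 = 1)
sxy = sum(x[t] * x[t - 1] for t in range(1, len(x)))

def score(th):
    # S_n(theta) = sum_t (X_t - theta*X_{t-1}) X_{t-1}
    return sxy - th * sxx

# A deliberately rough preliminary estimator (hypothetical choice).
theta_prelim = 0.9 * sxy / sxx

# One-step update (2.19): theta_hat = theta_prelim + Gamma_n^{-1} S_n(theta_prelim).
theta_onestep = theta_prelim + score(theta_prelim) / sxx

# Since the score is linear in theta, the one-step update coincides exactly
# with the conditional least squares estimator in this model.
theta_cls = sxy / sxx
```

In nonlinear models the one-step update would only be asymptotically equivalent to the maximum likelihood estimator; here the exact agreement is an artifact of the quadratic log-likelihood.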

I. V. Basawa

2.5. Efficient tests: Simple hypotheses


First, consider the problem of testing a simple hypothesis $H : \theta = \theta_0$ against a simple alternative $K : \theta = \theta_1$. By the Neyman–Pearson lemma, the most powerful test is given by

$$\phi_n = \begin{cases} 1, & \Lambda_n(\theta_1, \theta_0) \geq k_n \\ 0, & \Lambda_n(\theta_1, \theta_0) < k_n , \end{cases} \qquad (2.23)$$

where the constant $k_n$ is chosen so that $E_H\phi_n = \alpha_n$, $0 < \alpha_n < 1$. Here $\phi_n$ takes the value 1 for the rejection of H and the value 0 for the acceptance. Let $\phi_n^*$ be any other test function such that $E_H\phi_n^* = \alpha_n$. It then follows by the Neyman–Pearson lemma that

$$\beta_{\phi_n}(\theta_1) \geq \beta_{\phi_n^*}(\theta_1) , \qquad (2.24)$$

where $\beta_\phi(\theta_1)$ denotes the power of a test $\phi$ at $\theta_1$, i.e. $\beta_\phi(\theta_1) = E_K\phi$. A test $\phi_n$ is said to be consistent for testing H against K if $\beta_{\phi_n}(\theta_1) \to 1$ as $n \to \infty$. Several reasonable tests satisfy the consistency requirement. In order to discriminate between consistent tests, one may look at the limiting power at a sequence of local alternatives $K_n : \theta = \theta_n$, $\theta_n = \theta_0 + C_n^{-1}(\theta_0)h$, rather than at a fixed alternative $\theta = \theta_1$. From (2.24), with $\theta_1$ replaced by $\theta_n$, we have

$$\limsup\{\beta_{\phi_n^*}(\theta_n)\} \leq \lim \beta_{\phi_n}(\theta_n) , \qquad (2.25)$$

where the right-hand limit in (2.25) can be evaluated under LAN. Note that, under LAN, the limit distribution of $\Lambda_n(\theta_n, \theta_0)$ under H is given by (2.11), viz.,

$$\Lambda_n(\theta_n, \theta_0) \xrightarrow{d} N(-\tau^2/2, \tau^2) , \qquad (2.26)$$

where $\tau^2 = h^T\Gamma(\theta_0)h$. Letting $\alpha_n \to \alpha$, we find by (2.26) that $k_n$ in (2.23) converges to $k = \tau z_{1-\alpha} - \tau^2/2$, where $\Phi(z_{1-\alpha}) = 1 - \alpha$, and $\Phi(x)$ denotes the distribution function of a standard normal random variable. It can be shown (see, for instance, Basawa and Scott (1983)), under LAN, that

$$\Lambda_n(\theta_n, \theta_0) \xrightarrow{d} N(\tau^2/2, \tau^2), \quad\text{under } K_n . \qquad (2.27)$$

We have

$$\beta_{\phi_n}(\theta_n) = P_{n,\theta_n}(\Lambda_n \geq k_n) \to 1 - \Phi(z_{1-\alpha} - \tau) . \qquad (2.28)$$

The result in (2.28) follows from (2.27) and the fact that $k_n \to \tau z_{1-\alpha} - \tau^2/2$. We finally obtain (under LAN) the following inequality for the limiting power of any size-$\alpha_n$ test $\phi_n^*$ ($\alpha_n \to \alpha$):

$$\limsup\{\beta_{\phi_n^*}(\theta_n)\} \leq 1 - \Phi(z_{1-\alpha} - \tau) . \qquad (2.29)$$


Any test $\phi_n^*$ for which the equality in (2.29) is attained is an asymptotically efficient test. Obviously, the Neyman–Pearson test statistic $\Lambda_n(\theta_n, \theta_0)$ is asymptotically efficient in the above sense. For a general alternative hypothesis $K : \theta \neq \theta_0$, it is desirable to find a test which does not depend on the specific direction $h$ in $\theta_n$. One may consider the score statistic $2Q_n(\theta_0, \theta_0)$ defined by (2.7), i.e., $2Q_n(\theta_0, \theta_0) = S_n^T(\theta_0)\Gamma_n^{-1}(\theta_0)S_n(\theta_0)$. We have

$$S_n^T(\theta_0)\Gamma_n^{-1}(\theta_0)S_n(\theta_0) = \Delta_n^T(\theta_0)\left(C_n^{-1}(\theta_0)\Gamma_n(\theta_0)C_n^{-1}(\theta_0)\right)^{-1}\Delta_n(\theta_0) \xrightarrow{d} \chi^2(k), \quad\text{under } H , \qquad (2.30)$$

where $\Delta_n(\theta_0) = C_n^{-1}(\theta_0)S_n(\theta_0)$. The result in (2.30) follows readily from (2.9) and (2.10). Under LAN, it can be shown that

$$\Delta_n(\theta_0) \xrightarrow{d} N_k(\Gamma(\theta_0)h, \Gamma(\theta_0)), \quad\text{under } P_{\theta_n}\text{-probability}. \qquad (2.31)$$

Also, under LAN, we have, from (2.9),

$$C_n^{-1}(\theta_0)\Gamma_n(\theta_0)C_n^{-1}(\theta_0) \xrightarrow{p} \Gamma(\theta_0), \quad\text{under } P_{\theta_n}\text{-probability}. \qquad (2.32)$$

It follows from (2.31) and (2.32) that, under $P_{\theta_n}$-probability,

$$S_n^T(\theta_0)\Gamma_n^{-1}(\theta_0)S_n(\theta_0) \xrightarrow{d} \chi^2(k, \lambda) , \qquad (2.33)$$

where $\chi^2(k, \lambda)$ denotes a non-central chi-square random variable with $k$ degrees of freedom and non-centrality parameter $\lambda = h^T\Gamma(\theta_0)h$. Consider the test function

$$\phi_n = \begin{cases} 1, & 2Q_n(\theta_0, \theta_0) \geq \chi^2_{1-\alpha}(k) \\ 0, & 2Q_n(\theta_0, \theta_0) < \chi^2_{1-\alpha}(k) . \end{cases} \qquad (2.34)$$

We can show that $\phi_n$ in (2.34) is asymptotically efficient in a certain class of tests. Let $T_n$ be any estimator of $\theta$ satisfying (2.12). Consider the test function

$$\phi_n^* = \begin{cases} 1, & (T_n - \theta_0)^T C_n(\theta_0) V_T^{-1}(\theta_0) C_n(\theta_0)(T_n - \theta_0) \geq \chi^2_{1-\alpha}(k) \\ 0, & (T_n - \theta_0)^T C_n(\theta_0) V_T^{-1}(\theta_0) C_n(\theta_0)(T_n - \theta_0) < \chi^2_{1-\alpha}(k) . \end{cases} \qquad (2.35)$$

We have $\lim E_H\phi_n = \lim E_H\phi_n^* = \alpha$. We shall now compare the limiting powers of $\phi_n$ and $\phi_n^*$. From (2.33) and (2.34), we have

$$\lim E_{\theta_n}\phi_n = P(\chi^2(k, \lambda) \geq \chi^2_{1-\alpha}(k)) . \qquad (2.36)$$

From (2.12), we have, under $P_{\theta_n}$-probability,

$$C_n(\theta_0)(T_n - \theta_0) \xrightarrow{d} N_k(h, V_T(\theta_0)) . \qquad (2.37)$$


Consequently,

$$(T_n - \theta_0)^T C_n(\theta_0) V_T^{-1}(\theta_0) C_n(\theta_0)(T_n - \theta_0) \xrightarrow{d} \chi^2(k, \lambda^*) , \qquad (2.38)$$

where $\lambda^* = h^T V_T^{-1}(\theta_0) h$. Therefore,

$$\lim E_{\theta_n}\phi_n^* = P(\chi^2(k, \lambda^*) \geq \chi^2_{1-\alpha}(k)) . \qquad (2.39)$$

From (2.13), we have $\lambda \geq \lambda^*$, and hence by (2.36) and (2.39) we have

$$\lim E_{\theta_n}\phi_n \geq \lim E_{\theta_n}\phi_n^* . \qquad (2.40)$$

Any test $\phi_n^*$ is said to be asymptotically efficient if its limiting power attains the upper bound in (2.40). We have thus shown that the score statistic $2Q_n(\theta_0, \theta_0)$ is asymptotically efficient in the above sense. The score test is not the only test which is efficient according to the criterion based on (2.40). The usual likelihood ratio statistic and the so-called Wald statistic are also asymptotically efficient. The likelihood ratio statistic is given by $2\Lambda_n(\hat\theta_n, \theta_0)$, where we can use the one-step maximum likelihood estimator $\hat\theta_n$ defined by (2.19). We have, from (2.2),

$$2\Lambda_n(\hat\theta_n, \theta_0) = 2Q_n(\hat\theta_n, \theta_0) + o_p(1), \quad\text{under } P_{\theta_0}\text{-probability}. \qquad (2.41)$$

Under LAN, (2.41) remains valid under $P_{\theta_n}$-probability also. We will show that $Q_n(\hat\theta_n, \theta_0)$ and $Q_n(\theta_0, \theta_0)$ have the same limit distributions under both $P_{\theta_0}$- and $P_{\theta_n}$-probabilities. We have

$$\begin{aligned}
Q_n(\hat\theta_n, \theta_0) &= (\hat\theta_n - \theta_0)^T S_n(\theta_0) - \tfrac{1}{2}(\hat\theta_n - \theta_0)^T \Gamma_n(\theta_0)(\hat\theta_n - \theta_0) \\
&= \Delta_n^T(\theta_0)\Gamma^{-1}(\theta_0)\Delta_n(\theta_0) - \tfrac{1}{2}\Delta_n^T(\theta_0)\Gamma^{-1}(\theta_0)\left(C_n^{-1}(\theta_0)\Gamma_n(\theta_0)C_n^{-1}(\theta_0)\right)\Gamma^{-1}(\theta_0)\Delta_n(\theta_0) + o_p(1), \quad\text{by (2.22)} \\
&= \tfrac{1}{2} S_n^T(\theta_0)\Gamma_n^{-1}(\theta_0)S_n(\theta_0) + o_p(1) \\
&= Q_n(\theta_0, \theta_0) + o_p(1), \quad\text{under } P_{\theta_0} , \qquad (2.42)
\end{aligned}$$

by (2.7). Under LAN, the above result also holds under $P_{\theta_n}$-probability via contiguity (see Hall and Mathiason (1990)). Consequently, the likelihood ratio statistic $2\Lambda_n(\hat\theta_n, \theta_0)$ has the same limit distributions as that of $2Q_n(\theta_0, \theta_0)$ under both $P_{\theta_0}$- and $P_{\theta_n}$-probabilities, and hence it is asymptotically efficient. Finally, the Wald statistic is defined by $(\hat\theta_n - \theta_0)^T\Gamma_n(\theta_0)(\hat\theta_n - \theta_0)$. It is seen that

$$\begin{aligned}
(\hat\theta_n - \theta_0)^T\Gamma_n(\theta_0)(\hat\theta_n - \theta_0) &= \Delta_n^T(\theta_0)\Gamma^{-1}(\theta_0)\left(C_n^{-1}(\theta_0)\Gamma_n(\theta_0)C_n^{-1}(\theta_0)\right)\Gamma^{-1}(\theta_0)\Delta_n(\theta_0) + o_p(1), \quad\text{by (2.22)} \\
&= S_n^T(\theta_0)\Gamma_n^{-1}(\theta_0)S_n(\theta_0) + o_p(1) \\
&= 2Q_n(\theta_0, \theta_0) + o_p(1) . \qquad (2.43)
\end{aligned}$$


Again, the above result is valid under both $P_{\theta_0}$- and $P_{\theta_n}$-probabilities. The Wald statistic is therefore asymptotically efficient.
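The asymptotic equivalence of the score, Wald and likelihood ratio statistics in (2.41)–(2.43) becomes an exact identity when the log-likelihood is quadratic in $\theta$. The following sketch, a hypothetical illustration of ours for a Gaussian AR(1) with known unit error variance (model and parameter values are our own choices), checks this numerically:

```python
import random

random.seed(8)

# Gaussian AR(1), unit error variance; test H: theta = theta0.
theta0, theta_true, n = 0.0, 0.15, 500
x = [0.0]
for _ in range(n):
    x.append(theta_true * x[-1] + random.gauss(0.0, 1.0))

sxx = sum(x[t - 1] ** 2 for t in range(1, len(x)))   # Gamma_n(theta0)
sxy = sum(x[t] * x[t - 1] for t in range(1, len(x)))
theta_hat = sxy / sxx                                # unrestricted MLE

def loglik(th):
    # conditional log-likelihood, up to an additive constant
    return -0.5 * sum((x[t] - th * x[t - 1]) ** 2 for t in range(1, len(x)))

score_stat = (sxy - theta0 * sxx) ** 2 / sxx         # S_n(theta0)^2 / Gamma_n
wald_stat = (theta_hat - theta0) ** 2 * sxx          # Wald form, cf. (2.43)
lr_stat = 2.0 * (loglik(theta_hat) - loglik(theta0)) # 2*Lambda_n, cf. (2.41)
```

Because the log-likelihood is quadratic here, the three statistics agree up to floating-point error; in general models they differ by $o_p(1)$ terms only.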

2.6. Efficient tests: Composite hypotheses


Let $\theta = (\alpha^T, \beta^T)^T$, where $\alpha$ and $\beta$ are $(p \times 1)$ and $((k-p) \times 1)$ vectors, respectively. Suppose $\alpha$ is the parameter of interest and $\beta$ represents a nuisance parameter. Consider the problem of testing the composite hypothesis $H : \alpha = \alpha_0$ when $\beta$ is unknown. We shall show that, under LAN, the score test, the Wald test and the likelihood ratio test are all asymptotically efficient in the sense of maximizing limiting power at the local alternatives. Partition $h = (h_\alpha^T, h_\beta^T)^T$, where $h_\alpha$ and $h_\beta$ are of order $(p \times 1)$ and $((k-p) \times 1)$, respectively. Our efficiency criterion will be based on the limiting power at the local alternatives $K_n : \alpha = \alpha_n$ and $\beta = \beta_n$, where $\alpha_n = \alpha_0 + C_{n,\alpha}^{-1}(\alpha_0, \beta)h_\alpha$ and $\beta_n = \beta + C_{n,\beta}^{-1}(\alpha_0, \beta)h_\beta$. Here $C_n(\theta)$ is taken as a diagonal matrix, $C_{n,\alpha}$ is the $(p \times p)$ diagonal matrix containing the first $p$ diagonal elements, and $C_{n,\beta}$ is the $((k-p) \times (k-p))$ diagonal matrix containing the remaining $(k-p)$ diagonal elements of $C_n(\theta)$. Note that $\beta$ is unspecified in both $\alpha_n$ and $\beta_n$. Now, partition $S_n(\theta)$ and $\Gamma_n(\theta)$ as

$$S_n(\theta) = \begin{pmatrix} S_{n,\alpha}(\theta) \\ S_{n,\beta}(\theta) \end{pmatrix}, \qquad \Gamma_n(\theta) = \begin{pmatrix} \Gamma_{n,\alpha\alpha}(\theta) & \Gamma_{n,\alpha\beta}(\theta) \\ \Gamma_{n,\beta\alpha}(\theta) & \Gamma_{n,\beta\beta}(\theta) \end{pmatrix},$$

where $S_{n,\alpha}(\theta)$ is of order $(p \times 1)$, $\Gamma_{n,\alpha\alpha}(\theta)$ is a $(p \times p)$ matrix, etc. In analogy with (2.19), we define a one-step likelihood equation estimator for $\beta$ under the restriction $H : \alpha = \alpha_0$ as

$$\hat\beta_{0n} = \tilde\beta_0 + \Gamma_n^{\beta\beta}(\alpha_0, \tilde\beta_0)\,S_{n,\beta}(\alpha_0, \tilde\beta_0) ,$$

where $\tilde\beta_0$ is any preliminary estimator of $\beta$ such that $C_{n,\beta}(\alpha_0, \beta)(\tilde\beta_0 - \beta) = O_p(1)$, and $\Gamma_n^{\beta\beta}$ denotes the lower right-hand $((k-p) \times (k-p))$ block of $\Gamma_n^{-1}$, i.e.,

$$\Gamma_n^{\beta\beta} = \left(\Gamma_{n,\beta\beta} - \Gamma_{n,\beta\alpha}\Gamma_{n,\alpha\alpha}^{-1}\Gamma_{n,\alpha\beta}\right)^{-1} . \qquad (2.44)$$

It follows, as a special case of (2.21), that under H,

$$C_{n,\beta}(\alpha_0, \beta)(\hat\beta_{0n} - \beta) \xrightarrow{d} N_{k-p}(0, \Gamma^{\beta\beta}(\alpha_0, \beta)) . \qquad (2.45)$$

Let $\hat\theta_{n,H} = (\alpha_0^T, \hat\beta_{0n}^T)^T$. The likelihood ratio statistic for testing the composite hypothesis $H : \alpha = \alpha_0$ is given by

$$T_n^{(1)} = 2\Lambda_n(\hat\theta_n, \hat\theta_{n,H}) , \qquad (2.46)$$

where $\hat\theta_n = (\hat\alpha_n^T, \hat\beta_n^T)^T$ is given by (2.19). The Wald and the score statistics are given respectively by


$$T_n^{(2)} = (\hat\alpha_n - \alpha_0)^T C_{n,\alpha}(\hat\theta_n)\{\Gamma^{\alpha\alpha}(\hat\theta_n)\}^{-1} C_{n,\alpha}(\hat\theta_n)(\hat\alpha_n - \alpha_0) , \qquad (2.47)$$

and

$$T_n^{(3)} = S_{n,\alpha}^T(\hat\theta_{n,H})\,C_{n,\alpha}^{-1}(\hat\theta_{n,H})\,\Gamma^{\alpha\alpha}(\hat\theta_{n,H})\,C_{n,\alpha}^{-1}(\hat\theta_{n,H})\,S_{n,\alpha}(\hat\theta_{n,H}) . \qquad (2.48)$$

The limit distributions of the above three statistics $T_n^{(i)}$, $i = 1, 2, 3$, can be shown to be identical under both H and $K_n$. We have, for $i = 1, 2, 3$,

$$T_n^{(i)} \xrightarrow{d} \begin{cases} \chi^2(p), & \text{under } H \\ \chi^2(p, \lambda), & \text{under } K_n , \end{cases} \qquad (2.49)$$

where $\lambda = h_\alpha^T\left(\Gamma^{\alpha\alpha}(\theta_H)\right)^{-1}h_\alpha$, with $\theta_H = (\alpha_0, \beta)$. Consider the class of asymptotically similar size-$\alpha$ tests of H, viz., the class of test functions $\phi_n$ such that $\lim E_H(\phi_n) = \alpha$, for all $\beta$.

Now, let $U_n$ be any estimator of $\alpha$ such that, under $K_n$,

$$C_{n,\alpha}(\alpha_0, \beta)(U_n - \alpha_0) \xrightarrow{d} N_p(h_\alpha, B_U(\alpha_0, \beta)) , \qquad (2.50)$$

where $B_U$ is a positive definite matrix. Note that (2.50) is an adaptation of the regularity requirement in (2.12). We then have, as in (2.13),

$$B_U(\alpha_0, \beta) \geq \Gamma^{\alpha\alpha}(\alpha_0, \beta) . \qquad (2.51)$$

Consider a test statistic $T_n$ based on $U_n$, defined by

$$T_n = V_n^T B_U^{-1}(\alpha_0, \hat\beta_{0n}) V_n , \qquad (2.52)$$

where $V_n = C_{n,\alpha}(\alpha_0, \hat\beta_{0n})(U_n - \alpha_0)$. It follows from (2.50) that, under $K_n$,

$$T_n \xrightarrow{d} \chi^2(p, \lambda^*) , \qquad (2.53)$$

where $\lambda^* = h_\alpha^T B_U^{-1}(\alpha_0, \beta) h_\alpha$. From (2.51) it follows that

$$\lambda^* \leq \lambda , \qquad (2.54)$$

where $\lambda$ is the non-centrality parameter appearing in (2.49). Now, define a test function

$$\phi_n = \begin{cases} 1 & \text{if } T_n \geq \chi^2_{1-\alpha}(p) \\ 0 & \text{if } T_n < \chi^2_{1-\alpha}(p) . \end{cases} \qquad (2.55)$$

It is easily verified that $\phi_n$ is asymptotically similar of size $\alpha$. From (2.54) it follows that the Wald statistic $T_n^{(2)}$ is asymptotically efficient in the class of tests given by (2.55). Since the likelihood ratio and the score statistics can both be expressed as


equal to $T_n^{(2)}$ plus an $o_p(1)$ term under both H and $K_n$, it follows that these two tests are also asymptotically efficient in the same class.

2.7. Neyman and Durbin statistics

Consider the problem of testing discussed in Section 2.6. In addition to the three statistics presented in the previous section, the following two statistics are also efficient. The Neyman C($\alpha$)-statistic is defined by

$$T_n^{(4)} = Y_n^T\,\Gamma^{\alpha\alpha}(\hat\theta_{n,H})\,Y_n , \qquad (2.56)$$

where

$$Y_n = C_{n,\alpha}^{-1}(\hat\theta_{n,H})S_{n,\alpha}(\hat\theta_{n,H}) - \Gamma_{n,\alpha\beta}(\hat\theta_{n,H})\Gamma_{n,\beta\beta}^{-1}(\hat\theta_{n,H})C_{n,\beta}^{-1}(\hat\theta_{n,H})S_{n,\beta}(\hat\theta_{n,H}) ,$$

which represents a regression of $S_{n,\alpha}$ on $S_{n,\beta}$. Since $S_{n,\beta}(\hat\theta_{n,H}) = O_p(1)$, it follows that $T_n^{(4)}$ is asymptotically equivalent to the score statistic $T_n^{(3)}$, under both H and $K_n$. In order to introduce the Durbin statistic, let $\hat\beta_{0n}$ be the estimator defined in (2.44), and let $\hat\alpha_{0n}$ be the estimator defined by

$$\hat\alpha_{0n} = \tilde\alpha_0 + \Gamma_n^{\alpha\alpha}(\tilde\alpha_0, \hat\beta_{0n})\,S_{n,\alpha}(\tilde\alpha_0, \hat\beta_{0n}) , \qquad (2.57)$$

where $\tilde\alpha_0$ is any preliminary estimator of $\alpha$ such that $C_{n,\alpha}(\hat\theta_n)(\tilde\alpha_0 - \alpha_0) = O_p(1)$. In other words, $\hat\beta_{0n}$ is a one-step solution of the equation $S_{n,\beta}(\alpha_0, \beta) = 0$ for $\beta$, and $\hat\alpha_{0n}$ is a one-step solution of the equation $S_{n,\alpha}(\alpha, \hat\beta_{0n}) = 0$ for $\alpha$. The Durbin statistic for testing $H : \alpha = \alpha_0$ is then defined by

$$T_n^{(5)} = (\hat\alpha_{0n} - \alpha_0)^T C_{n,\alpha}(\hat\theta_{n,H})\{\Gamma^{\alpha\alpha}(\hat\theta_{n,H})\}^{-1} C_{n,\alpha}(\hat\theta_{n,H})(\hat\alpha_{0n} - \alpha_0) . \qquad (2.58)$$

It is easily verified that $T_n^{(5)}$ is asymptotically equivalent to $T_n^{(2)}$ under H as well as under $K_n$. Consequently, all five statistics $T_n^{(i)}$, $i = 1, \ldots, 5$, discussed in Sections 2.6 and 2.7 are asymptotically efficient. The choice among these statistics may depend on the simplicity of deriving them in any particular problem. Note that $T_n^{(1)}$ depends on both the restricted and the unrestricted (maximum) likelihood estimators $\hat\theta_{n,H}$ and $\hat\theta_n$. The Wald statistic $T_n^{(2)}$ depends only on the unrestricted estimator $\hat\theta_n$. The score statistic $T_n^{(3)}$ and the Neyman C($\alpha$)-statistic $T_n^{(4)}$ both need the restricted estimator $\hat\theta_{n,H}$. The Durbin statistic requires the two restricted estimators $\hat\alpha_{0n}$ and $\hat\beta_{0n}$, obtained by a convenient successive substitution. Even though these five statistics are asymptotically equivalent, for small or moderate sample sizes their performance may vary significantly.

2.8. Extension to non-ergodic models

For some models in stochastic processes, it turns out that the limiting Fisher information $\Gamma(\theta)$ in (2.9) is a non-degenerate random matrix. The limit distribution of the maximum likelihood estimator given in (2.21) will then be a mixture


of normals rather than a normal. Typically, such models belong to the local asymptotic mixed normal (LAMN) family rather than the LAN family. See Basawa and Scott (1983) for the theory and applications of the LAMN family. Stochastic models belonging to this class are also referred to as non-ergodic models. See Basawa (1981a, b) and Basawa and Brockwell (1984) for conditional inference for non-ergodic models.

3. Optimal estimating functions


In many cases, the density $p_n(x(n); \theta)$ is either not known, or it may be unwieldy. Consider a class of estimating functions defined by

$$g_n(\theta) = \sum_{t=1}^{n} W_t(\theta)(X_t - m_t(\theta)) , \qquad (3.1)$$

where $g_n(\theta)$ and $W_t(\theta)$ are $(p \times 1)$ random vectors such that $g_n(\theta) \in \mathcal{F}_n$ and $W_t(\theta) \in \mathcal{F}_{t-1}$, where $\mathcal{F}_m = \sigma(X_m, X_{m-1}, \ldots)$. If $m_t(\theta) = E(X_t \mid \mathcal{F}_{t-1})$, then $\{g_n(\theta), \mathcal{F}_n\}$ is a zero-mean martingale. Let $\sigma_t^2(\theta) = \operatorname{Var}(X_t \mid \mathcal{F}_{t-1})$. Godambe (1985) has shown that an optimum choice of $W_t(\theta)$ is

$$W_t^0(\theta) = \left(\frac{dm_t(\theta)}{d\theta}\right)\sigma_t^{-2}(\theta) , \qquad (3.2)$$

where the optimality criterion seeks $W_t(\theta)$ which maximizes (in the partial order of non-negative definite matrices) the Godambe information matrix

$$I_{g_n}(\theta) = \left(E\left(\frac{dg_n}{d\theta}\right)\right)^T \left(E(g_n g_n^T)\right)^{-1} \left(E\left(\frac{dg_n}{d\theta}\right)\right) . \qquad (3.3)$$

Thus, if

$$g_n^0(\theta) = \sum_{t=1}^{n} W_t^0(\theta)(X_t - m_t(\theta)) ,$$

we have that $(I_{g_n^0} - I_{g_n})$ is non-negative definite for all $g_n$ of the form (3.1) satisfying some regularity conditions. The optimal estimating function $g_n^0$ is also referred to as a quasi-score function. See, for instance, Heyde (1997). The quasi-score estimator $\hat\theta_q$ is obtained as a solution of the equation $g_n^0(\theta) = 0$. Under appropriate regularity conditions (see Heyde (1997)) one can show that

$$W_n^{1/2}(\theta)(\hat\theta_q - \theta) \xrightarrow{d} N(0, I) , \qquad (3.4)$$

where

$$W_n(\theta) = \sum_{t=1}^{n} \left(\frac{dm_t(\theta)}{d\theta}\right)\left(\frac{dm_t(\theta)}{d\theta}\right)^T \sigma_t^{-2}(\theta) . \qquad (3.5)$$


If the estimating function $g_n(\theta)$ is not restricted to be of the "linear" form in (3.1), it is well known that the optimum $g_n(\theta)$ which maximizes $I_{g_n}(\theta)$ in (3.3) in the unrestricted class of estimating functions is given by the likelihood score function $S_n(\theta)$. See Godambe (1960). The theory and applications of quasi-score estimators, confidence sets and test statistics are discussed in Godambe (1991) and Heyde (1997). See also Basawa (1985, 1991) and Basawa et al. (1985) for tests based on estimating functions.
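As a concrete illustration of (3.1) and (3.2), consider a conditionally heteroscedastic first-order model (our own hypothetical example, not taken from the text) with $m_t(\theta) = \theta X_{t-1}$ and known $\sigma_t^2 = 1 + 0.5X_{t-1}^2$. The optimal weight (3.2) is $W_t^0 = X_{t-1}/\sigma_t^2$, and since the resulting quasi-score equation is linear in $\theta$ it can be solved in closed form:

```python
import random

random.seed(2)

# Hypothetical model: X_t = theta*X_{t-1} + sigma_t*eps_t,
# with sigma_t^2 = 1 + 0.5*X_{t-1}^2 (known, free of theta).
theta_true, n = 0.3, 5000
x = [0.0]
for _ in range(n):
    s = (1.0 + 0.5 * x[-1] ** 2) ** 0.5
    x.append(theta_true * x[-1] + s * random.gauss(0.0, 1.0))

# Optimal weight (3.2): W_t = (dm_t/dtheta)/sigma_t^2 = X_{t-1}/(1 + 0.5*X_{t-1}^2).
# The quasi-score equation sum_t W_t (X_t - theta*X_{t-1}) = 0 is linear in
# theta, so it has the closed-form root below.
num = sum(x[t - 1] * x[t] / (1.0 + 0.5 * x[t - 1] ** 2) for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 / (1.0 + 0.5 * x[t - 1] ** 2) for t in range(1, len(x)))
theta_q = num / den
```

Unweighted least squares would also be consistent here; the point of the optimal weighting is that it downweights observations with large conditional variance, attaining the bound in (3.3) within the class (3.1).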

4. Semiparametric models and adaptive estimation

Consider the model

$$X_t = m_t(\theta) + \sigma_t(\theta)\varepsilon_t , \qquad (4.1)$$

where $\{\varepsilon_t\}$ is a sequence of independent and identically distributed random errors with $E(\varepsilon_t) = 0$ and $\operatorname{Var}(\varepsilon_t) = \sigma_\varepsilon^2$, $m_t(\theta) = E(X_t \mid \mathcal{F}_{t-1})$, and $\sigma_t^2(\theta) = \operatorname{Var}(X_t \mid \mathcal{F}_{t-1})$. Suppose $\theta$ is a $(p \times 1)$ vector of parameters. If the density $f_\varepsilon(\cdot)$ of $\{\varepsilon_t\}$ is known, one can apply the likelihood methods for inference regarding $\theta$. On the other hand, if $f_\varepsilon(\cdot)$ is unknown, and $m_t(\theta)$ and $\sigma_t^2(\theta)$ are modeled as known functions of $\theta$, measurable with respect to $\mathcal{F}_{t-1}$, the model in (4.1) is an example of a semiparametric model. Let $\tilde\theta_n$ be a preliminary estimator of $\theta$. For instance, $\tilde\theta_n$ may be the least squares estimator obtained by minimizing

$$C_n(\theta) = \sum_{t=1}^{n} (X_t - m_t(\theta))^2 . \qquad (4.2)$$

Denote by $Y_t = \varepsilon_t(\tilde\theta_n)$ the resulting residuals. Let $\hat f_n(y)$ denote a kernel density estimator, e.g.,

$$\hat f_n(y) = \frac{1}{n}\sum_{t=1}^{n} b_n^{-1} K\left(\frac{y - Y_t}{b_n}\right) , \qquad (4.3)$$

where $K$ and $b_n$ are the kernel and the bandwidth, respectively. Let $\hat\theta_n(\hat f_n)$ denote the estimator obtained as a solution of the estimating equation

$$\sum_{t=1}^{n} \frac{d}{d\theta}\log \hat f_n\left(\frac{X_t - m_t(\theta)}{\sigma_t(\theta)}\right) = 0 . \qquad (4.4)$$

Note that if $f_\varepsilon$ is known, the usual likelihood equation is given by

$$\sum_{t=1}^{n} \frac{d}{d\theta}\log f_\varepsilon\left(\frac{X_t - m_t(\theta)}{\sigma_t(\theta)}\right) = 0 . \qquad (4.5)$$

If $\hat\theta_n(f_\varepsilon)$ denotes a solution of (4.5) when $f_\varepsilon$ is known, one can show, under appropriate regularity conditions, that

$$\sqrt{n}\left(\hat\theta_n(\hat f_n) - \hat\theta_n(f_\varepsilon)\right) = o_p(1) . \qquad (4.6)$$

When (4.6) is satisfied, the estimator $\hat\theta_n(\hat f_n)$ is said to be adaptive. See Bickel (1982), Bickel et al. (1993) and Drost et al. (1997) for the theory and applications of adaptive estimation for semiparametric models.
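The first two steps of the adaptive scheme, a preliminary least squares fit (4.2) followed by the kernel density estimate (4.3) of the residuals, can be sketched as follows (an illustration under assumed choices of ours: a Gaussian AR(1) model, a Gaussian kernel, and a rule-of-thumb bandwidth):

```python
import math
import random

random.seed(3)

# AR(1) data with standard normal errors (illustrative stand-in for (4.1)).
theta_true, n = 0.4, 3000
x = [0.0]
for _ in range(n):
    x.append(theta_true * x[-1] + random.gauss(0.0, 1.0))

# Step 1: preliminary least squares estimator (4.2), then residuals Y_t.
theta_tilde = (sum(x[t] * x[t - 1] for t in range(1, len(x)))
               / sum(x[t - 1] ** 2 for t in range(1, len(x))))
resid = [x[t] - theta_tilde * x[t - 1] for t in range(1, len(x))]

# Step 2: kernel density estimator (4.3); Gaussian kernel, rule-of-thumb
# bandwidth (our choice; any b_n -> 0 with n*b_n -> infinity works in theory).
b_n = 1.06 * n ** (-0.2)
def f_hat(y):
    z = sum(math.exp(-0.5 * ((y - r) / b_n) ** 2) for r in resid)
    return z / (len(resid) * b_n * math.sqrt(2.0 * math.pi))

density_at_zero = f_hat(0.0)   # true error density at 0 is 1/sqrt(2*pi)
```

The adaptive step would then plug $\hat f_n$ into the estimating equation (4.4) and solve for $\theta$; the sketch stops at the density estimate, which is the ingredient specific to the semiparametric setting.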

5. Bayes and empirical Bayes methods


Suppose that prior information about the parameter $\theta$ is available and that it can be quantified in the form of a density function $\pi(\theta)$, $\theta \in \Omega \subset R^p$. The density $\pi(\theta)$ is referred to as the prior density. Given $\theta$, the observation vector $X(n)$ has density $p(x(n) \mid \theta)$. If $\delta_n$ is an estimator of $\theta$, and if $l(\delta_n, \theta)$ is a prescribed loss function, the Bayes risk (or average risk) corresponding to $\delta_n$ is given by $R(\delta_n) = E(l(\delta_n, \theta))$, where the expectation $E(\cdot)$ is with respect to the joint density $p(x(n) \mid \theta)\pi(\theta)$. The Bayes estimator $\delta^0$ of $\theta$ is such that

$$R(\delta^0) \leq R(\delta_n), \quad\text{for all } \delta_n \in \Delta , \qquad (5.1)$$

where $\Delta$ is the class of all estimators with finite risk. If the loss function is quadratic, it can be shown that

$$\delta^0 = E(\theta \mid X(n)) . \qquad (5.2)$$

Define

$$\Gamma_n(\theta) = -\frac{d^2\log p_n(X(n) \mid \theta)}{d\theta\, d\theta^T} . \qquad (5.3)$$

If $\hat\theta_n$ is the maximum likelihood estimator, we have seen in Section 2 that, under regularity conditions,

$$\Gamma_n^{1/2}(\theta)(\hat\theta_n - \theta) \xrightarrow{d} N_p(0, I) . \qquad (5.4)$$

The ML estimator $\hat\theta_n$ ignores the prior information $\pi(\theta)$, and uses only the sample information contained in the likelihood function $p_n(x(n) \mid \theta)$. It is of interest to compare the Bayes estimator $\delta^0$ with the ML estimator $\hat\theta_n$ for large $n$. Under regularity conditions (see Basawa and Prakasa Rao (1980, Ch. 10)) one can show that

$$\Gamma_n^{1/2}(\theta)(\delta_n^0 - \theta) \xrightarrow{d} N_p(0, I) , \qquad (5.5)$$

and hence the Bayes estimator $\delta^0$ is asymptotically equivalent to the ML estimator $\hat\theta_n$. Often, the prior density depends on an unknown parameter, say $\alpha$. Denote the prior density as $\pi(\theta; \alpha)$. The Bayes estimator of $\theta$ will then depend on $\alpha$; denote it as $\delta_n^0(\alpha)$. Since $\alpha$ is unknown, we may first estimate $\alpha$ from the marginal density


$$p(x(n); \alpha) = \int p(x(n) \mid \theta)\,\pi(\theta; \alpha)\,d\theta . \qquad (5.6)$$

Let $\hat\alpha_n$ denote the maximum likelihood estimator of $\alpha$ based on the marginal likelihood $p(X(n); \alpha)$. The estimator $\delta_n^0(\hat\alpha_n)$, obtained from the Bayes estimator by replacing $\alpha$ by $\hat\alpha_n$, is known as an empirical Bayes estimator of $\theta$. It can be shown that $\delta_n^0(\hat\alpha_n)$ is a good approximation to the Bayes estimator $\delta_n^0(\alpha)$, for large $n$.

6. Some applications
Here we give some examples to illustrate the inference methods discussed in the previous Sections.

Ex. 1. Markov processes


Let $\{X_t\}$, $t = 1, 2, \ldots$, be a Markov process with a general state space $\mathcal{X}$ and stationary transition measures

$$F_\theta(x, A) = P_\theta(X_{n+1} \in A \mid X_n = x) , \qquad (6.1)$$

$\theta \in \Omega \subset R^p$. Suppose these transition measures admit a unique stationary distribution $F_\theta(\cdot)$ defined by

$$F_\theta(A) = \int F_\theta(x, A)\,F_\theta(dx) . \qquad (6.2)$$

Furthermore, suppose that $F_\theta(x, A)$ admits transition densities $p(x, y; \theta)$ with respect to a measure $\lambda(\cdot)$, defined by the relation

$$F_\theta(x, A) = \int_A p(x, y; \theta)\,\lambda(dy) . \qquad (6.3)$$

The likelihood function based on $X(n) = (X_1, \ldots, X_n)$ (and conditional on $X_1 = x_1$) is given by

$$p_n(X(n); \theta) = \prod_{t=1}^{n-1} p(X_t, X_{t+1}; \theta) . \qquad (6.4)$$

Under regularity conditions (see Billingsley (1961)) it can be shown that the model belongs to the LAN family. See also Roussas (1972). As a specific example, consider a finite state Markov chain with state space $\mathcal{X} = \{1, 2, \ldots, m\}$ and the transition probabilities

$$p_{ij} = P(X_{t+1} = j \mid X_t = i), \quad i, j \in \mathcal{X} . \qquad (6.5)$$

Let $n_{ij}$ denote the number of transitions $i \to j$ in the sample $X(n)$. The likelihood function is given by

$$p_n(X(n); \theta) = \prod_{i,j} p_{ij}^{n_{ij}} . \qquad (6.6)$$

Here the parameter is $\theta = \{p_{ij},\ i, j \in \mathcal{X},\ \text{such that } \sum_j p_{ij} = 1\}$. The maximum likelihood estimator of $p_{ij}$ is seen to be

$$\hat p_{ij} = \frac{n_{ij}}{n_{i\cdot}}, \quad i, j \in \mathcal{X} , \qquad (6.7)$$

where $n_{i\cdot} = \sum_j n_{ij}$. It can be shown (see Billingsley (1961)) that $\sqrt{n}(\hat p_{ij} - p_{ij})$, $i, j \in \mathcal{X}$, are jointly asymptotically normal with mean zero and asymptotic variances and covariances given by

$$\sigma_{ij,i'j'} = \pi_i^{-1}\,\delta_{ii'}\left(\delta_{jj'}\,p_{ij} - p_{ij}\,p_{ij'}\right) , \qquad (6.8)$$

where $\delta_{uv}$ denotes the indicator which takes the value 1 if $u = v$ and zero if $u \neq v$, and $\pi_i$ denotes the stationary probability of state $i$. The asymptotic optimality property of $\{\hat p_{ij}\}$ is assured by the LAN property. See Basawa and Prakasa Rao (1980a, Ch. 4) for various problems of inference regarding finite Markov chains.
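The estimator (6.7) is simply the empirical transition frequency. A minimal sketch, using a hypothetical two-state chain of our own choosing:

```python
import random

random.seed(4)

# Hypothetical two-state chain with known transition matrix.
P = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.4, 2: 0.6}}
n_steps = 20000
state = 1
counts = {(i, j): 0 for i in (1, 2) for j in (1, 2)}     # transition counts n_ij
for _ in range(n_steps):
    nxt = 1 if random.random() < P[state][1] else 2
    counts[(state, nxt)] += 1
    state = nxt

# MLE (6.7): p_hat_ij = n_ij / n_i., with n_i. = sum_j n_ij.
row_tot = {i: counts[(i, 1)] + counts[(i, 2)] for i in (1, 2)}
p_hat = {(i, j): counts[(i, j)] / row_tot[i] for i in (1, 2) for j in (1, 2)}
```

Each estimated row sums to one by construction, and the estimates converge to the true transition probabilities at the $\sqrt{n}$ rate described around (6.8).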

Ex. 2. Branching processes


Let $X_0 = 1, X_1, X_2, \ldots$ be the generation sizes of a Galton–Watson branching process with offspring distribution

$$P(X_1 = j) = p_j, \quad j = 0, 1, 2, \ldots . \qquad (6.9)$$

Let $\{Y_{kl}\}$, $k = 0, 1, \ldots$, $l = 1, 2, \ldots$, denote the $l$th offspring belonging to the $k$th generation. We then have

$$X_{n+1} = \sum_{l=1}^{X_n} Y_{nl} . \qquad (6.10)$$

Note that $\{X_t\}$ is a Markov process with state space $\mathcal{X} = \{0, 1, 2, \ldots\}$ and the transition probabilities

$$p_{ij} = P(X_{n+1} = j \mid X_n = i) = P\left(\sum_{l=1}^{i} Y_{nl} = j\right) . \qquad (6.11)$$

The random variables $\{Y_{kl}\}$ are assumed to be independent and identically distributed, each distributed as $X_1$, with $E(X_1) = \mu$ and $\operatorname{Var}(X_1) = \sigma^2$. More specifically, suppose the offspring distribution belongs to the power series family, viz.,

$$P(X_1 = x) = a_x\theta^x/A(\theta), \quad x = 0, 1, 2, \ldots ,$$

where $A(\theta) = \sum_{x=0}^{\infty} a_x\theta^x$. We have

$$\mu = \theta A'(\theta)/A(\theta), \quad\text{and}\quad \sigma^2 = \theta\,\frac{d\mu}{d\theta} .$$

The likelihood function based on $(X_1, \ldots, X_n)$ is given by

$$p_n(X(n); \theta) \propto \theta^{\sum_{t=1}^{n} X_t}\,(A(\theta))^{-\sum_{t=1}^{n} X_{t-1}} . \qquad (6.12)$$

We have

$$S_n(\mu) = \frac{\partial\log p_n(X(n); \theta)}{\partial\theta}\left(\frac{d\mu}{d\theta}\right)^{-1} = \sigma^{-2}\sum_{t=1}^{n}(X_t - \mu X_{t-1}) . \qquad (6.13)$$

The equation $S_n(\mu) = 0$ gives the ML estimator of $\mu$:

$$\hat\mu = \sum_{t=1}^{n} X_t \Big/ \sum_{t=1}^{n} X_{t-1} . \qquad (6.14)$$

The Fisher information is seen to be

$$I_n(\mu) = E\left(-\frac{\partial^2\log p_n}{\partial\mu^2}\right) = \sigma^{-2}\sum_{t=1}^{n}\mu^{t-1} = \sigma^{-2}\,\frac{\mu^n - 1}{\mu - 1} . \qquad (6.15)$$

Assume throughout that $P(Y_{11} = 0) = 0$ and $\mu > 1$, to avoid extinction of the process. It can be shown that

$$I_n^{-1}(\mu)\left(-\frac{\partial^2\log p_n}{\partial\mu^2}\right) \to W, \quad\text{as } n \to \infty , \qquad (6.16)$$

where $W > 0$ is a non-degenerate random variable. This process, therefore, belongs to the LAMN family. See Basawa and Scott (1983) for problems of inference regarding $\mu$. See also Heyde (1975). Let

$$J_n(\mu) = \sigma^{-2}(\mu)\sum_{t=1}^{n} X_{t-1} . \qquad (6.17)$$

One can show that

$$J_n^{1/2}(\mu)(\hat\mu_n - \mu) \xrightarrow{d} N(0, 1) . \qquad (6.18)$$

The result in (6.18) is equivalent to

$$I_n^{1/2}(\mu)(\hat\mu_n - \mu) \xrightarrow{d} N^*(0, W^{-1}) . \qquad (6.19)$$

Note that the limit distribution $N^*$ in (6.19) is a mixture of normals rather than a normal.


Guttorp (1991) gives an extensive review of inference problems for branching processes.
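The ratio estimator (6.14) is easy to compute from simulated generation sizes. The sketch below uses a hypothetical offspring law on $\{1, 2, 3\}$ of our own choosing (so that $P(Y = 0) = 0$ and $\mu = 1.9 > 1$, as assumed in the text):

```python
import random

random.seed(5)

# Hypothetical offspring law: P(Y=1)=0.3, P(Y=2)=0.5, P(Y=3)=0.2, so mu = 1.9.
def offspring():
    u = random.random()
    return 1 if u < 0.3 else (2 if u < 0.8 else 3)

x = [1]                      # X_0 = 1
for _ in range(12):          # 12 generations; sizes grow roughly like mu^n
    x.append(sum(offspring() for _ in range(x[-1])))

# ML estimator (6.14): mu_hat = sum_{t=1}^n X_t / sum_{t=1}^n X_{t-1}.
mu_hat = sum(x[1:]) / sum(x[:-1])
```

Since each individual has at least one offspring here, the generation sizes are non-decreasing and extinction cannot occur; the normalization by the random quantity $\sum X_{t-1}$, as in (6.17) and (6.18), is what makes the estimator asymptotically normal despite the LAMN structure.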

Ex. 3. Time series


Consider the model

$$X_t = m_t(\theta) + \sigma_t(\theta)\varepsilon_t , \qquad (6.20)$$

where $\{\varepsilon_t\}$ are i.i.d. random errors with mean zero and variance $\sigma^2$. Also, $m_t(\theta)$ and $\sigma_t(\theta)$ are specified $\mathcal{F}_{t-1}$-measurable functions. The model in (6.20) includes linear time series models such as ARMA, and nonlinear time series such as the threshold autoregressive (TAR) processes. In addition, (6.20) includes the conditionally heteroscedastic autoregressive (ARCH) models. Drost et al. (1997) have established the LAN property for the general class of models in (6.20) when the density $f_\varepsilon(\cdot)$ of the errors is specified. The optimality properties of the ML estimator of $\theta$ and of related test statistics follow immediately. Moreover, Drost et al. (1997) have studied adaptive estimation of $\theta$ when $f_\varepsilon(\cdot)$ is unknown.

Ex. 4. Conditional exponential Markov processes


Let $\{X_t\}$ be a stationary ergodic Markov process with transition densities of the form

$$p(x_t, x_{t+1}; \theta) = h(x_t, x_{t+1})\exp\left[\theta^T Z(x_t, x_{t+1}) - g(\theta, x_t)\right] , \qquad (6.21)$$

where $\theta$ is a $(p \times 1)$ parameter, $Z(\cdot)$ is a specified $(p \times 1)$ vector of statistics, and $g(\cdot)$ is a given real-valued function. The likelihood equation based on $X(n) = (X_1, \ldots, X_n)$ is given by

$$\sum_{t=1}^{n-1}\left(Z(X_t, X_{t+1}) - \frac{dg(\theta, X_t)}{d\theta}\right) = 0 . \qquad (6.22)$$

Under regularity conditions, Hwang and Basawa (1994) have established the LAN property for the above model. If $\hat\theta_n$ is a consistent solution of the equation (6.22), it can be shown that

$$\sqrt{n}(\hat\theta_n - \theta) \xrightarrow{d} N_p(0, \Gamma^{-1}(\theta)) , \qquad (6.23)$$

where

$$\Gamma(\theta) = E\left(\frac{d^2 g(\theta, X_t)}{d\theta\, d\theta^T}\right) , \qquad (6.24)$$

the expectation being taken with respect to the stationary distribution. Hwang and Basawa (1994) have also discussed applications of the above model to several nonlinear time series examples.


Ex. 5. Random coefficient autoregressive processes


Suppose $\{X_t\}$ is a sequence of random variables defined by

$$X_t = H_\theta(X_{t-1}, Z_t) + \varepsilon_t , \qquad (6.25)$$

where $\{Z_t\}$ and $\{\varepsilon_t\}$ are independent sequences of i.i.d. random variables (unobserved), and $H_\theta(\cdot)$ is a specified function. It then follows that $\{X_t\}$ is a Markov process with transition densities

$$p(x_t, x_{t+1}; \theta) = \int f_\varepsilon(x_{t+1} - H_\theta(x_t, z))\,g_Z(z)\,dz , \qquad (6.26)$$

where $f_\varepsilon(\cdot)$ and $g_Z(\cdot)$ denote the densities corresponding to $\varepsilon_t$ and $Z_t$, respectively. The above model includes the following special cases:

(i) Random coefficient AR(1): $H_\theta(x, y) = (\theta + y)x$.
(ii) Threshold AR(1): $H_\theta(x, y) = \theta_1 x^+ + \theta_2 x^-$, where $x^+ = \max\{0, x\}$, $x^- = \min\{0, x\}$.
(iii) Exponential AR(1): $H_\theta(x, y) = [\theta_1 + \theta_2\exp(-\theta_3 x^2)]x$.
(iv) Random coefficient exponential AR(1): $H_\theta(x, y) = [(\theta_1 + y_1) + (\theta_2 + y_2)\exp(-x^2)]x$, with $y = (y_1, y_2)^T$.
(v) Random coefficient threshold AR(1): $H_\theta(x, y) = (\theta_1 + y_1)x^+ + (\theta_2 + y_2)x^-$, with $y = (y_1, y_2)^T$.

Hwang and Basawa (1993) have established the LAN property for the general class of random coefficient models defined by (6.25) and studied problems of inference regarding $\theta$.

Ex. 6. The pure birth process


Let $\{X_t\}$, $t \geq 0$, be a pure birth process with birth rate $\theta$ and $X_0 = 1$, where $X_t$ denotes the population size at time $t$. This is a continuous time Markov process. The intervals between births, $T_k = t_k - t_{k-1}$, $k = 1, 2, \ldots$, are independent exponential random variables with $E(T_k) = (k\theta)^{-1}$, where $t_k$ denotes the epoch of the $k$th birth. Suppose we observe the process continuously over the interval $(0, T)$. Let $B(T)$ denote the total number of births occurring in the interval $(0, T)$. Note that $X_T = B(T) + 1$. The likelihood function is given by

$$p_T(X(0, T); \theta) = \left(\prod_{k=1}^{B(T)} k\theta\exp(-k\theta T_k)\right)\exp\left(-(T - t_{B(T)})\theta X_T\right) . \qquad (6.27)$$


We then have

$$\frac{d\log p_T}{d\theta} = \frac{B(T)}{\theta} - \sum_{k=1}^{B(T)} k T_k - (T - t_{B(T)})X_T = \frac{B(T)}{\theta} - \int_0^T X_t\,dt ,$$

and

$$-\frac{d^2\log p_T}{d\theta^2} = \frac{B(T)}{\theta^2} .$$

The maximum likelihood estimator of $\theta$ is given by

$$\hat\theta_T = B(T)\Big/\int_0^T X_t\,dt . \qquad (6.28)$$

It can be shown that, as $T \to \infty$,

$$B^{1/2}(T)(\hat\theta_T - \theta) \xrightarrow{d} N(0, \theta^2) . \qquad (6.29)$$

Here we have

$$B(T)/E(B(T)) \to W ,$$

where $W > 0$ is a non-degenerate random variable. See Keiding (1974) for details. Consequently, this example belongs to the LAMN family.

Ex. 7. Optimal estimating functions for longitudinal data


Let $X_{it}$ denote the observation on the $i$th individual at time $t$, $i = 1, \ldots, m$ and $t = 1, \ldots, n_i$. Denote $X_i = (X_{i1}, \ldots, X_{in_i})^T$, the vector of observations on the $i$th individual. Assume that the $X_i$ are independent with

$$E(X_i) = \mu_i(\beta), \quad\text{and}\quad \operatorname{Cov}(X_i) = V_i(\beta, \alpha) , \qquad (6.30)$$

where $\beta$ is the parameter vector of interest and $\alpha$ is a vector of nuisance parameters. When $\alpha$ is known, Godambe's optimal estimating function for $\beta$ is given by

$$g = \sum_{i=1}^{m}\left(\frac{d\mu_i}{d\beta}\right)^T V_i^{-1}(X_i - \mu_i) . \qquad (6.31)$$

If $\hat\beta$ is a consistent solution of the equation $g = 0$, one can show, under regularity conditions, that

$$H_n^{1/2}(\hat\beta - \beta) \xrightarrow{d} N(0, I), \quad\text{as } n \to \infty , \qquad (6.32)$$

where $n = \sum_{i=1}^{m} n_i$, and

$$H_n = \sum_{i=1}^{m}\left(\frac{d\mu_i}{d\beta}\right)^T V_i^{-1}\left(\frac{d\mu_i}{d\beta}\right) . \qquad (6.33)$$

See Fahrmeir and Kaufmann (1985) for the application of the above approach to the generalized linear model. The nuisance parameter $\alpha$ can usually be estimated via ad hoc methods such as the method of moments or least squares. See Liang and Zeger (1986), Prentice (1988), Liang et al. (1992) and Zhao and Prentice (1990) for various inference problems concerning longitudinal data.

Ex. 8. Bayes and empirical Bayes estimation for autoregressive processes


Let $X_t(j)$ denote the observation on the $j$th individual at time $t$, $t = 1, \ldots, T$ and $j = 1, \ldots, n$. Consider the model

$$X_t(j) = \phi_j X_{t-1}(j) + \varepsilon_t(j) , \qquad (6.34)$$

where the $\{\phi_j\}$ are assumed to be independent $N(a_j^T\beta, \sigma_\phi^2)$ random variables, $\beta$ is a $(p \times 1)$ vector of parameters, and the $a_j$ are $(p \times 1)$ vectors of known covariates. It is assumed that the $\{\varepsilon_t(j)\}$ are independent $N(0, \sigma_\varepsilon^2)$ random errors, which are independent of $\{\phi_j\}$, and that $X_0(j) = 0$ for each $j$. The conditional (posterior) distribution of $\phi_n$ given $X(n) = (X_1(n), \ldots, X_T(n))^T$ is seen to be $N(\delta, (c + \sigma_\phi^{-2})^{-1})$, where

$$\delta = \left(c + \sigma_\phi^{-2}\right)^{-1}\left(c\,\hat\phi_n + \sigma_\phi^{-2}\,a_n^T\beta\right) , \qquad (6.35)$$

with

$$c = \sigma_\varepsilon^{-2}\sum_{t=1}^{T} X_{t-1}^2(n), \quad\text{and}\quad \hat\phi_n = \sum_{t=1}^{T} X_t(n)X_{t-1}(n)\Big/\sum_{t=1}^{T} X_{t-1}^2(n) .$$

If $\sigma_\phi^2$, $\sigma_\varepsilon^2$ and $\beta$ are known, $\delta$ in (6.35) is the Bayes estimator of $\phi_n$ with respect to the quadratic loss function; it is based on the $T$ observations on the $n$th individual. If $\sigma_\phi^2$, $\sigma_\varepsilon^2$ and $\beta$ are unknown, they may be estimated from the marginal likelihood based on all $nT$ observations $\{X_t(j)\}$, $j = 1, \ldots, n$ and $t = 1, \ldots, T$. Let $\hat\delta$ denote $\delta$ after $\sigma_\phi^2$, $\sigma_\varepsilon^2$ and $\beta$ are replaced by their estimates. Then $\hat\delta$ is an empirical Bayes estimator of $\phi_n$. See Kim and Basawa (1992) for the properties of $\hat\delta$.
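The shrinkage form of (6.35), the posterior mean as a precision-weighted average of the least squares estimate $\hat\phi_n$ and the prior mean, can be sketched as follows. This is our own illustration of the Bayes step with the hyperparameters treated as known, and a scalar prior mean standing in for $a_n^T\beta$:

```python
import random

random.seed(7)

# Hyperparameters treated as known (the empirical Bayes step would estimate
# them from all n series); prior: phi ~ N(b, s2_phi).
b, s2_phi, s2_eps = 0.5, 0.04, 1.0
phi = random.gauss(b, s2_phi ** 0.5)

# One series of length T from X_t = phi*X_{t-1} + eps_t, with X_0 = 0.
T = 200
x = [0.0]
for _ in range(T):
    x.append(phi * x[-1] + random.gauss(0.0, s2_eps ** 0.5))

# Conjugate normal update: c = sum X_{t-1}^2 / s2_eps, and the posterior mean
# (6.35) shrinks the least squares estimate phi_hat toward the prior mean b.
sxx = sum(x[t - 1] ** 2 for t in range(1, T + 1))
sxy = sum(x[t] * x[t - 1] for t in range(1, T + 1))
c = sxx / s2_eps
phi_hat = sxy / sxx
delta = (c * phi_hat + b / s2_phi) / (c + 1.0 / s2_phi)
```

By construction, $\delta$ always lies between $\hat\phi_n$ and the prior mean, with the weight on the data growing as $c$ (the data precision) increases.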

Acknowledgement
We thank the referee for a careful reading and many useful suggestions.


References
Basawa, I. V. (1981a). Efficient conditional tests for mixture experiments with applications to the birth and branching processes. Biometrika 68, 153-165.
Basawa, I. V. (1981b). Efficiency of conditional maximum likelihood estimators and confidence limits for mixtures of exponential families. Biometrika 68, 515-523.
Basawa, I. V. (1983). Recent trends in asymptotic optimal inference for dependent observations. Australian J. Statist. 25, 182-190.
Basawa, I. V. (1985). Neyman-LeCam tests based on estimating functions. Proc. Berkeley Conference in Honor of Neyman and Kiefer (Eds., L. LeCam and R. Olshen), Wadsworth, Belmont, pp. 811-825.
Basawa, I. V. (1990). Large sample statistics for stochastic processes: Some recent developments. Proc. of R. C. Bose Symposium: Probability, Statistics and Design of Experiments (Ed., R. R. Bahadur), pp. 102-122. Wiley Eastern, New Delhi.
Basawa, I. V. (1991). Generalized score tests for composite hypotheses. In Estimating Functions (Ed., V. P. Godambe), pp. 121-132. Oxford Univ. Press.
Basawa, I. V. and P. J. Brockwell (1984). Asymptotic conditional inference for regular nonergodic models with application to autoregressive processes. Ann. Statist. 12, 161-171.
Basawa, I. V. and H. L. Koul (1988). Large sample statistics via quadratic approximation. Inter. Statist. Reviews 56, 199-219.
Basawa, I. V. and B. L. S. Prakasa Rao (1980a). Statistical Inference for Stochastic Processes. Academic Press, London.
Basawa, I. V. and B. L. S. Prakasa Rao (1980b). Asymptotic inference for stochastic processes. Stoch. Proc. and Appl. 9, 291-305.
Basawa, I. V. and D. J. Scott (1983). Asymptotic Optimal Inference for Nonergodic Models. Lecture Notes in Statistics, Vol. 17. Springer-Verlag, New York.
Basawa, I. V., R. Huggins and R. G. Staudte (1985). Robust tests for time series with an application to first order autoregressive processes. Biometrika 72, 559-571.
Bickel, P. J. (1982). On adaptive estimation. Ann. Statist. 10, 647-671.
Bickel, P. J., C. A. J. Klaassen, Y. Ritov and J. A. Wellner (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore.
Billingsley, P. (1961). Statistical Inference for Markov Processes. Univ. Chicago Press, Chicago.
Drost, F. C., C. A. J. Klaassen and B. J. M. Werker (1997). Adaptive estimation in time series models. Ann. Statist. 25, 786-817.
Fahrmeir, L. and H. Kaufmann (1985). Consistency and asymptotic normality of the maximum likelihood estimator in generalized linear models. Ann. Statist. 13, 342-368.
Godambe, V. P. (1960). An optimum property of regular maximum likelihood estimation. Ann. Math. Statist. 31, 1208-1212.
Godambe, V. P. (1985). The foundations of finite sample estimation in stochastic processes. Biometrika 72, 419-428.
Godambe, V. P. (1991) (Ed.). Estimating Functions. Oxford Univ. Press, Oxford.
Guttorp, P. (1991). Statistical Inference for Branching Processes. Wiley, New York.
Hall, W. J. and D. J. Mathiason (1990). On large sample estimation and testing in parametric models. Inter. Statist. Reviews 58, 77-97.
Heyde, C. C. (1975). Remarks on efficiency in estimation for branching processes. Biometrika 62, 49-55.
Heyde, C. C. (1997). Quasilikelihood and Its Applications. Springer, New York.
Hwang, S. Y. and I. V. Basawa (1993). Asymptotic optimal inference for a class of nonlinear time series. Stoch. Proc. and Appl. 46, 91-113.
Hwang, S. Y. and I. V. Basawa (1994). Large sample inference for conditional exponential families with applications to nonlinear time series. J. Statist. Plann. Inf. 38, 141-158.
Kim, Y. W. and I. V. Basawa (1992). Empirical Bayes estimation for first order autoregressive processes. Austral. J. Statist. 34, 105-114.

Inference in stochastic processes

77

Keiding, N. (1974). Estimation in the birth process. Biometrika 61, 71 80. LeCam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York. LeCam, L. and G. Yang (1990). Asymptotics in Statistics: Some Basic Concepts. Springer-Verlag, New York. Liang, K. Y. and S. L. Zeger (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13-22. Liang, K. Y., S. L. Zeger and B. Qaqish (1992). Multivariate regression analysis for categorical data (with discussion). J. Roy. Statist. Soc. Ser B 54, 3-40. Prentice, R. L. (1988). Correlated binary regression with covariates specific to each binary observation. Biometrics 44, 1033-1048. Roussas, G. G. (1972). Contiguity of Probability Measures. Cambridge Univ. Press, Cambridge. Zhao, L. P. and R. L. Prentice (1990). Correlated binary regression using a quadratic exponential model. Biometrika 77, 642-648.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19
© 2001 Elsevier Science B.V. All rights reserved.

Topics in Poisson Approximation

A. D. Barbour

1. Introduction

One of the many problems addressed by de Moivre in his fundamental treatise of 1712 was that of finding the value of m which makes the binomial probability Bi(n,p){[0, m]} closest to 1/2, for given n and p. In considering small values of p, he gives the approximation

Bi(n,p){[0, m]} ≈ Σ_{k=0}^{m} e^{−np}(np)^k/k!   (1.1)
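De Moivre's approximation (1.1) is easy to check numerically; the sketch below (the values n = 500, p = 0.01 are illustrative, not from the text) compares the binomial distribution function with the Poisson sum by direct computation.

```python
from math import comb, exp, factorial

def binom_cdf(n, p, m):
    """Bi(n,p){[0,m]}: binomial distribution function, by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

def poisson_cdf(lam, m):
    """The right-hand side of (1.1): Poisson probabilities summed up to m."""
    return sum(exp(-lam) * lam**k / factorial(k) for k in range(m + 1))

n, p = 500, 0.01                      # small p; np = 5
for m in (2, 5, 8):
    b, q = binom_cdf(n, p, m), poisson_cdf(n * p, m)
    print(f"m={m}:  Bi={b:.5f}  Po={q:.5f}  |diff|={abs(b - q):.5f}")
```

The discrepancies are within the order cp(1 + np) promised by the bound (1.5) below, and in fact of order p, in line with Prohorov's bound (1.7).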

(Hald, 1988, pp. 214-5). However, a formal statement of the Poisson approximation to the binomial distribution first appeared over a century later in Poisson (1837). Even then, the relevance of the Poisson 'law of small numbers' in statistical analyses, in stark contrast to that of the normal law, apparently remained completely neglected, until von Bortkewitsch (1898) demonstrated with a variety of sets of observational data that many random counts of rare and unrelated occurrences are well modelled as having Poisson distributions. This fact is all the more surprising, because the Poisson family has only one parameter to be fitted, as opposed to the two parameters of the normal distribution, so that good fits to observational data are strongly indicative of some real underlying mechanism having produced them, rather than of the flexibility of the probability model. We shall see that the mechanism is very simple, and can indeed be summarized by the adjectives 'rare' and 'unrelated'. The natural proof of (1.1) is to observe that

Bi(n,p){k} = C(n,k) p^k (1 − p)^{n−k}
           = {e^{−np}(np)^k/k!} ∏_{j=1}^{k−1}(1 − j/n) e^{n{p + log(1−p)}} (1 − p)^{−k} .   (1.2)

If n is large and p = p(n) ≈ λ/n for fixed λ, then


n{p + log(1 − p)} ≈ −λ²/(2n) → 0 ,

and, for any fixed k,

∏_{j=1}^{k−1}(1 − j/n) → 1   and   (1 − p)^{−k} → 1   as n → ∞ ;

hence

lim_{n→∞} Bi(n,p){k} = e^{−λ} λ^k/k! ,   (1.3)

and (1.1) follows. However, for any particular finite n and p, limit asymptotics alone give no idea of the accuracy of the approximation in (1.1); for this, a finer analysis of (1.2) is needed. With a little more care, it follows from (1.2) that

Bi(n,p){k} = {e^{−np}(np)^k/k!}{1 + O(np², k²n^{−1})} ,   (1.4)

and this in turn can be used to show that there exists a constant c > 0 such that

d_TV(Bi(n,p), Po(np)) ≤ cp(1 + np) ,   (1.5)

for all choices of n and p, where, for probability distributions P and Q on ℤ₊,

d_TV(P, Q) := sup_{A⊆ℤ₊} |P(A) − Q(A)| .   (1.6)
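The supremum in (1.6) is attained on the set A = {k : P{k} > Q{k}}, so that d_TV(P, Q) = (1/2) Σ_k |P{k} − Q{k}|, and the distance in (1.5) can be computed exactly. A sketch (parameter choices illustrative), updating both point probabilities recursively to avoid large factorials:

```python
from math import exp

def tv_binom_poisson(n, p):
    """d_TV(Bi(n,p), Po(np)) = (1/2) * sum_k |Bi(n,p){k} - Po(np){k}|."""
    lam = n * p
    b, q = (1 - p) ** n, exp(-lam)    # point probabilities at k = 0
    total = abs(b - q)
    for k in range(1, n + 200):       # Poisson mass beyond n + 200 is negligible
        b = b * (n - k + 1) * p / (k * (1 - p)) if k <= n else 0.0
        q = q * lam / k
        total += abs(b - q)
    return total / 2

for n in (100, 400, 1600):            # p = 2/n fixed mean: distance shrinks like p
    print(n, tv_binom_poisson(n, 2.0 / n))
```

Quadrupling n (and so quartering p) divides the distance by about four, the order-p behaviour asserted in (1.7).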

Hence de Moivre's approximation (1.1) is accurate to within an error of cp(1 + np), for a constant c which can be explicitly computed. The estimate given in (1.5) suffices to show that Bi(n,p) and Po(np) are close, provided that both p and np² are small. If p = p(n) ≈ λ/n for fixed λ as n → ∞, these two quantities are both of order n^{−1}, and the Poisson approximation becomes ever more exact as n → ∞; the bound (1.5) shows that the same is true, provided only that p(n) = o(n^{−1/2}). However, the true accuracy of the approximation is actually better still. Prohorov (1953) proved that there is a universal constant c such that

d_TV(Bi(n,p), Po(np)) ≤ cp ,   (1.7)

for all n and p, from which it follows that p(n) → 0 is enough for asymptotic accuracy; he also showed that this order of accuracy is best possible, if np is bounded away from 0. If Poisson approximation were only appropriate for the binomial distribution, it would be of limited importance, but von Bortkewitsch's data suggest otherwise; the random mechanisms underlying his data are unlikely to correspond exactly to sums of n independent Bernoulli Be(p) random variables, all with the same p. In fact, it turns out that the 'law of small numbers' is universally appropriate for sums of weakly dependent Bernoulli random variables with small, but possibly differing, values of p.


The first step in checking such a claim is to retain independence, but to let the p's vary. So take W = Σ_{i=1}^n I_i, with I_1, …, I_n independent and I_i ~ Be(p_i); how well is the distribution of W approximated by a Poisson distribution with the same mean λ = Σ_{i=1}^n p_i? A calculation in similar vein to that given in (1.2) can be undertaken, in order to compute P[W = k], though now the combinatorics becomes rather more complicated, because each possible k-subset of indices at which the k values of 1 can occur yields a different probability in the sum. Nonetheless, with rather more effort, a formula

P[W = k] = {e^{−λ} λ^k/k!} exp{O(λ max_{1≤i≤n} p_i, k² λ^{−1} max_{1≤i≤n} p_i)}   (1.8)

can be derived, the order term being uniform in max_{1≤i≤n} p_i ≤ x, for any 0 < x < 1. In the case of equal p_i's, λ = np and p = max_{1≤i≤n} p_i, and expression (1.8) becomes essentially equivalent to (1.4); and, mirroring the situation with equal p_i's, a bound

d_TV(ℒ(W), Po(λ)) ≤ c max_{1≤i≤n} p_i (1 + λ)   (1.9)

for the total variation distance between the distribution ℒ(W) of W and Po(λ) can be deduced from it. Once again, as when the p_i's were equal, the factor λ in (1.9) turns out to be unnecessary when λ > 1, Poisson approximation to ℒ(W) being accurate whenever max_{1≤i≤n} p_i is small. The first result in this direction was that of Hodges and Le Cam (1960), who gave the bound

max_j | P[W ≤ j] − Σ_{k=0}^{j} e^{−λ} λ^k/k! | ≤ 3 (max_{1≤i≤n} p_i)^{1/3}   (1.10)

for the difference of the two distribution functions. This was immediately improved upon in Le Cam (1960), who showed that

d_TV(ℒ(W), Po(λ)) ≤ 4.5 max_{1≤i≤n} p_i   (1.11)

and

d_TV(ℒ(W), Po(λ)) ≤ 8 λ^{−1} Σ_{i=1}^n p_i² ,   (1.12)

the latter under the restriction that p = max_{1≤i≤n} p_i ≤ 1/4; then Kerstan (1974) improved upon (1.12) by reducing the factor 8 to 1.05. Both (1.11) and (1.12) give bounds of the form cp if all the p_i's are equal, and thus provide analogues of Prohorov's bound (1.7) in the more general setting of unequally distributed summands. The methods used by Kerstan and Le Cam, although quite different, both rely on the multiplicative properties of convolution. Kerstan compares the probability generating function of the sum of indicators with the Poisson probability


generating function in an ingenious way, and then uses Cauchy's formula as a basic tool in obtaining the estimate of total variation distance from the difference of the probability generating functions. Le Cam also uses complex variables, in his case Fourier methods, for the detailed estimates, but his basic approach is to represent distributions over the non-negative integers ℤ₊ as operators on the space of bounded measurable functions over ℤ₊, with multiplication of operators corresponding to convolution. This method has since been developed by Deheuvels and Pfeifer, who have used it to obtain the sharpest results yet for the Poisson approximation of sums of independent indicators: in particular, in Deheuvels and Pfeifer (1988), the complex variable techniques of Uspensky (1931) and Shorgin (1977) are combined with the operator method to yield asymptotic expansions of arbitrary order for set probabilities, together with explicit bounds on the error involved. The next step is to relax the assumption of independence between the summands. Here, the above methods all immediately run into problems, because, without independence, the multiplicative structure associated with convolution can no longer be simply invoked, except in some combinatorial settings, where generating functions still arise naturally, as for instance in Hwang (1999). However, this difficulty has already been faced in connection with the central limit theorem for dependent random variables, and other techniques have been devised. The oldest of these is probably the method of moments. In the Poisson context, it is most conveniently expressed in the following terms. A sequence W_n of non-negative integer valued random variables such that

lim_{n→∞} E{W_n(W_n − 1)⋯(W_n − k + 1)} = λ^k ,   k = 1, 2, … ,   (1.13)

for some fixed λ, 0 < λ < ∞, converges in distribution to the Poisson distribution Po(λ): W_n →_D Po(λ) as n → ∞ (c.f. Bollobás (1985, Theorem 1.20)). If W_n = Σ_{i=1}^n I_i, where the I_i are indicator random variables, then (k!)^{−1} W_n(W_n − 1)⋯(W_n − k + 1) counts the number of k-subsets {i_1, …, i_k} of {1, 2, …, n} for which I_{i_1} = ⋯ = I_{i_k} = 1, so that (1.13) can be checked whenever it is possible to compute the expectations E(∏_{j=1}^k I_{i_j}) for each k. This proves to be useful in many combinatorial contexts, including the study of random graphs. A probability approximation associated with (1.13) follows from the Bonferroni (1936) inequalities: in the representation

P[W_n ≥ k] = Σ_{r≥k} (−1)^{r−k} C(r−1, k−1) E{W_n(W_n − 1)⋯(W_n − r + 1)}/r! ,   (1.14)

the sequence of partial sums successively bracket the total sum, and

Po(λ)[k, ∞) = Σ_{r=k}^{∞} (−1)^{r−k} C(r−1, k−1) λ^r/r! .
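The bracketing behaviour of the partial sums in (1.14) can be seen in a small exact computation; the success probabilities below are illustrative, and for independent indicators the factorial-moment quantity E{W(W − 1)⋯(W − r + 1)}/r! reduces to the elementary symmetric polynomial e_r(p_1, …, p_n).

```python
from math import comb

p = [0.05, 0.1, 0.15, 0.2, 0.1, 0.05]     # illustrative success probabilities
n = len(p)

# Exact distribution of W = sum of independent Be(p_i), by convolution.
dist = [1.0]
for pi in p:
    new = [0.0] * (len(dist) + 1)
    for k, mass in enumerate(dist):
        new[k] += mass * (1 - pi)
        new[k + 1] += mass * pi
    dist = new

# S_r = E{W(W-1)...(W-r+1)}/r! = e_r(p_1,...,p_n), built up one p_i at a time.
S = [1.0] + [0.0] * n
for pi in p:
    for r in range(n, 0, -1):
        S[r] += S[r - 1] * pi

k = 2
exact = sum(dist[k:])                     # P[W >= k], exactly
partial = 0.0
for r in range(k, n + 1):
    partial += (-1) ** (r - k) * comb(r - 1, k - 1) * S[r]
    side = ">=" if (r - k) % 2 == 0 else "<="
    print(f"after r={r}: partial={partial:.6f} {side} P[W>={k}]={exact:.6f}")
```

The partial sums alternately over- and under-shoot P[W ≥ k], and the full sum recovers it exactly, as in (1.14).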


However, an approach through (1.13) and (1.14) has two practical drawbacks. The first is that, with more complicated dependence structures, the computation of E(∏_{j=1}^k I_{i_j}) can become rapidly more difficult with increasing k, rendering the method impracticable. The second is that (1.13) presupposes that λ < ∞ remains fixed, whereas, frequently, an asymptotic equivalence as n → ∞ of the form

E{W_n(W_n − 1)⋯(W_n − k + 1)} ≈ λ_n^k ,   k = 1, 2, … ,   (1.15)

may hold for each k, but with lim_{n→∞} λ_n = ∞. Unfortunately, (1.15) alone is not enough to prove that ℒ(W_n) and Po(λ_n) become close as n → ∞ if λ_n → ∞, principally because (1.14) then has many large terms of both signs, the probability emerging as the total sum only after many cancellations, and equivalence statements such as (1.15) are not precise enough for a passage to the limit. In practical terms, the approximation in (1.15) would need to be proved to an extremely high relative accuracy, and this greatly detracts from the usefulness of the moment method. A particularly successful approach in the central limit theorem has been to identify a natural flow of time, and then to make use of martingales with respect to the associated filtration, since many of the salient properties of sequences of independent random variables are found to carry over to martingale difference sequences. For sums of indicators, this approach was taken up by Freedman (1974) and Serfling (1975). They suppose that there is a filtration (ℱ_i, 0 ≤ i ≤ n) such that, for each i, I_i is ℱ_i-measurable, and their bounds are then expressed in terms of properties of the distributions of the random variables

p_i = E(I_i | ℱ_{i−1}) .   (1.16)

For instance, Serfling (1975, Theorem 1) gives the bound

d_TV(ℒ(W), Po(λ)) ≤ Σ_{i=1}^n {[E(p_i)]² + E|p_i − E p_i|}   (1.17)

for λ = Σ_{i=1}^n E p_i. However, if Serfling's bound (1.17) is evaluated in the binomial context, with independent I_i's and with p_i = p, fixed and non-random, it yields the value np², of the weaker order found in (1.5) and (1.9), rather than of the improved order p given in (1.7) or (1.12). For large λ, this represents a substantial weakness. Indeed, in the binomial context, one way of achieving a bound of order np² for d_TV(Bi(n,p), Po(np)) is independently to couple each of the summands I_i ~ Be(p) with Z_i ~ Po(p) (when Z_i = 0, take I_i = 0 with probability e^p(1 − p) ≤ 1, and take I_i = 1 otherwise) in such a way that P[Z_i ≠ I_i] = p(1 − e^{−p}). Then, by independence and the triangle inequality,

d_TV(Bi(n,p), Po(np)) ≤ Σ_{i=1}^n d_TV(ℒ(I_i), ℒ(Z_i)) = np(1 − e^{−p}) ≤ np² ;   (1.18)


this coupling idea is in fact the essence of Serfling's proof of (1.17). Bounds like (1.7) and (1.12), by bringing in an extra factor of 1/(np), reflect a much more subtle matching of the distributions of the sums than can be achieved by matching each of the summands individually; roughly speaking, the same contrast as that between Skorohod imbedding and the Komlós, Major and Tusnády approach to strong approximation in the functional central limit theorem. In order to reproduce bounds such as in (1.7) and (1.12) for sums of dependent indicators, a new idea is needed.
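The per-summand coupling just described is in fact a maximal one: the total variation distance between Be(p) and Po(p) equals the mismatch probability p(1 − e^{−p}) quoted above. A direct check by summation (a sketch; the values of p are illustrative):

```python
from math import exp, factorial

def tv_bernoulli_poisson(p, cutoff=40):
    """d_TV(Be(p), Po(p)), by summing |Be(p){k} - Po(p){k}| over k."""
    diff = abs((1 - p) - exp(-p)) + abs(p - p * exp(-p))
    diff += sum(exp(-p) * p**k / factorial(k) for k in range(2, cutoff))
    return diff / 2

for p in (0.3, 0.1, 0.01):
    lhs = tv_bernoulli_poisson(p)
    rhs = p * (1 - exp(-p))           # mismatch probability of the coupling
    print(f"p={p}: d_TV={lhs:.8f}  p(1-e^-p)={rhs:.8f}  p^2={p*p:.8f}")
```

Summing over n independent summands reproduces the bound np(1 − e^{−p}) ≤ np² of (1.18), but can do no better, which is why the sharper order in (1.7) and (1.12) needs a different argument.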

2. The Stein-Chen method

An entirely new method for approximating the distributions of random elements was introduced by Stein (1970). His original application was to the central limit theorem for sums of dependent random variables, and the version appropriate for Poisson approximation of sums of dependent indicator random variables was first developed by Chen (1975a): see also Stein (1986, 1992). The method hinges on the following observations. First, take A to be any subset of ℤ₊, and take λ > 0. Then it is possible to express the indicator function 1_A in the form

1_A(j) = Po(λ){A} + λ g_{λ,A}(j + 1) − j g_{λ,A}(j) ,   j ≥ 0 ,   (2.1)

the Stein Equation, where the function g_{λ,A}: ℤ₊ → ℝ can be recursively determined on ℤ₊\{0}: the value of g_{λ,A}(0) is in any case irrelevant. Furthermore, if M_0 and M_1 are defined by

M_0(g) := sup_{j≥1} |g(j)| ;   M_1(g) := sup_{j≥1} |g(j + 1) − g(j)| ,   (2.2)

then

J_1 := sup_{A⊆ℤ₊} M_0(g_{λ,A}) ≤ min{1, λ^{−1/2}} ;   J_2 := sup_{A⊆ℤ₊} M_1(g_{λ,A}) ≤ λ^{−1}(1 − e^{−λ}) ≤ min{1, λ^{−1}}   (2.3)

(Barbour and Eagleson, 1983). Hence, for W a random variable on ℤ₊, it follows that

P[W ∈ A] = Po(λ){A} + E{ε_λ(g_{λ,A}; W)} ,   (2.4)

and thus

d_TV(ℒ(W), Po(λ)) ≤ sup_{A⊆ℤ₊} |E{ε_λ(g_{λ,A}; W)}| ,

where

ε_λ(g; w) := λ g(w + 1) − w g(w) .   (2.5)
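The solution g_{λ,A} of the Stein Equation (2.1) can be computed by forward recursion from g_{λ,A}(1) = (1_A(0) − Po(λ){A})/λ, and the bounds (2.3) on M_0 and M_1 then checked numerically. A sketch (the choices of λ and of the sets A are illustrative; the forward recursion is used only over a short range, where it remains numerically stable):

```python
from math import exp, factorial

def stein_solution(lam, A, jmax):
    """Solve (2.1): g(j+1) = (1_A(j) - Po(lam){A} + j*g(j)) / lam."""
    poA = sum(exp(-lam) * lam**k / factorial(k) for k in A)
    g = {0: 0.0}                      # g(0) is irrelevant; set it to zero
    for j in range(jmax):
        ind = 1.0 if j in A else 0.0
        g[j + 1] = (ind - poA + j * g[j]) / lam
    return g

lam, jmax = 4.0, 20
M0 = M1 = 0.0
for A in ({0}, {1, 3, 5}, set(range(0, 20, 2))):
    g = stein_solution(lam, A, jmax)
    M0 = max(M0, max(abs(g[j]) for j in range(1, jmax)))
    M1 = max(M1, max(abs(g[j + 1] - g[j]) for j in range(1, jmax)))
print(f"max M0 = {M0:.4f} <= {min(1, lam ** -0.5):.4f}")
print(f"max M1 = {M1:.4f} <= {(1 - exp(-lam)) / lam:.4f}")
```

Even over these few sets A, the observed suprema respect the bounds J_1 ≤ min{1, λ^{−1/2}} and J_2 ≤ λ^{−1}(1 − e^{−λ}) of (2.3).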


The second observation is that, if W = Σ_{i=1}^n I_i is a sum of weakly dependent indicators and λ = EW = Σ_{i=1}^n p_i, then |E{ε_λ(g; W)}| can frequently be shown to be small by relatively simple arguments. For example, take the I_i once again to be independent, with I_i ~ Be(p_i) and λ = Σ_{i=1}^n p_i as above. Then, writing W_i = Σ_{i'≠i} I_{i'} = W − I_i, it follows for any bounded g that

E{I_i g(W)} = E{I_i g(W_i + 1)} = p_i E g(W_i + 1) ,   (2.6)

this last because I_i and W_i are independent. Hence, from (2.5), and using the definitions of λ and W, we find that

E{ε_λ(g; W)} = Σ_{i=1}^n p_i E g(W + 1) − Σ_{i=1}^n p_i E g(W_i + 1)
             = Σ_{i=1}^n p_i E{g(W_i + 1 + I_i) − g(W_i + 1)} .   (2.7)

Now the quantity in braces is zero if I_i = 0, and is in modulus at most M_1(g) if I_i = 1. Hence we have shown that

|E{ε_λ(g; W)}| ≤ Σ_{i=1}^n p_i² M_1(g) .   (2.8)

Thus, taking g = g_{λ,A} for any A ⊆ ℤ₊, and using (2.8) with (2.3) in (2.4), it follows that

d_TV(ℒ(W), Po(λ)) ≤ J_2 Σ_{i=1}^n p_i² ≤ min{1, λ^{−1}} Σ_{i=1}^n p_i² ≤ max_{1≤i≤n} p_i .   (2.9)
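For independent indicators with unequal p_i, both sides of (2.9) can be computed exactly; a sketch with illustrative values:

```python
from math import exp

p = [0.02, 0.05, 0.01, 0.08, 0.03, 0.04]   # illustrative unequal p_i
lam = sum(p)

# Exact distribution of W by convolution of the Bernoulli laws.
dist = [1.0]
for pi in p:
    new = [0.0] * (len(dist) + 1)
    for k, mass in enumerate(dist):
        new[k] += mass * (1 - pi)
        new[k + 1] += mass * pi
    dist = new

# d_TV(L(W), Po(lam)) by direct summation over k.
tv, po_k = 0.0, exp(-lam)
for k in range(60):
    w_k = dist[k] if k < len(dist) else 0.0
    tv += abs(w_k - po_k)
    po_k *= lam / (k + 1)
tv /= 2

bound = min(1.0, 1.0 / lam) * sum(pi * pi for pi in p)   # right side of (2.9)
print(f"exact d_TV = {tv:.6f};  bound (2.9) = {bound:.6f};  max p_i = {max(p):.3f}")
```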

These bounds are of the same orders as those (1.11), (1.12) of Le Cam, though with better constants even than Kerstan's improvement, and without restriction on the value of max_{1≤i≤n} p_i; most importantly, the factor λ^{−1} is present when λ > 1. Lower bounds of exactly the same order are proved in Barbour and Hall (1984), showing that no essential improvement is possible.

2.1. The local approach

Turning to dependent indicator random variables I_i, the main requirement is a substitute for the last step in (2.6), the only place where independence was invoked. Clearly, I_i and Σ_{i'≠i} I_{i'} cannot normally be expected to be independent, but it is often reasonable to suppose that I_i and W_i = Σ_{i'∉N_i} I_{i'} are almost independent, if, for each i, the subset N_i of indices is carefully chosen. For instance, if the I_i are an m-dependent sequence, then the choice N_i = {i' : |i' − i| < m} is such that W_i and I_i are independent; if the I_i form a mixing sequence, the dependence between W_i and I_i is controlled by a mixing coefficient α(m). In either case, one finds that

|E{I_i g(W_i + 1)} − p_i E g(W_i + 1)| ≤ χ_i M_0(g) ,   (2.10)


uniformly for all bounded g, with χ_i either zero or suitably small; for instance, one can take

χ_i = E|E(I_i | W_i) − p_i| ,   (2.11)

zero if I_i and W_i are independent. It then remains only to reformulate the first equality in (2.6), which, because of the new definition of W_i, is now no longer valid. So suppose that, for each i, W can be written in the form

W = W_i + Z_i + I_i ,   (2.12)

where Z_i and W_i are any non-negative integer valued random variables. Then

E{I_i g(W)} = E{I_i g(W_i + Z_i + 1)}
            = E{I_i g(W_i + 1)} + E{I_i (g(W_i + Z_i + 1) − g(W_i + 1))} ,   (2.13)

whereas

p_i E g(W + 1) = p_i E g(W_i + Z_i + I_i + 1)
              = p_i E g(W_i + 1) + p_i E{g(W_i + Z_i + I_i + 1) − g(W_i + 1)} .   (2.14)

Now the difference between (2.14) and (2.13) can be bounded by using (2.10), together with the direct estimates

|E{I_i (g(W_i + Z_i + 1) − g(W_i + 1))}| ≤ E(I_i Z_i) M_1(g)

and

|E{g(W_i + Z_i + I_i + 1) − g(W_i + 1)}| ≤ (p_i + E Z_i) M_1(g) .

Adding over i and substituting into (2.5) thus yields

|E{ε_λ(g; W)}| ≤ Σ_{i=1}^n ( χ_i M_0(g) + {E(I_i Z_i) + p_i E(Z_i + I_i)} M_1(g) ) ,   (2.15)

where χ_i is such that (2.10) holds for all g, as with the choice (2.11). Now, again arguing as for (2.9), take g = g_{λ,A} for any A ⊆ ℤ₊, and use (2.3) to bound the right hand side of (2.15); then substitute the result into (2.4). This leads to the

Poisson Local Estimate (Chen, 1975a)

d_TV(ℒ(W), Po(λ)) ≤ J_1 Σ_{i=1}^n χ_i + J_2 Σ_{i=1}^n {p_i² + p_i E Z_i + E(I_i Z_i)} ,   (2.16)

true for any collection of representations of W as in (2.12), with J_1 ≤ min(1, λ^{−1/2}) and J_2 ≤ min(1, λ^{−1}).


If Z_i = Σ_{i'∈N_i} I_{i'}, where N_i consists of the 'near' neighbours of i, the first term in the bound in (2.16) can be thought of as taking care of dependence at long range, since χ_i measures the effect of the configuration far from i upon the probability of having I_i = 1. The second term then relates to the effects of local dependence. The quantity

λ^{−1} Σ_{i=1}^n p_i (p_i + E Z_i)   (2.17)

reflects the expected number of 1s, including that at i if there is one, in a typical neighbourhood N_i, and represents a general penalty for choosing the N_i large in order to make the long range dependence weaker. Similarly, the quantity

λ^{−1} Σ_{i=1}^n E(I_i Z_i) = λ^{−1} Σ_{i=1}^n p_i E(Z_i | I_i = 1)   (2.18)

reflects the expected number of 1s, excluding that at i, in a typical neighbourhood N_i for which I_i = 1; it contrasts with (2.17) in that it measures the tendency for there to be local clumps of 1s, and can be large even when the neighbourhoods N_i are small as measured by (2.17). Since J_2 ≤ λ^{−1}, the second term in the bound in (2.16) is no larger than the sum of (2.17) and (2.18). Note that local clumping can easily make Poisson approximation poor: for instance, if I_{2j} = I_{2j+1} = I'_j for each j, where the I'_j ~ Be(p) are independent, then W = Σ_{i=1}^n I_i is not close to Poisson for large n when p ≈ λ/n, and this is reflected in (2.18) for choices of N_i which include the nearest neighbours of i. Note that the bound (2.16) does not involve the higher moments of W. It is enough to be able to control E Z_i, E(I_i Z_i) and the differences |E(I_i | W_i) − p_i|. Note also that any choices of W_i and Z_i satisfying (2.12) are in order, and that they do not have to be defined as sums of the original I_i over fixed neighbourhoods N_i; the aim is merely to find choices of W_i and Z_i which make the right hand side of (2.16) small.
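The clumping example is easy to quantify: with I_{2j} = I_{2j+1}, W = 2 Bin(n/2, p) puts all its mass on the even integers, while Po(np) gives the odd integers probability close to 1/2, so that the total variation distance cannot be small. A sketch with illustrative n and p:

```python
from math import comb, exp, factorial

n, p = 40, 0.1                    # W = 2*Bin(n/2, p), with EW = np = 4
half, lam = n // 2, n * p
W = {2 * k: comb(half, k) * p**k * (1 - p)**(half - k) for k in range(half + 1)}
tv = sum(abs(W.get(k, 0.0) - exp(-lam) * lam**k / factorial(k))
         for k in range(n + 40)) / 2
print(f"d_TV(L(W), Po({lam})) = {tv:.3f}")   # bounded away from 0
```

The distance stays near 1/2 however large n is, matching the warning that (2.18) is then not small.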

EXAMPLE 2.1. As an example of the application of the Poisson Local Estimate (2.16), suppose that Y_1, …, Y_{n+m−1} are independent random variables with common distribution μ over a finite alphabet 𝒜. Let α = a_1 a_2 ⋯ a_m be a specific sequence of m letters of 𝒜 without self overlaps: that is, such that, for all r = 1, 2, …, m − 1,

a_1 a_2 ⋯ a_r ≠ a_{m−r+1} a_{m−r+2} ⋯ a_m .

Define I_i = 1[Y_i = a_1, …, Y_{i+m−1} = a_m], 1 ≤ i ≤ n, and set

W = Σ_{i=1}^n I_i ;   p = E I_i = ∏_{l=1}^m μ(a_l) ;   λ = np .

Thus W counts the number of times that the sequence α appears in the long random string Y_1, …, Y_{n+m−1}. The I_i's are dependent, but only locally, in the sense that I_i is independent of (I_j, j ∉ N_i), where N_i = {j : |i − j| < m}. So define


Z_i = Σ_{j∈N_i, j≠i} I_j ;   W_i = Σ_{j∉N_i} I_j ,

satisfying (2.12), and observe that E(I_i | W_i) = p_i, by the independence of I_i and W_i, allowing the choice χ_i = 0 for all i. Furthermore, E Z_i ≤ 2(m − 1)p and, because self overlaps do not occur in α, I_i Z_i = 0 a.s. for each i. Substituting these values into the Poisson Local Estimate (2.16) yields

d_TV(ℒ(W), Po(λ)) ≤ min{1, λ^{−1}} n p² (2m − 1) ≤ (2m − 1)p .   (2.19)

Thus, if p₊ = max_{a∈𝒜} μ(a), then p ≤ p₊^m, and it follows that

d_TV(ℒ(W), Po(λ)) ≤ (2m − 1) p₊^m ,   (2.20)

small whenever m is large, except in the trivial case where # makes one of the letters in ~4 certain to occur every time. This very elementary application of the Stein-Chen method has been widely generalized (Arratia, Gordon and Waterman, 1990; Chryssaphinou and Papastavridis, 1988; Geske et al., 1995; Neuhauser, 1994; Schbath, 1995). One reason for the interest is that searching molecular sequences for interesting motifs and searching pairs of molecular sequences for local similarities are both basic tools in computational biology. In order to assess the significance of the results obtained by these search methods, a suitable 'null hypothesis' distribution is required for comparison. That derived from the independent letter model is frequently used for this purpose; moreover, the Stein-Chen method is flexible enough to show that a suitable Poisson or compound Poisson approximation may also be used under more general null hypotheses (Reinert and Schbath, 1998). Another field where Poisson approximation is a fundamental tool is extreme value theory and the study of rare events, beginning with the work of von Bortkewitsch (1898). The decomposition of the Poisson Local Estimate into short range and long range parts is similar to that commonly found in extreme value theory: see Smith (1988) for the use of the Stein-Chen method in this context, and Dembo and Karlin (1992) for DNA-motivated applications. A related area is the analysis of coincidences, such as birthday problems and the space-time statistics of Knox (1964). Many further applications of the Poisson Local Estimate are given in Arratia, Goldstein and Gordon (1989, 1990).
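For very small parameter values, the conclusion (2.19) of Example 2.1 can be confirmed by exhaustive enumeration; the alphabet, letter distribution and word below are illustrative (α = 'ab' has no self overlap, as required).

```python
from itertools import product
from math import exp, factorial

mu = {"a": 0.6, "b": 0.4}         # illustrative letter distribution
alpha, n = "ab", 4                # word of length m = 2, counted at n positions
m = len(alpha)

# Exact distribution of W, by enumerating all strings Y_1 ... Y_{n+m-1}.
dist = {}
for s in product(mu, repeat=n + m - 1):
    prob = 1.0
    for c in s:
        prob *= mu[c]
    w = sum(1 for i in range(n) if "".join(s[i:i + m]) == alpha)
    dist[w] = dist.get(w, 0.0) + prob

p = mu["a"] * mu["b"]             # P[I_i = 1]
lam = n * p
tv = sum(abs(dist.get(k, 0.0) - exp(-lam) * lam**k / factorial(k))
         for k in range(30)) / 2
print(f"d_TV = {tv:.4f} <= (2m-1)p = {(2 * m - 1) * p:.4f}")   # bound (2.19)
```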

2.2. The coupling approach


An alternative approach to sums of dependent indicators is to modify (2.6) by using conditional expectation directly, writing

E{I_i g(W)} = p_i E{g(W) | I_i = 1} .

This yields a general version of (2.7) in the form

E{ε_λ(g; W)} = Σ_{i=1}^n p_i {E g(W + 1) − E(g(W) | I_i = 1)} .   (2.21)


In particular, if, for each i, W_i* is any random variable constructed on the same probability space as W which has the conditional distribution of Σ_{i'≠i} I_{i'}, given that I_i = 1, then (2.21) implies that

|E{ε_λ(g; W)}| = | Σ_{i=1}^n p_i E{g(W + 1) − g(W_i* + 1)} |
              ≤ Σ_{i=1}^n p_i E|W − W_i*| M_1(g) .   (2.22)

Hence, taking g = g_{λ,A}, (2.3) and (2.4) imply the

Poisson Coupling Estimate

d_TV(ℒ(W), Po(λ)) ≤ J_2 Σ_{i=1}^n p_i E|W − W_i*| ,   (2.23)

with J_2 ≤ min{1, λ^{−1}}. A variant of this coupling estimate can be used, when the construction of the W_i* is made easier by taking into account the value of some other random element U_i. Suppose that, for each possible value u of a random element U_i, it is possible to construct W_i*(u) on the same probability space as W, so as to have the conditional distribution of Σ_{i'≠i} I_{i'}, given that I_i = 1 and that U_i = u. Then

d_TV(ℒ(W), Po(λ)) ≤ J_2 Σ_{i=1}^n p_i ∫ E|W − W_i*(u)| F_i(du) ,   (2.24)

where F_i is the conditional distribution of U_i, given that I_i = 1. In both (2.23) and (2.24), the choice of how to 'couple' W and a W_i* on the same probability space is entirely free, the aim being to make the right hand sides of (2.23) or (2.24) as small as possible. The estimates are particularly useful when the dependence between the I_i is global rather than local, and there is no natural choice of subset N_i ⊆ {1, 2, …, n} for which I_i and W_i = Σ_{i'∉N_i} I_{i'} are almost independent.

EXAMPLE 2.2. Let G_{n,θ} be the Bernoulli model for a random graph on n vertices, in which each possible edge e_{ij} is present with probability θ, independently of all others: the edge indicators E_{ij} ~ Be(θ) are independent. Let I_i = ∏_{j≠i}(1 − E_{ij}) be the indicator of the event that vertex i is isolated, so that W = Σ_{i=1}^n I_i is the number of isolated vertices in G_{n,θ}. Then

p_i = P[I_i = 1] = (1 − θ)^{n−1} =: p

is the same for each i, and λ = EW = np. The indicators I_1, …, I_n are not independent, because I_1 = 1 implies that each of the vertices 2, …, n is also more likely to be isolated, because the edges e_{12}, e_{13}, …, e_{1n} are not present. However,


this dependence is rather weak, and a Po(np) approximation appears plausible if p is small, or, equivalently, if nθ is large. To make this precise using (2.23), observe that, given any realization of G_{n,θ}, an associated realization of G_{n,θ} conditional on I_i = 1 is obtained simply by deleting all the edges e_{ij}, 1 ≤ j ≤ n, j ≠ i. Hence we can take

W_i* = W + Σ_{j=1, j≠i}^n U_j − I_i ,   (2.25)

where U_j = E_{ij} ∏_{l≠i,j}(1 − E_{jl}). It is then immediate that

E|W − W_i*| ≤ (n − 1) θ (1 − θ)^{n−2} + (1 − θ)^{n−1} .   (2.26)

Substituting (2.26) into (2.23), we find that

d_TV(ℒ(W), Po(λ)) ≤ min{1, λ^{−1}} λ {(n − 1) θ (1 − θ)^{n−2} + (1 − θ)^{n−1}}
                  ≤ (1 + nθ) e^{−nθ} e^{2θ} ,   (2.27)

which is indeed small whenever nθ is large.

2.3. Monotone couplings

The example above is interesting, because W_i* ≥ W except when I_i = 1, so that, for each i,

|W − W_i*| = W_i* − W + 2 I_i .   (2.28)

In any application in which (2.28) holds, it follows from (2.23) that

d_TV(ℒ(W), Po(λ)) ≤ min{1, λ^{−1}} Σ_{i=1}^n p_i E(W_i* − W + 2 I_i)
                  = min{1, λ^{−1}} Σ_{i=1}^n {E[(W − I_i) I_i] − λ p_i + 2 p_i²} ,

since W_i* has the distribution of W − I_i, given that I_i = 1; now, using Σ_{i=1}^n p_i = λ, we obtain the bound

d_TV(ℒ(W), Po(λ)) ≤ min{1, λ^{−1}} [ Var W − λ + 2 Σ_{i=1}^n p_i² ] .   (2.29)

Hence, so long as couplings satisfying (2.28) can be shown to exist, (2.29) gives a simple upper bound for the accuracy of Poisson approximation, which requires only that the mean and variance of W can be calculated. In a similar way, if couplings exist such that

|W − W_i*| = W − W_i*   (2.30)


for all i, then it follows that

d_TV(ℒ(W), Po(λ)) ≤ min{1, λ^{−1}} (λ − Var W) ,   (2.31)

again requiring only the calculation of the mean and variance of W.

EXAMPLE 2.3. Sample n ≤ N times without replacement from an urn containing M white and N − M black balls. Let I_i be the indicator of the event that the ith ball drawn is white, so that W = Σ_{i=1}^n I_i is the number of white balls sampled; set p = E I_i = M/N. For any realization of an n-sample from the urn, a realization of an n-sample conditional on I_i = 1 is obtained by swapping the ith ball chosen, if black, for a randomly chosen white ball. Thus

W_i* = W − I_i − Σ_{j=1}^n I_j E_j ≤ W ,   (2.32)

where E_j is the indicator of the event that the jth ball in the sample is chosen to be swapped with the ith, and (2.30) holds. Here, W has a hypergeometric distribution with mean λ = np = nM/N and variance λ(1 − {p + (n − 1)(1 − p)/(N − 1)}), and we have shown from (2.31) that

d_TV(ℒ(W), Po(λ)) ≤ min{λ, 1} {p + (n − 1)(1 − p)/(N − 1)} .   (2.33)
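The hypergeometric bound (2.33) requires only elementary computations, and can be compared with the exact total variation distance; a sketch with illustrative urn parameters:

```python
from math import comb, exp, factorial

N, M, n = 50, 5, 10               # illustrative: 5 white balls among 50, sample 10
p = M / N
lam = n * p

# Hypergeometric distribution of W, the number of white balls in the sample.
W = {k: comb(M, k) * comb(N - M, n - k) / comb(N, n)
     for k in range(min(M, n) + 1)}
tv = sum(abs(W.get(k, 0.0) - exp(-lam) * lam**k / factorial(k))
         for k in range(40)) / 2

bound = min(lam, 1.0) * (p + (n - 1) * (1 - p) / (N - 1))    # (2.33)
print(f"exact d_TV = {tv:.4f} <= bound (2.33) = {bound:.4f}")
```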

Conditions under which monotone couplings satisfying (2.28) or (2.30) exist, and further general results, including lower bounds on the accuracy of the approximations, can be found in Barbour, Holst and Janson (1992), Chapters 2 and 3.
3. Probabilities of small counts

The emphasis up to now has been on approximating the whole of the distribution of a sum W = Σ_{i=1}^n I_i of indicator random variables. However, it is often of greatest interest just to be able to approximate the probability P[W = 0] accurately. For instance, in reliability theory, I_i may be the indicator of the event that component i in a large system has failed, and the failure of any single component may entail the failure of the whole system, in which case P[W = 0] is the probability that the system is still in operation. In such applications, λ = EW is typically of order O(1) or smaller, and the small system failure probability P[W > 0] is approximated by 1 − e^{−λ}, of order O(λ). In contrast, the bound in (2.16) is typically of order λm(p + q), where m is the size of the neighbourhood of dependence, p = n^{−1}λ and mq ≈ E(Z_i | I_i = 1). This is small in comparison to λ if mq is small, as is the case when failures do not tend to occur in clumps (Godbole, 1993). Note that, as observed in Kounias (1995), the second term in the general bound (2.16) can be improved, if only the probability P[W = 0] is of interest, since

M_1(g_{λ,{0}}) = λ^{−2}(λ − 1 + e^{−λ}) ,   (3.1)

better by a factor which approaches 1/2 as λ → 0.


In other applications, it is of interest to approximate P[W = 0] when λ is large. Here, the problem is more delicate, because the total variation error bounds are then at best of order p = n^{−1}λ, whereas the probability to be approximated should be about e^{−λ}, which can be much smaller; asymptotically, this will happen if λ = λ_n → ∞ faster than c log n, for some c > 1, and then the relative error in the Poisson approximation could be huge. Thus other methods are needed. We give two useful approximations, which require some preparation. Let H be a dependency graph on the vertices {1, 2, …, n} for the indicator random variables (I_i, 1 ≤ i ≤ n). This means that the collections of random variables (I_i, i ∈ A) and (I_i, i ∈ B) are independent of one another whenever A and B are such that no edge in H has one vertex in A and the other in B. For example, if (Y_j, 1 ≤ j ≤ m) are independent random variables, and, for each i,

I_i = f((Y_j, j ∈ M_i)) ,   (3.2)

for some subset M_i ⊆ {1, 2, …, m}, then H can be taken to consist of all pairs {i, i'} such that M_i ∩ M_{i'} ≠ ∅. If H is fairly sparse, as in the case of dissociated random variables (McGinley and Sibson, 1975), the dependence between the I_i can be expected to be relatively weak. For the first approximation, we assume that the I_i are as in (3.2), but with

f((Y_j, j ∈ M_i)) = ∏_{j∈M_i} Y_j ,   (3.3)

for independent indicator random variables Y_j. We set W = Σ_{i=1}^n I_i, p_i = E I_i and λ = EW as usual, and we define

δ = λ^{−1} Σ_{{i,i'}∈H} E(I_i I_{i'}) .

Then, for any 0 < η ≤ 1, we have

Janson's inequality (Janson, 1990)

P[W ≤ ηλ] ≤ exp{−λ[(1 − η) + η log η]/(1 + δ)} .   (3.4)

In the particular case η = 0, this gives

∏_{i=1}^n (1 − p_i) ≤ P[W = 0] ≤ exp{−λ/(1 + δ)} ,   (3.5)

where the first inequality is a direct consequence of the FKG-inequality, because of the special form (3.3). Hence, by elementary analysis,

exp{ −(1/2) Σ_{i=1}^n (p_i/(1 − p_i))² } ≤ e^{λ} P[W = 0] ≤ e^{λδ/(1+δ)} .   (3.6)
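When H is empty the indicators are independent, δ = 0, and P[W = 0] = ∏_{i=1}^n (1 − p_i) exactly, so the sandwich (3.6) can be verified directly; a sketch with illustrative p_i:

```python
from math import exp

p = [0.1, 0.05, 0.2, 0.02, 0.15]  # independent case: H empty, delta = 0
lam = sum(p)

prob0 = 1.0
for pi in p:
    prob0 *= 1 - pi               # P[W = 0], exactly, by independence

lower = exp(-0.5 * sum((pi / (1 - pi)) ** 2 for pi in p))
mid = exp(lam) * prob0
print(f"{lower:.6f} <= {mid:.6f} <= 1")   # the sandwich (3.6), with delta = 0
```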


Thus, provided that δ and λ^{−1} Σ_{i=1}^n {p_i/(1 − p_i)}² are small, log P[W = 0] is close to −λ; if λδ and Σ_{i=1}^n {p_i/(1 − p_i)}² are small, the approximation of P[W = 0] by e^{−λ} has small relative error. Note that δ appears as that part of the Poisson Local Estimate (2.16) which represents the effect of local clumping, if we take Z_i = Σ_{i':{i,i'}∈H} I_{i'}. If δ is not small enough, because of clumping, for this Poisson approximation to be accurate, an extension based on compound Poisson approximation can give better results. See M. Roos (1996). The particular form (3.3) may be too strong for many purposes. However, provided only that the dependency graph H is sufficiently sparse, the following inequality can be used. Defining

u_i = ∏_{i':{i,i'}∈H} (1 − p_{i'})^{−1} ;   y_{i,i'} = 2 {E(I_i I_{i'}) + p_i p_{i'}} u_i u_{i'} ;   δ' = λ^{−1} Σ_{{i,i'}∈H} y_{i,i'} ≥ 2δ ,   (3.7)

we have

Suen's inequality (Suen, 1990)

| P[W = 0] − ∏_{i=1}^n (1 − p_i) | ≤ ∏_{i=1}^n (1 − p_i) ( e^{λδ'} − 1 ) .   (3.8)

Janson's inequality (3.5) is sharper than Suen's (3.8) whenever (3.3) is satisfied, though, if λδ and λδ' are of the same order of magnitude and both are small, they give comparable results; Suen's inequality is more widely applicable. Janson (1998) has sharpened Suen's inequality somewhat. The Bernoulli random graph model has been the setting for a number of successful applications of Janson's inequality. One illustration, given in Spencer (1995, Section 1.1), concerns estimating the probability that the Bernoulli random graph is triangle free. Here, we demonstrate the use of the Janson and Suen inequalities in an analogous context, slightly improving upon a result given in Alon, Erdős and Spencer (1992, pp. 101-2).

EXAMPLE 3.1. Let G_{n,θ} as before denote the Bernoulli random graph model, and set

J_{ijk} = E_{ij} E_{jk} E_{ik} ,   (3.9)

the indicator of the event that the triangle with vertices i, j and k is present. Set

Iᵢ = ∏_{j<k; j,k≠i} (1 − J_{ijk}) ,

the indicator of the event that i is contained in no triangle, and let W = Σ_{i=1}^n Iᵢ. Then the event {W = 0} is the event that every vertex is contained in at least one triangle, and we wish to approximate its probability.

A. D. Barbour

The first step is to observe that each of the Iᵢ is a decreasing function of the independent edge indicators E_{ij}. It then immediately follows (Barbour, Holst and Janson, 1992, Theorem 2.G and Corollary 2.C.4) that a coupling satisfying (2.28) exists, and hence, from (2.29) and (3.1), we have

| P[W = 0] − e^{−λ} | ≤ λ⁻² (λ − 1 + e^{−λ}) ( Var W − λ + 2np² ) ,   (3.10)

where λ = np = EW; note that, in this step, we use neither Suen's nor Janson's inequality, since no sparse dependency graph H for the Iᵢ springs readily to mind. It thus remains to evaluate λ and Var W, for which it is enough to compute p = P[I₁ = 1] and q = P[I₁ = I₂ = 1]. If (3.10) is to be useful, then λ should be neither too large nor too small, suggesting in the first instance that θ = θ(n) should be chosen such that p = p(n) ≈ n⁻¹φ for some 0 < φ < ∞. Now I₁ = 1 exactly when J_{1jk} = 0 for all 2 ≤ j < k ≤ n, so that we have

P[I₁ = 1] = P[W₁ = 0] ,  where  W₁ = Σ_{2≤j<k≤n} J_{1jk} .   (3.11)

Since the (J_{1jk}, 2 ≤ j < k ≤ n) are a collection of Be(θ³) indicators, which satisfy (3.3) with (E_{rs}, 1 ≤ r < s ≤ n) in place of the Yⱼ and with

M_{jk} = {(j,l), (k,l) ; l ≠ 1, j, k} ,   (3.12)


we can apply Janson's inequality. In fact, λ₁ = EW₁ = ½(n−1)(n−2) θ³, and

{(j,k), (j′,k′)} ∈ H  if  |{j,k} ∩ {j′,k′}| = 1 ,   (3.13)

and hence

λ₁δ̄₁ = (n−1)(n−2)(n−3) θ⁵ .   (3.14)

Thus, by (3.6),

exp{ −¼(n−1)(n−2) (θ³/(1−θ³))² } ≤ e^{λ₁} P[W₁ = 0] ≤ exp{ (n−1)(n−2)(n−3) θ⁵ } .   (3.15)

Now to achieve p = P[I₁ = 1] ≈ n⁻¹φ requires that e^{−λ₁} ≈ n⁻¹φ, from (3.11) and (3.15): so define θ = θ(n,φ) so that

½(n−1)(n−2) θ³ = log n − log φ   (3.16)

for some 0 < φ < ∞. Then

½(n−1)(n−2) {θ³/(1−θ³)}² ≈ (log n)² n⁻²

and

(n−1)(n−2)(n−3) θ⁵ ≈ n³ (n⁻² log n)^{5/3} = n^{−1/3} (log n)^{5/3} =: εₙ ,   (3.17)

both of which become small as n → ∞. Hence, from (3.15), if θ is defined as in (3.16), then

λ = n P[I₁ = 1] = φ (1 + O(εₙ)) .   (3.18)

In similar style, we have q = P[I₁ = I₂ = 1] = P[W₂ = 0], where

W₂ = Σ_{3≤j<k≤n} (J_{1jk} + J_{2jk}) + Σ_{3≤j≤n} J_{12j} ,

again a sum of indicator random variables satisfying (3.3) with the E_{rs} in place of the Yⱼ, and now with

M_{12j} = {(1,j,l), (2,j,l) ; l ≠ 1,2,j} ∪ {(1,2,l) ; l ≠ 1,2,j} ,  3 ≤ j ≤ n ;
M_{1jk} = {(1,j,l), (1,k,l) ; l ≠ 1,j,k} ∪ {(2,j,k)} ,  3 ≤ j < k ≤ n ,

and with M_{2jk} defined analogously to M_{1jk}. Here,

λ₂ = EW₂ = { 2·½(n−2)(n−3) + (n−2) } θ³ = (n−2)² θ³

and

λ₂δ̄₂ = 2(n−1)(n−2)(n−3) θ⁵ ,

so that, by (3.6),

exp{ −½(n−2)² (θ³/(1−θ³))² } ≤ e^{λ₂} P[W₂ = 0] ≤ exp{ 2(n−1)(n−2)(n−3) θ⁵ } .

Thus, with the same choice of θ = θ(n,φ) and with εₙ as before, we have

q = P[I₁ = I₂ = 1] = e^{−λ₂}(1 + O(εₙ)) = exp{ −(2(n−2)/(n−1)) [log n − log φ] } (1 + O(εₙ))
 = (n⁻¹φ)² (1 + O(εₙ)) ,

yielding

Var W = np(1 − p) + n(n−1)(q − p²) = λ + O(εₙ) .   (3.19)

In conclusion, we obtain our approximation for the probability of the event that every vertex in G_{n,θ(n,φ)} is contained in at least one triangle, here re-expressed as P[W = 0] for W defined as above, by substituting from (3.18) and (3.19) into (3.10), giving

| P[W = 0] − e^{−φ} | = O(εₙ) ,   (3.20)

where θ(n,φ) is defined in (3.16) in terms of φ, for any 0 < φ < ∞, and εₙ is as in (3.17); note that, for fixed n, θ(n,φ) is a decreasing function of φ. This narrow range of values of θ covers the interesting region where the probability of having vertices not contained in any triangle changes from 1 to 0. In fact, the range of values of θ can be extended slightly, by allowing φ also to vary with n, while still maintaining small relative error in the approximation of the smaller of P[W = 0] and P[W > 0]. The calculation above shows that Var W = λ + O(φ²εₙ), and (3.10) remains valid; it then follows that

| e^{φ} P[W = 0] − 1 | = O(φ e^{φ} εₙ)  and  | φ⁻¹ P[W > 0] − 1 | = O(εₙ) ,

uniformly in n⁻¹ ≤ φ ≤ ⅓ log n. This covers a range of probabilities for the event P[W = 0], running from a little more than n^{−1/3} to 1 − n⁻¹.
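It is worth noting how slowly the error rate εₙ of (3.17) decays; a short script (ours, purely illustrative) tabulating θ(n,φ) from (3.16) and εₙ makes the point:

```python
import math

def theta(n, phi=1.0):
    """theta(n, phi) solving (3.16): (1/2)(n-1)(n-2) theta^3 = log n - log phi."""
    return ((math.log(n) - math.log(phi)) / (0.5 * (n - 1) * (n - 2))) ** (1.0 / 3.0)

def eps(n):
    """epsilon_n of (3.17): n^(-1/3) (log n)^(5/3)."""
    return n ** (-1.0 / 3.0) * math.log(n) ** (5.0 / 3.0)

for n in (10 ** 3, 10 ** 6, 10 ** 9):
    print(n, theta(n), eps(n))
```

Even at n = 10⁹ one finds εₙ ≈ 0.16, so the relative error guarantee in (3.20) becomes effective only for extremely large graphs.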

4. Poisson point process approximation


Under circumstances in which a sum W = Σ_{i=1}^n Iᵢ of weakly dependent indicator random variables has a distribution close to Poisson, it is usually also the case that W_A = Σ_{i∈A} Iᵢ is almost Poisson distributed, for any fixed subset A ⊂ {1,2,…,n}. This suggests that, if each i is associated with a point yᵢ in some carrier space 𝒴, then the random configuration of points described by the measure Ξ = Σ_{i=1}^n Iᵢ δ_{yᵢ}, where δ_y denotes the point mass at y, might be distributed almost as a Poisson process on the set 𝒴ₙ = {yᵢ, 1 ≤ i ≤ n} ⊂ 𝒴. For instance, if yᵢ = y ∈ 𝒴 were the same for all i, then 𝒴ₙ would consist of the singleton {y}, and Ξ = Wδ_y would represent W points at location y; in this case, Poisson process approximation for Ξ is exactly the same as Poisson approximation for W, since there is no spatial distribution at all. A little more generally, the yᵢ could be chosen as realizations Yᵢ(ω) of independent random variables Yᵢ with common distribution μ, which are independent also of the Iᵢ. In this case, the process Ξ = Σ_{i=1}^n Iᵢ δ_{Yᵢ} has a non-degenerate distribution over 𝒴, being distributed as Σ_{j=1}^W δ_{Zⱼ}, where the Zⱼ are independent, with common distribution μ, and are


independent of W. Then, if W is close in distribution to Po(λ), the process Ξ is close to the Poisson process PP(λμ) on 𝒴 with mean measure λμ, which can be realized in similar fashion as Σ_{j=1}^{W*} δ_{Zⱼ}, where W* ∼ Po(λ) is independent of the Zⱼ. Indeed, the latter representation easily shows that

d_TV(ℒ(Ξ), PP(λμ)) ≤ d_TV(ℒ(W), Po(λ)) ,   (4.1)

where d_TV for probability measures P and Q over the space ℋ of integer valued Radon measures on 𝒴 is defined analogously to (1.6) by

d_TV(P, Q) = sup_{A⊂ℋ} | P(A) − Q(A) | .   (4.2)
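Inequality (4.1) reduces a process comparison to the one-dimensional quantity d_TV(ℒ(W), Po(λ)), which for a sum of independent indicators can be evaluated exactly from the two mass functions; a small numerical sketch (parameter choices ours):

```python
import math

def dtv_binomial_poisson(n, p):
    """Exact d_TV between Bin(n, p) and Po(np), via the mass functions."""
    lam = n * p
    log_pois = lambda k: -lam + k * math.log(lam) - math.lgamma(k + 1)
    total = 0.0
    for k in range(n + 1):
        log_bin = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += abs(math.exp(log_bin) - math.exp(log_pois(k)))
    # Poisson mass beyond n, where the binomial puts none:
    total += max(0.0, 1.0 - sum(math.exp(log_pois(k)) for k in range(n + 1)))
    return 0.5 * total

d1 = dtv_binomial_poisson(100, 0.05)
d2 = dtv_binomial_poisson(1000, 0.005)
print(d1, d2)  # both of order p, at fixed lambda = 5
```

Halving p at fixed λ = np roughly halves the distance, consistent with the order O(p) for the random variable problem.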

What can be said for other choices of yᵢ? An obvious possibility is to take yᵢ = i, and to look at the measure Ξ = Σ_{i=1}^n Iᵢδᵢ as a point process on ℤ₊. It is then natural to introduce the filtration (ℱᵢ, 0 ≤ i ≤ n), and to suppose for each i that Iᵢ is ℱᵢ-measurable, as in Serfling (1975); once again, the martingale characteristics of the process (Σ_{i=1}^m Iᵢδᵢ, 0 ≤ m ≤ n) can be expected to play a part in the approximation bounds. Indeed, if pᵢ denotes E(Iᵢ | ℱᵢ₋₁) and λ is the measure which puts mass λᵢ = Epᵢ on i, then

d_TV(ℒ(Ξ), PP(λ)) ≤ Σ_{i=1}^n { E(pᵢ²) + E| pᵢ − Epᵢ | } ;   (4.3)

the expression on the right hand side is almost the same as that in (1.17), yet the distance being bounded is now between the distributions of whole processes, and not just of total numbers of points. This inequality is a special case of a theorem of Brown (1983), which compares the distribution of a simple point process M on ℝ₊ to that of any Poisson process PP(μ), and shows that

d_TV(ℒ(Mᵗ), PP(μᵗ)) ≤ E| A − μ |ₜ + E{ Σ_{s≤t} (ΔA(s))² }   (4.4)

for any fixed t > 0, where Mᵗ and μᵗ denote the restrictions of the corresponding measures to [0, t], A is the compensator of M, and |A − μ|ₜ is the pathwise absolute variation of the signed measure A − μ on [0, t]. These inequalities provide a rather satisfactory basis for Poisson process approximation on the line. However, two obvious questions remain. The first is clearly how to proceed when the carrier space is not the line, and there is no preferred direction for time. The second concerns the bound (4.3): is it best possible? This latter question is illustrated by two examples.

EXAMPLE 4.1. In the simple case of independent Iᵢ ∼ Be(p), it is immediate that (4.3), like (1.17), is of order np², and not of the better order p found in (1.7) and (1.11).

EXAMPLE 4.2. Take a Cox process Ξ on [0,1], in which, conditional on Λ, Ξ ∼ PP(Λν), where ν denotes Lebesgue measure. Suppose that P[Λ = 1 + η] =


P[Λ = 1 − η] = 1/2. Then, taking μ = ν in (4.4), and including the random element Λ in ℱ₀, it follows that |A − ν|₁ = |Λ − 1| and that ΔA(s) = 0 a.s. for all s, so that (4.4) yields

d_TV(ℒ(Ξ), PP(ν)) ≤ E| Λ − 1 | = η .   (4.5)

On the other hand, as for (4.1), it is immediate that

d_TV(ℒ(Ξ), PP(ν)) ≤ d_TV(ℒ(Po(Λ)), Po(1)) ≤ η² ,   (4.6)

the final estimate following by the Stein–Chen method (Barbour, Holst and Janson, 1992, Theorem 1.C(ii)); this is of better order for small η.

The process bound given in (4.3) is an analogue of Serfling's bound (1.17). There are also process analogues of the Stein–Chen Poisson Local and Coupling Estimates. The link to the random variable context (Barbour, 1988) is afforded by rewriting (2.1) in the form

λ[h_{λ,A}(j+1) − h_{λ,A}(j)] + j[h_{λ,A}(j−1) − h_{λ,A}(j)] = 1_A(j) − Po(λ){A} ,   (4.7)

where g_{λ,A} = Δh_{λ,A}, and by observing that the left hand side takes the form (𝒜h_{λ,A})(j), where 𝒜 is the infinitesimal generator of a simple immigration–death process with immigration at rate λ and unit per-capita death rate. For process approximation, define a generator 𝒜 on ℋ analogously as follows:

(𝒜h)(ξ) = ∫ [h(ξ + δ_s) − h(ξ)] λ(ds) + ∫ [h(ξ − δ_s) − h(ξ)] ξ(ds) ,

for all ξ ∈ ℋ, for a suitable class of functions h on ℋ. With this definition, 𝒜 is the generator of an immigration–death process Z with state space ℋ, immigration intensity λ and unit per-capita death rate. The process Z has equilibrium distribution Po(λ), the distribution of a Poisson process with mean measure λ. Now, by analogy with (4.7), given any f belonging to a suitable family ℱ of bounded test functions, we wish to find an h = h_f on ℋ such that

(𝒜h)(ξ) = f(ξ) − Po(λ)(f) ,   (4.8)

where μ(f) denotes ∫ f dμ. For total variation distance, ℱ = ℱ_TV consists of all indicators of measurable subsets A ⊂ ℋ, mirroring the approach of Section 2, but other choices of ℱ may also be appropriate. It can now be shown that Eq. (4.8), the Stein Equation for the generator 𝒜, has a solution given by

h(ξ) = − ∫₀^∞ [ E_ξ f(Z(t)) − Po(λ)(f) ] dt ,   (4.9)

where E_ξ is the conditional expectation given Z(0) = ξ. Then, for any point process Ξ defined on 𝒴,


d_ℱ(ℒ(Ξ), Po(λ)) := sup_{f∈ℱ} | E f(Ξ) − Po(λ)(f) | = sup_{f∈ℱ} | E(𝒜h_f)(Ξ) | ,   (4.10)

where

E(𝒜h)(Ξ) = E ∫ [h(Ξ + δ_y) − h(Ξ)] λ(dy) + E ∫ [h(Ξ − δ_y) − h(Ξ)] Ξ(dy) .   (4.11)
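The generator 𝒜 above describes immigration at total rate λ(𝒴) with unit per-capita death; the total number of points then evolves as an M/M/∞ queue with Po(λ(𝒴)) equilibrium. A small Gillespie-type simulation of the count process (ours, with an arbitrary rate) is consistent with this:

```python
import random

def immigration_death_mean(lam, t_end=4000.0, burn_in=100.0, seed=7):
    """Time-averaged population size of an immigration-death process:
    immigration at rate lam, each of the n current points dies at rate 1.
    In equilibrium the count is Po(lam), so the average should be near lam."""
    rng = random.Random(seed)
    t, n = 0.0, 0
    num, den = 0.0, 0.0
    while t < t_end:
        rate = lam + n
        dt = rng.expovariate(rate)
        if t > burn_in:
            num += n * dt          # n is the state during [t, t + dt)
            den += dt
        t += dt
        if rng.random() < lam / rate:
            n += 1                 # immigration of a new point
        else:
            n -= 1                 # one of the n points dies
    return num / den

m_hat = immigration_death_mean(3.0)
print(m_hat)  # close to 3
```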

In particular, take Ξ = Σ_{i=1}^n Iᵢ δ_{Yᵢ}, and consider approximating ℒ(Ξ) as a point process on 𝒴 by PP(λ), where λ = Σ_{i=1}^n pᵢ Fᵢ and Fᵢ is the distribution of Yᵢ given Iᵢ = 1; in this setting, (4.11) becomes

E(𝒜h)(Ξ) = E{ Σ_{i=1}^n pᵢ ∫ [h(Ξ + δ_y) − h(Ξ)] Fᵢ(dy) + Σ_{i=1}^n Iᵢ [h(Ξ − δ_{Yᵢ}) − h(Ξ)] } .   (4.12)
Just as in the Stein–Chen situation, estimation of the final expression in (4.12) is often easier than trying to work directly with E f(Ξ), and leads to tractable expressions; these involve the quantities

J₁(ℱ) := sup_{f∈ℱ} M₁(h_f)  and  J₂(ℱ) := sup_{f∈ℱ} M₂(h_f)   (4.13)

in place of the J₁ and J₂ of (2.3), where

M₁(h) := sup_{ξ∈ℋ, 1≤i≤n} | h(ξ + δ_{yᵢ}) − h(ξ) | ;
M₂(h) := sup_{ξ∈ℋ, 1≤i,j≤n} | h(ξ + δ_{yᵢ} + δ_{yⱼ}) − h(ξ + δ_{yᵢ}) − h(ξ + δ_{yⱼ}) + h(ξ) | .   (4.14)

In the case of total variation distance, the bounds

J₁(ℱ_TV) ≤ 1 ;  J₂(ℱ_TV) ≤ 1   (4.15)

are easily derived using the representation (4.9). To show how (4.10) and (4.12) can be applied, take first the setting of Example 4.1, where the Iᵢ ∼ Be(pᵢ) are independent. Then, setting Ξᵢ = Σ_{j≠i} Iⱼδⱼ, and retracing the argument used in the case of random variables from (2.6) to (2.9), it follows that

E{ Iᵢ [h(Ξ − δᵢ) − h(Ξ)] } = E{ Iᵢ [h(Ξᵢ) − h(Ξᵢ + δᵢ)] } = pᵢ E{ h(Ξᵢ) − h(Ξᵢ + δᵢ) } ,   (4.16)


since Iᵢ and Ξᵢ are independent. Thus, from (4.12),

E(𝒜h)(Ξ) = Σ_{i=1}^n pᵢ E{ h(Ξ + δᵢ) − h(Ξ) − h(Ξᵢ + δᵢ) + h(Ξᵢ) }
 = Σ_{i=1}^n pᵢ E{ h(Ξᵢ + δᵢ + Iᵢδᵢ) − h(Ξᵢ + Iᵢδᵢ) − h(Ξᵢ + δᵢ) + h(Ξᵢ) } ,   (4.17)

analogously to (2.7); and the quantity in braces is zero if Iᵢ = 0, and is in modulus at most M₂(h) if Iᵢ = 1. As a result, we obtain the estimate

| E(𝒜h)(Ξ) | ≤ Σ_{i=1}^n pᵢ² M₂(h) ,   (4.18)

and conclude from (4.10) that

d_ℱ(ℒ(Ξ), PP(λ)) ≤ J₂(ℱ) Σ_{i=1}^n pᵢ² .   (4.19)
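For this independent setting the per-site comparison can also be made exact: at each site, d_TV(Be(pᵢ), Po(pᵢ)) = pᵢ(1 − e^{−pᵢ}) ≤ pᵢ², and summing over sites bounds the process distance, consistently with (4.19). A brief check (the pᵢ chosen arbitrarily):

```python
import math

def dtv_bernoulli_poisson(p):
    """Exact d_TV(Be(p), Po(p)); the sum of the three terms below
    simplifies algebraically to p * (1 - exp(-p))."""
    at0 = abs((1 - p) - math.exp(-p))
    at1 = abs(p - p * math.exp(-p))
    tail = 1 - math.exp(-p) - p * math.exp(-p)   # Po(p) mass on {2, 3, ...}
    return 0.5 * (at0 + at1 + tail)

ps = [0.05, 0.02, 0.01, 0.03]
per_site = [dtv_bernoulli_poisson(p) for p in ps]
print(sum(per_site), sum(p * p for p in ps))
```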

The argument was exactly as for the sum W = Σ_{i=1}^n Iᵢ, apart from the slightly different notation. For total variation distance, as originally considered in Example 4.1, it follows from (4.15) and (4.19) that d_TV(ℒ(Ξ), PP(λ)) ≤ Σ_{i=1}^n pᵢ², the same bound of order O(np²) as followed also from (4.3), so that the random variable order O(p) is still not recovered for the process bound. In fact, consideration of the event ⋃_{i=1}^n {Ξ{i} ≥ 2} shows that the total variation distance between ℒ(Ξ) and PP(λ) is at least

Σ_{i=1}^n (1 − e^{−pᵢ}(1 + pᵢ)) ≥ ½ e⁻¹ Σ_{i=1}^n pᵢ² ,

and thus that the order of approximation cannot be improved in this setting.

For dependent indicators Iᵢ, process approximation can be derived from the approaches used for random variables in Sections 2.1 and 2.2, in a similarly direct way. For the local approach, suppose that, for each i, Ξ can be written in the form

Ξ = Ξᵢ + Hᵢ + Iᵢ δ_{Yᵢ}   (4.20)

for random elements Ξᵢ and Hᵢ of ℋ. Modifying (4.16) in the same way that (2.6) was modified to obtain (2.13), we find that

E{ Iᵢ [h(Ξ − δ_{Yᵢ}) − h(Ξ)] } = E{ Iᵢ [h(Ξᵢ + Hᵢ) − h(Ξᵢ + Hᵢ + δ_{Yᵢ})] }
 = E{ Iᵢ [h(Ξᵢ) − h(Ξᵢ + δ_{Yᵢ})] } + θ₁ᵢ(h) ,   (4.21)

where

| θ₁ᵢ(h) | ≤ E(Iᵢ Zᵢ) M₂(h)   (4.22)

and Zᵢ = ||Hᵢ|| is the total variation norm of Hᵢ. In a similar fashion, we obtain

pᵢ | ∫ E[ h(Ξ + δ_y) − h(Ξ) − h(Ξᵢ + δ_y) + h(Ξᵢ) ] Fᵢ(dy) | ≤ pᵢ E(Zᵢ + Iᵢ) M₂(h) ,   (4.23)

and

pᵢ | ∫ E[h(Ξᵢ + δ_y) − h(Ξᵢ)] Fᵢ(dy) + E{ Iᵢ [h(Ξᵢ) − h(Ξᵢ + δ_{Yᵢ})] } |
 ≤ pᵢ E| ∫ [h(Ξᵢ + δ_y) − h(Ξᵢ)] Fᵢ(dy) − ∫ [h(Ξᵢ + δ_y) − h(Ξᵢ)] Fᵢ(dy | Ξᵢ) |
 ≤ 2pᵢ E{ d_TV(Fᵢ, Fᵢ(· | Ξᵢ)) } M₁(h) =: χᵢ M₁(h) ,   (4.24)

in direct analogy to (2.10) and (2.14), where Fᵢ(· | Ξᵢ) denotes ℒ(Yᵢ | Ξᵢ). This leads to the

Poisson Process Local Estimate

d_ℱ(ℒ(Ξ), PP(λ)) ≤ J₁(ℱ) Σ_{i=1}^n χᵢ + J₂(ℱ) Σ_{i=1}^n { pᵢ² + pᵢ EZᵢ + E(Iᵢ Zᵢ) } .

(4.25)

Note that there is typically nothing more to be calculated here than for the Poisson Local Estimate (2.16). The main difference between the bound in (4.25) and that in (2.16) is that J₁(ℱ) and J₂(ℱ) replace J₁ and J₂ of (2.3), respectively. Note also that if Yᵢ = i a.s. for all i, then one can take χᵢ = E| E(Iᵢ | Ξᵢ) − pᵢ |, in direct parallel with the random variable case. Alternatively, for a coupling version, (4.21) can be replaced by

E{ Iᵢ [h(Ξ − δ_{Yᵢ}) − h(Ξ)] } = pᵢ ∫ E[ h(Ξᵢy*) − h(Ξᵢy* + δ_y) ] Fᵢ(dy) ,

where Ξᵢy* is any random element of ℋ on the same probability space as Ξ, having the conditional distribution of Σ_{j≠i} Iⱼ δ_{Yⱼ}, given that Iᵢ = 1 and Yᵢ = y. This directly implies that

| E(𝒜h)(Ξ) | = | Σ_{i=1}^n pᵢ ∫ E[ h(Ξ + δ_y) − h(Ξ) + h(Ξᵢy*) − h(Ξᵢy* + δ_y) ] Fᵢ(dy) |
 ≤ Σ_{i=1}^n pᵢ ∫ E|| Ξ − Ξᵢy* || Fᵢ(dy) M₂(h) ,   (4.26)

in analogy to (2.22), and taking h = h_f for f ∈ ℱ and using (4.10) gives the

Poisson Process Coupling Estimate

d_ℱ(ℒ(Ξ), PP(λ)) ≤ J₂(ℱ) Σ_{i=1}^n pᵢ ∫ E|| Ξ − Ξᵢy* || Fᵢ(dy) .   (4.27)

The corresponding bounds for point processes not concentrated on a discrete carrier space are given in Barbour and Brown (1992b). Both (4.25) and (4.27) can be used for a wide variety of metrics d_ℱ, the differences between the estimates obtained lying in the values of the factors Jₗ(ℱ). For total variation, the bounds (4.15) are of best possible order, and do not decrease with increasing total mass λ = ‖λ‖, as do the Jₗ in the random variable case. Certain Wasserstein metrics have been considered in place of total variation distance (see Barbour, Brown and Xia, 1998, for example), for which λ-dependence similar to that in (2.3) can be established; these ideas have been applied in the context of networks of queues in Barbour and Brown (1996). Multivariate Poisson approximation can also be undertaken, by applying the considerations above, but with the set 𝒴 consisting of only finitely many points. Some particular results in this direction are given in Barbour, Holst and Janson (1992, Theorems 10.J and 10.K). The best results for sums of independent random vectors are those of B. Roos (1999), who exploits an extension of Kerstan's method.

EXAMPLE 4.3. This example is a discrete version of the Cox process Example 4.2. Suppose that, conditional on Λ, the Iᵢ ∼ Be(Λ/n) are independent, where P[Λ = 1 + η] = P[Λ = 1 − η] = 1/2, and that Yᵢ = i a.s. Then an easy calculation shows that, if we take Ξᵢ = Σ_{j≠i} Iⱼδⱼ and Hᵢ = 0 a.s. in (4.20), then χᵢ = O(n⁻¹η²); for λ = Σ_{i=1}^n n⁻¹δᵢ, the Poisson Process Local Estimate (4.25) yields the bound

d_TV(ℒ(Ξ), PP(λ)) = O(η² + n⁻¹) .

This estimate is the discrete analogue of inequality (4.6), showing that a better bound than that obtained from (4.3) in (4.5) can indeed be derived.

EXAMPLE 4.4.
An old Poisson approximation problem is that of counting the number of close pairs, when m points are uniformly distributed on a sphere; an early treatment by Perelson and Wiegel (1979) was motivated by the immunological phenomenon of the IgM/IgG switch. Here, we use Poisson process approximation to study a related problem, taken from Månsson (1999): see also Aldous (1988a), Alm (1983), Silverman and Brown (1978). Suppose that m independent points U₁, …, U_m are distributed uniformly at random in the square A = [0,1]². Let the k-subsets Kᵢ of {1,2,…,m} be indexed in some way by i ∈ {1,2,…,n}, where n = C(m,k). Let Iᵢ denote the indicator of the event ⋃_{a∈A} { ⋂_{j∈Kᵢ} {Uⱼ ∈ C + a} }, where C + a denotes the a-translate of the square C = [0,c]², for some c < 1/2, and where the torus convention is used throughout to avoid edge effects: Iᵢ = 1 when the k-subset Kᵢ is covered by some square of


side c. When Iᵢ = 1, let Yᵢ be the (torus) centre of gravity of {Uⱼ, j ∈ Kᵢ}, and set Yᵢ = (0,0) otherwise. How close is the distribution of the point process Ξ = Σ_{i=1}^n Iᵢ δ_{Yᵢ} to that of a Poisson process on A with mean measure n E[I₁] ν, where ν denotes Lebesgue measure? Clearly, if |Kⱼ ∩ Kᵢ| ≥ 2, then P[Iⱼ = 1 | Iᵢ = 1] > P[Iⱼ = 1], so that there is positive dependence between Iᵢ and Iⱼ when Kᵢ and Kⱼ overlap, and, in this case, Yᵢ and Yⱼ are also dependent. However, if k is fixed and c is small, the effect of this dependence is not too great. To see that this is the case, we use the Poisson Process Local Estimate (4.25), setting Hᵢ = Σ_{j∈Nᵢ} Iⱼ δ_{Yⱼ} and Ξᵢ = Ξ − Hᵢ − Iᵢ δ_{Yᵢ}, where Nᵢ = {j : j ≠ i, Kⱼ ∩ Kᵢ ≠ ∅}. Note that Ξᵢ depends only on the points (Uₗ, l ∉ Kᵢ), and is thus independent of (Iᵢ, Yᵢ), so that χᵢ = 0 for each i; note also that pᵢ = EIᵢ = k² c^{2(k−1)} =: p and that Fᵢ is uniform on A, so that Σ_{i=1}^n pᵢ Fᵢ = λν, with λ = np as usual. Thus

EZᵢ ≤ β_k (mc²)^{k−1}

and

E(Iᵢ Zᵢ) ≤ p Σ_{l=1}^{k−1} C(k,l) C(m, k−l) (4c²)^{k−l} ≤ p Σ_{s=1}^{k−1} α_{ks} (mc²)^s

for constants α_{ks} and β_k; hence, from (4.25),

d_ℱ(ℒ(Ξ), PP(λ)) ≤ J₂(ℱ) λ O(mc²) ,   (4.28)

uniformly in mc² ≤ 1. Thus if m → ∞ and c = c_m → 0 in such a way that λ_m ≈ k² m^k c_m^{2(k−1)}/k! → λ, it follows that d_TV(ℒ(Ξ_m), PP(λ_m)) is of order O(m^{−1/(k−1)}), and becomes small as m → ∞, so that Poisson process approximation is asymptotically accurate. However, if c_m² = m^{−k/(k−1)+ε} with ε > 1/{k(k−1)}, then the bound in (4.28) becomes large with m.
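For k = 2 the event is simply that two points fit in a common c-square; on the torus p = k²c^{2(k−1)} = 4c², so the number of close pairs should be roughly Po(λ) with λ = C(m,2)·4c². A Monte Carlo sanity check (parameter choices ours, purely illustrative):

```python
import numpy as np

def close_pair_counts(m, c, trials, seed=0):
    """For each trial, drop m uniform points in the unit torus and count the
    pairs whose toroidal coordinate gaps are both at most c (equivalently,
    pairs coverable by a common axis-aligned c-square)."""
    rng = np.random.default_rng(seed)
    counts = np.empty(trials, dtype=int)
    for t in range(trials):
        pts = rng.random((m, 2))
        gap = np.abs(pts[:, None, :] - pts[None, :, :])
        gap = np.minimum(gap, 1.0 - gap)            # per-axis torus distance
        close = np.all(gap <= c, axis=2)
        counts[t] = np.triu(close, k=1).sum()
    return counts

m, lam = 200, 5.0
c = np.sqrt(lam / (2 * m * (m - 1)))   # lam = C(m,2) * 4 c^2
W = close_pair_counts(m, c, trials=1500)
print(W.mean(), W.var())  # both near lam, as for Po(lam)
```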

5. Compound Poisson approximation


As observed in Section 2, the Poisson Local Estimate for the total variation distance between the distribution of a sum W of indicators Iᵢ and the Poisson Po(EW) distribution becomes large if the Iᵢ have a tendency towards local clumping, because it includes the component (2.18), and this usually reflects a real departure from the Poisson. However, there are many probability models (Aldous, 1988b) in which rare, isolated and weakly dependent clumps of 1s appear, and a compound Poisson approximation may be appropriate instead. The compound Poisson distribution CP(λ, μ), where μ is a probability measure on ℕ, is defined by

CP(λ, μ) = ℒ( Σ_{j=1}^M Yⱼ ) = ℒ( Σ_{i≥1} i Mᵢ ) ,   (5.1)

where (Yⱼ, j ≥ 1) are independent, have distribution μ and are independent also of M ∼ Po(λ); and where (Mᵢ, i ≥ 1) are independent, with Mᵢ ∼ Po(λμᵢ). In the former representation, one thinks of M ∼ Po(λ) clumps, whose sizes Yⱼ are independently sampled from the distribution μ. As in the case of Poisson approximation, there have been a number of approaches to compound Poisson approximation using transform methods. Of particular note is that using signed compound Poisson measures, as illustrated, for instance, in Čekanavičius (1997), where very accurate approximations are derived. Unfortunately, the use of such methods for sums of dependent random variables seems at present to be limited to only the simplest of cases. One way of approaching compound Poisson approximation to sums of dependent indicators is to proceed by way of Poisson point process approximation. The typical strategy is to mark exactly one of the 1s in each clump as its representative, and to replace W = Σ_{i=1}^n Iᵢ by a sum Σ_{i=1}^n Σ_{l≥1} l Iᵢₗ, where Iᵢₗ now denotes the indicator of the event that i is the index of the representative of a clump of 1s of size l; thus, for each clump, exactly one of the Iᵢₗ takes the value 1. Poisson process approximation in total variation to the point process Ξ = Σ_{i=1}^n Σ_{l≥1} Iᵢₗ δₗ is then accomplished by the Poisson Process Local or Coupling Estimates (4.25) and (4.27), and compound Poisson approximation in total variation to the random variable W = Σ_{i=1}^n Σ_{l≥1} l Iᵢₗ, with exactly the same error bound, follows as a consequence. There have been many successful applications of this approach, a number of which are given in Arratia, Goldstein and Gordon (1989, 1990). The results obtained from it are very good, provided that EW is not too large. There are two drawbacks to the point process approach. First, the identification of a unique representative for each clump ('declumping') is rarely natural, and can pose difficulties.
Secondly, if λ is large, the error bounds derived in this way are frequently far from accurate, because they are of the weaker O(np²) variety, rather than of order O(p). For point process approximation in total variation, this often represented the true order of approximation; in compound Poisson approximation, it is rather seldom the case, as is illustrated in the examples which follow. However, the problems associated with improving the approximation for large λ have not been fully overcome.

EXAMPLE 5.1. Let Vᵢⱼ = Iᵢ 1[Yᵢ ≥ j], 1 ≤ i ≤ n, j ≥ 1, be a double array of indicators, in which the Iᵢ ∼ Be(pᵢ) are independent, and the Yᵢ ∼ μ⁽ⁱ⁾ are independent of each other and of the Iᵢ. 'Declumping' is easily achieved by defining Iᵢₗ = Iᵢ 1[Yᵢ = l] for each i, l; then set Nᵢₗ = {(i,j), j ≠ l} and write

Ξ = Iᵢₗ δₗ + Hᵢₗ + Ξᵢₗ

as in (4.20), where

Hᵢₗ = Σ_{j∈Nᵢₗ} Iᵢⱼ δⱼ  and  Ξᵢₗ = Σ_{i′≠i} Σ_{l′≥1} I_{i′l′} δ_{l′} ;

here, for all i, l, Ξᵢₗ is independent of Iᵢₗ, so that χᵢₗ = 0, and Iᵢₗ Zᵢₗ = 0 a.s., where Zᵢₗ = ||Hᵢₗ|| = Σ_{j≠l} Iᵢⱼ. Hence, with λ = Σ_{i=1}^n Σ_{l≥1} E Iᵢₗ δₗ, the Poisson Process Local Estimate (4.25) reduces to

d_TV(ℒ(Ξ), PP(λ)) ≤ J₂(ℱ_TV) Σ_{i=1}^n Σ_{l≥1} E Iᵢₗ E Iᵢ ≤ Σ_{i=1}^n pᵢ² .   (5.2)
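For the independent structure of Example 5.1, both sides of such a comparison can be computed exactly: the law of W by repeated convolution, and the compound Poisson law by the standard Panjer recursion P(0) = e^{−λ}, s P(s) = λ Σⱼ j μⱼ P(s − j). A sketch with an arbitrary clump-size distribution μ (all parameters ours):

```python
import math
import numpy as np

def compound_poisson_pmf(lam, mu, smax):
    """Panjer recursion for the CP(lam, mu) mass function on {0, ..., smax};
    mu is a dict {clump size: probability}."""
    P = np.zeros(smax + 1)
    P[0] = math.exp(-lam)
    for s in range(1, smax + 1):
        P[s] = (lam / s) * sum(j * q * P[s - j] for j, q in mu.items() if j <= s)
    return P

n, p = 50, 0.02
mu = {1: 0.5, 2: 0.3, 3: 0.2}
site = np.zeros(4)                   # pmf of a single I*Y, I ~ Be(p), Y ~ mu
site[0] = 1 - p
for j, q in mu.items():
    site[j] = p * q
pmf = np.array([1.0])
for _ in range(n):                   # exact pmf of W = sum of n iid sites
    pmf = np.convolve(pmf, site)
cp = compound_poisson_pmf(n * p, mu, len(pmf) - 1)
dtv = 0.5 * np.abs(pmf - cp).sum()
print(dtv)  # of order n * p^2
```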

To illustrate the implications of (5.2), let p(n) be such that p(n) → 0 and np(n) → ∞ as n → ∞, and consider three choices of the pᵢ = pᵢ⁽ⁿ⁾ and μ⁽ⁱ⁾ = μ⁽ⁱⁿ⁾.
(a) Suppose that pᵢ⁽ⁿ⁾ = p(n) and μ⁽ⁱⁿ⁾ = μ for all i. Then (5.2) gives a total variation bound of np(n)² for approximation by CP(np(n), μ); however, combining (4.1) and (2.9), the true error is much less, being at most p(n).
(b) Suppose that the pᵢ⁽ⁿ⁾ and μ⁽ⁱⁿ⁾ are as above for 2 ≤ i ≤ n, and that μ is such that μ₁ > 0; suppose also that p₁⁽ⁿ⁾ = 1/2 and that μ⁽¹ⁿ⁾ = δ₁ for all n. Then (5.2) gives a bound of order O(1) for approximation by CP(λₙ, μₙ), where λₙ = (n−1)p(n) + ½ and

μₙ = λₙ⁻¹ { (n−1)p(n) μ + ½ δ₁ } ;

here, the true error is again smaller, and in fact tends to 0 with n, being of order O(p(n) + [np(n)]⁻¹).
(c) Suppose that everything is as in (b), except that now μ{2ℤ₊} = 1, so that, in particular, μ₁ = 0. In this case, the bound of order O(1) furnished by (5.2) is of the correct order.
The contrast between cases (b) and (c) indicates that improving the error bounds for compound Poisson approximation that are derived using the point process approach is likely to be a delicate matter. An alternative to the route by way of Poisson process approximation is to aim for compound Poisson approximation directly, using the second representation in (5.1) to derive an analogue of (2.1) for compound Poisson distributions, the Stein Equation

1_A(j) = CP(λ, μ){A} + Σ_{i≥1} iλμᵢ g_{λ,μ,A}(j + i) − j g_{λ,μ,A}(j) ,   (5.3)

to be solved for the function g_{λ,μ,A} : ℤ₊ → ℝ for any given A ⊂ ℤ₊. This accomplished, it follows just as for (2.4) and (2.5) that

d_TV(ℒ(W), CP(λ, μ)) ≤ sup_{A⊂ℤ₊} | E{ ε_{λ,μ}(g_{λ,μ,A}; W) } | ,   (5.4)

where

ε_{λ,μ}(g; w) = Σ_{i≥1} iλμᵢ g(w + i) − w g(w) ;   (5.5)

once again, |E{ε_{λ,μ}(g; W)}| can often be successfully bounded. For instance, let W = Σ_{i=1}^n Xᵢ, where the Xᵢ are nonnegative integer valued random variables with finite means. For each i, analogously to (2.12), decompose W in the form

W = Wᵢ + Zᵢ + Uᵢ + Xᵢ ,   (5.6)


where, for the representation to be useful, Wᵢ should be almost independent of (Xᵢ, Uᵢ), and Uᵢ and Zᵢ should not be too large: the sense in which these requirements are to be interpreted becomes clear shortly. Such a decomposition is often realized by partitioning the indices {1,2,…,n} into subsets {i}, Sᵢ, Nᵢ and Tᵢ, and setting

Uᵢ = Σ_{j∈Sᵢ} Xⱼ ,  Zᵢ = Σ_{j∈Nᵢ} Xⱼ  and  Wᵢ = Σ_{j∈Tᵢ} Xⱼ ;

Sᵢ contains those Xⱼ which strongly influence Xᵢ, and Tᵢ those Xⱼ whose cumulative effect on (Xᵢ, Uᵢ) is negligible. Define the parameters of the canonical approximating compound Poisson distribution by

λ = Σ_{l≥1} λₗ  and  μₗ = λₗ/λ ,  where  λₗ = l⁻¹ Σ_{i=1}^n E{ Xᵢ 1[Xᵢ + Uᵢ = l] } ,  l ≥ 1 .   (5.7)

Setting πᵢⱼₖ = j P[Xᵢ = j, Uᵢ = k]/m₁ᵢ, j ≥ 1, k ≥ 0, where m₁ᵢ = EXᵢ, define the four following quantities which appear in the error bounds, and which should be small for the bounds to be good:

δ₁ = Σ_{i=1}^n m₁ᵢ Σ_{j≥1, k≥0} πᵢⱼₖ E| P[Xᵢ = j, Uᵢ = k | Wᵢ] / P[Xᵢ = j, Uᵢ = k] − 1 | ;   (5.8)

δ₂ = 2 Σ_{i=1}^n E{ Xᵢ d_TV(ℒ(Wᵢ | Xᵢ, Uᵢ), ℒ(Wᵢ)) } ;   (5.9)

δ₃ = 2 Σ_{i=1}^n E{ Xᵢ d_W(ℒ(Wᵢ | Xᵢ, Uᵢ), ℒ(Wᵢ)) } ;   (5.10)

δ₄ = Σ_{i=1}^n { E(Xᵢ Zᵢ) + EXᵢ E{ Xᵢ + Uᵢ + Zᵢ } } .   (5.11)

In (5.10), the distance d_W is the Wasserstein L¹ metric on probability measures over ℤ₊: d_W(P, Q) = sup_{f : M₁(f)≤1} | ∫ f dP − ∫ f dQ | ≥ d_TV(P, Q). For 1 ≤ l ≤ 3, δₗ/EW is a measure of the average dependence between Wᵢ and (Xᵢ, Uᵢ); δ₄ is small in comparison with EW if Uᵢ and Zᵢ are small in expectation, provided that Zᵢ is not too strongly dependent on Xᵢ. Note that this makes no restriction on the dependence between Xᵢ and Uᵢ. If Poisson approximation is to be good, nothing can be too strongly dependent on any single Xᵢ, and so no element Uᵢ is allowed to accommodate strong local dependence in the Poisson decomposition (2.12). In compound Poisson approximation, such local dependencies can be taken into account. The expectation E{XᵢUᵢ} does not appear in δ₄; instead, the Uᵢ are prominent in the definition (5.7) of the approximating compound Poisson distribution.


Analogously to (2.15) and (2.22), and by rather similar arguments (see also M. Roos (1994a, b)), it can be deduced that, for the canonical λ and μ,

| E{ ε_{λ,μ}(g; W) } | ≤ ε₀ M₀(g) + ε₁ M₁(g) ,   (5.12)

(i) with ε₀ = min(δ₁, δ₂) and ε₁ = δ₄, and (ii) with ε₀ = 0 and ε₁ = δ₃ + δ₄; these bounds are then to be combined with (5.4) to give total variation bounds. If, instead, approximation by another compound Poisson distribution with parameters λ′ and μ′ is preferred, note that

| ε_{λ′,μ′}(g; w) − ε_{λ,μ}(g; w) | = | Σ_{i≥1} i (λ′μ′ᵢ − λμᵢ) g(w + i) |
 ≤ | Σ_{i≥1} i μ′ᵢ (λ′ − λ′′) g(w + i) | + | Σ_{i≥1} i (λ′′μ′ᵢ − λμᵢ) g(w + i) |
 ≤ m′₁ |λ′ − λ′′| M₀(g) + λ m₁ d_W(Q′, Q) M₁(g) ,

where m₁ = Σ_{i≥1} i μᵢ, m′₁ = Σ_{i≥1} i μ′ᵢ, λ′′ = λm₁/m′₁, and the probability measures Q and Q′ on ℕ are such that Q{i} = iμᵢ/m₁ and Q′{i} = iμ′ᵢ/m′₁. This leads to the more general bound

| E{ ε_{λ′,μ′}(g; W) } | ≤ ε′₀ M₀(g) + ε′₁ M₁(g) ,   (5.13)

with

ε′₀ = ε₀ + | λm₁ − λ′m′₁ |  and  ε′₁ = ε₁ + λ m₁ d_W(Q′, Q) .   (5.14)

In particular, if λ′ is chosen equal to λm₁/m′₁, then ε′₀ = ε₀. The advantage of allowing distributions other than the canonical compound Poisson distribution as approximations is that the canonical distribution may be very complicated, whereas an approximation of the same order may be obtainable with a very simple compound Poisson distribution: see Example 5.2. For independent Xᵢ, one can take Zᵢ = Uᵢ = 0, for which choice δ₁ = δ₂ = δ₃ = 0, and δ₄ reduces to Σ_{i=1}^n (EXᵢ)². This observation can be used in the setting of Example 5.1 also, by setting Xᵢ = Σ_{j≥1} Vᵢⱼ. Alternatively, one could use the decomposition (5.6) with the Vᵢⱼ as summands, Uᵢⱼ = Σ_{l≠j} Vᵢₗ and Zᵢⱼ = 0, obtaining the same result. In general, when evaluating δ₂ and δ₃, it is often possible to compute the distances between distributions by means of couplings. Variant (ii), when applied with Zᵢ = 0, gives the analogue of the Poisson Coupling Estimate; variant (i) leads to the analogue of the Poisson Local Estimate. In order to exploit (5.4) and (5.12), it thus remains to find suitable bounds for

J₁^CP = sup_{A⊂ℤ₊} M₀(g_{λ,μ,A})  and  J₂^CP = sup_{A⊂ℤ₊} M₁(g_{λ,μ,A}) .   (5.15)

However, this is not as easy as it might seem. The only known analogue of (2.3) is the bound

J₁^CP, J₂^CP ≤ min{ 1, (λμ₁)⁻¹ } e^{λ} ,   (5.16)

proved in Barbour, Chen and Loh (1992), and to get bounds which decrease with increasing λ they needed to assume that

i μᵢ ≥ (i + 1) μᵢ₊₁  for all i ≥ 1 ,   (5.17)

in which case

J₁^CP ≤ min{ 1, √( 2 / (λ(μ₁ − 2μ₂)) ) } ;
J₂^CP ≤ min{ 1, (λ(μ₁ − 2μ₂))⁻¹ ( ¼ + log⁺{ 2λ(μ₁ − 2μ₂) } ) } .   (5.18)

These bounds, together with (5.4) and (5.12), yield the

Compound Poisson Estimate 1. If W is decomposed as in (5.6), and λ, μ and δₗ, 1 ≤ l ≤ 4, are as in (5.7)–(5.11), then, for any λ′, μ′, we have

d_TV(ℒ(W), CP(λ′, μ′)) ≤ ε′₀ J₁^CP + ε′₁ J₂^CP ,   (5.19)

where ε′₀ and ε′₁ are as in (5.14); J₁^CP and J₂^CP are bounded as in (5.16), or, if condition (5.17) holds, as in (5.18), in either case with λ′, μ′ for λ, μ. Thus, in Example 5.1(a) and (b), if m₁ < ∞ and μ₁ > 2μ₂ and (5.17) holds, then bounds of order

O( p(n) log{np(n)} )  and  O( (p(n) + [np(n)]⁻¹) log{np(n)} )


are obtained; not quite of the correct order, because of the factor log{np(n)}, but much sharper than those implied by (5.2). Condition (5.17) cannot hold in case (c), and if (5.17) fails to hold in cases (a) and (b), the bound is of order p(n) exp{np(n)} μ₁⁻¹, rapidly becoming worse than those implied by (5.2) as n increases, and giving no information at all if m₁ = ∞. What is more, Condition (5.17) is no artefact, since it is shown in Barbour and Utev (1998) that J₁^CP ≥ C e^{αλ} for some C, α > 0, whenever μ₁ + μ₂ = 1 and μ₁ < 2μ₂. In order to try to avoid these difficulties, Barbour and Utev (1999) exploit the Stein Equation in a different way, by varying the usual argument so as to involve only Jₗ⁽ᵃ⁾ = sup_{A⊂ℤ₊} M_{l−1}⁽ᵃ⁾(g_{λ,μ,A}), l = 1, 2, where

M₀⁽ᵃ⁾(g) = M₀(g(· + a)) ;  M₁⁽ᵃ⁾(g) = M₁(g(· + a)) ,

for suitably chosen a ≥ 1. This leads to the

Compound Poisson Estimate 2. If W is decomposed as in (5.6), and λ, μ and δₗ, 1 ≤ l ≤ 4, are as in (5.7)–(5.11), then, for any λ′, μ′ satisfying λ′m′₁ = λm₁, and such that Σ_{j≥1} μ′ⱼ r^j < ∞ for some r > 1 and such that μ′ is aperiodic (μ′{lℤ₊} < 1 for all l ≥ 2), we have

d_TV(ℒ(W), CP(λ′, μ′)) ≤ ε₀ J₁⁽ᵃ⁾ + ε′₁ K₁⁽ᵃ⁾ + P[ W < ½(λm₁ + a) ] K₂⁽ᵃ⁾   (5.20)

for a = λνm₁ and for 0 < ν < 1 suitably chosen, where

K₁⁽ᵃ⁾ = J₁⁽ᵃ⁾ + 2J₂⁽ᵃ⁾/{λm₁(1 − ν)}  and  K₂⁽ᵃ⁾ = 1 + 2λm₂ J₂⁽ᵃ⁾/{m₁(1 − ν)} ,   (5.21)

and

J₁⁽ᵃ⁾ ≤ (λ′)^{−1/2} C₀(μ′) ,  J₂⁽ᵃ⁾ ≤ (λ′)⁻¹ C₁(μ′) ,

with C₀(μ′), C₁(μ′) < ∞. The detailed way in which ν is to be chosen, and in which the Cₗ(μ′) depend on the radius of convergence of the power series Σ_{j≥1} μ′ⱼ z^j and on the nearness of μ′ to being periodic, are explicitly specified in Barbour and Utev (1999). The third term in (5.20) is a penalty incurred in exchanging the Jₗ^CP for the Jₗ⁽ᵃ⁾. Applying (5.20) to Example 5.1, assume that μ has radius of convergence greater than 1 and is aperiodic. Take λ′ and μ′ to be the canonical parameters. Then ν can be chosen fixed for all n, and K₁⁽ᵃ⁾ and K₂⁽ᵃ⁾ are uniformly of order J₁⁽ᵃ⁾ and 1 + J₂⁽ᵃ⁾ respectively. In cases (a) and (b), both C₀(μₙ) and C₁(μₙ) are bounded uniformly in n, and hence, from (5.21), bounds of order O(p(n) + exp{−np(n)ε}) for some ε > 0 are obtained, with Bernstein's inequality being used to derive the exponential estimate of the large deviation probability in the third term in (5.20). This is of the ideal order O(p(n)), except when np(n) → ∞ very slowly with n. At first sight, it appears that the same bound should also follow in case (c), contradicting the fact that the true distance in total variation is of order O(1). The reason why this bound is not obtained in case (c) is that the distribution μₙ approaches the periodic limit μ as n → ∞, with the result that lim_{n→∞} Cₗ(μₙ) = ∞, l = 0, 1; in fact, the Cₗ(μₙ) grow with n in such a way that the bound obtained from (5.20) and (5.21) in case (c) is once again of order O(1).

EXAMPLE 5.2. We conclude with a second example of compound Poisson approximation. Let {Vᵢⱼ, 1 ≤ i, j ≤ t} be independent Be(q) random variables, and define the indicators Iᵢⱼ = Vᵢⱼ Vᵢ,ⱼ₊₁ Vᵢ₊₁,ⱼ Vᵢ₊₁,ⱼ₊₁ of the events that Vₖₗ = 1 at each of the 2 × 2 square of lattice points (k,l) with bottom left hand corner (i,j); we adopt the torus convention throughout, in order to avoid edge complications. Define n = t², p = EIᵢⱼ = q⁴ and λ = np.
What is the approximate distribution of $W = \sum_{1\le i,j\le t} I_{ij}$ as $t \to \infty$, if $q = q(t) \to 0$ in such a way that $np(t) \to \infty$? A first approximation is obtained from the Poisson Local Estimate (2.16), using a decomposition $W = W_{ij} + Z_{ij} + I_{ij}$ as in (2.12), defined by taking

$Z_{ij} = \sum_{(k,l)\in N_{ij}^{1}\setminus\{(i,j)\}} I_{kl}$ ,

where $N_{ij}^{r} = \{(k,l) : |k-i| \le r,\ |l-j| \le r\}$. Then $W_{ij}$ and $I_{ij}$ are independent, so that the corresponding term in (2.16) vanishes for all $i,j$, $\mathbb{E}I_{ij}\,\mathbb{E}(I_{ij} + Z_{ij}) = 9p^2$ and $\mathbb{E}(I_{ij}Z_{ij}) = 4(q^2+q^3)p$, which from (2.16) gives the bound

$d_{TV}(\mathscr{L}(W), \mathrm{Po}(np)) \le 9p + 4(q^2+q^3) = O(p^{1/2})$ .   (5.22)
The leading term in this error bound comes from $\mathbb{E}(I_{ij}Z_{ij})$. This indicates that local clumping is responsible for the main departure from the Poisson, suggesting that a compound Poisson approximation may be better. For compound Poisson approximation by way of Poisson point process approximation, it is first necessary to identify clumps. One way of doing this is to construct the (random) subgraph $H$ of the $t \times t$ rectangular lattice which has edges only between neighbouring lattice points $(k,l)$ and $(k',l')$ at which $I_{kl} = I_{k'l'} = 1$. Clumps can then be identified with the connected components of $H$ which contain at least one $2\times2$ square; let $\sigma \in \mathscr{S}$ index all possible components of this type. Then a point process $\Xi$ can be defined by $\Xi = \sum_{\sigma\in\mathscr{S}} J_\sigma \delta_\sigma$, where $J_\sigma = 1$ exactly when $\sigma$ appears as a component of $H$. The point process $\Xi$ can be decomposed as in (4.20) as $\Xi = J_\sigma\delta_\sigma + \Xi_\sigma + \Sigma_\sigma$, with $\Xi_\sigma = \sum_{\sigma'\in\mathscr{S}_\sigma} J_{\sigma'}\delta_{\sigma'}$ and $\mathscr{S}_\sigma = \{\sigma' \in \mathscr{S} : \sigma' \ne \sigma,\ \sigma' \cap \sigma^+ = \emptyset\}$, where $\sigma^+$ consists of $\sigma$ and all its lattice neighbours. In this decomposition, $J_\sigma$ and $\Xi_\sigma$ are independent and $J_\sigma Z_\sigma = 0$ a.s., where $Z_\sigma = \|\Sigma_\sigma\|$, leading to a Poisson Process Local Estimate (4.25) of
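The clump construction can be illustrated directly. In the sketch below, the vertices of $H$ are taken to be the torus sites with $I_{kl} = 1$, joined under 4-adjacency — an assumed convention, since the adjacency rule is not fully pinned down here — and the connected components are extracted by breadth-first search; since each such site corresponds to one $2\times2$ square, the component counts sum to $W$.

```python
import numpy as np
from collections import deque

# Identify clumps as connected components of the sites with I_kl = 1 on the
# torus (4-adjacency assumed).  Grid size and q are illustrative choices.
rng = np.random.default_rng(1)
t, q = 30, 0.45
V = rng.random((t, t)) < q
I = V & np.roll(V, -1, 0) & np.roll(V, -1, 1) & np.roll(np.roll(V, -1, 0), -1, 1)

ones = {(k, l) for k in range(t) for l in range(t) if I[k, l]}
clump_sizes, seen = [], set()
for start in ones:
    if start in seen:
        continue
    size, queue = 0, deque([start])
    seen.add(start)
    while queue:                       # breadth-first search of one component
        k, l = queue.popleft()
        size += 1
        for dk, dl in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nb = ((k + dk) % t, (l + dl) % t)
            if nb in ones and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    clump_sizes.append(size)

W = int(I.sum())
print(len(clump_sizes), "clumps; W =", W, "=", sum(clump_sizes))
```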
$\sum_{\sigma\in\mathscr{S}} p_\sigma\left(p_\sigma + \mathbb{E}\left\{\sum_{\sigma'\in\mathscr{S}\setminus\mathscr{S}_\sigma} J_{\sigma'}\right\}\right)$ ,   (5.23)

where $p_\sigma = \mathbb{E}J_\sigma = q^{|\sigma|}(1-q)^{|\sigma^+| - |\sigma|}$, and the same bound then holds also for the compound Poisson approximation error $d_{TV}(\mathscr{L}(W), \mathrm{CP}(\lambda,\mu))$, with $\lambda$ and $\mu$ defined by

$\lambda\mu = \sum_{\sigma\in\mathscr{S}} p_\sigma \delta_{l(\sigma)}$ ,   (5.24)

where $l(\sigma)$ denotes the number of $2\times2$ squares contained in $\sigma$. To evaluate the bound in (5.23), note that $np = \sum_{\sigma\in\mathscr{S}} l(\sigma)p_\sigma$, and that $p_\sigma = p$ for each of the $n$ components $\sigma$ consisting of just a $2\times2$ square. Thus the bound is at best of order $O(np^2)$; this improves upon the order $O(p^{1/2})$ of the Poisson approximation (5.22) when $np$ is not too large, but is worse if $n \gg p^{-3/2}$. Note also that the compound Poisson distribution in (5.24) is complicated to compute, as is the bound (5.23), since both contain elements from almost all possible components of subgraphs of the (large) $t \times t$ lattice. In practice, one would probably prefer to incur an extra penalty of $\sum_{\sigma\in\mathscr{S}(8)} p_\sigma$ in exchange for neglecting $\sum_{\sigma\in\mathscr{S}(8)} J_\sigma\delta_\sigma$, where $\mathscr{S}(8) = \{\sigma \in \mathscr{S} : |\sigma| \ge 8\}$ contains all elements of $\mathscr{S}$ with 8 or more vertices. This reduces $l(\sigma)$ to taking only the two possible values 1 and 2 in the compound Poisson approximation $\mathrm{CP}(\lambda, \mu(8))$, where

$\lambda\mu(8) = \sum_{\sigma\in\mathscr{S}\setminus\mathscr{S}(8)} p_\sigma \delta_{l(\sigma)}$ ,   (5.25)

but the bound remains awkward to compute, and is still of order only $O(np^2)$. For direct compound Poisson approximation by Stein's method, one can take the decomposition $W = W_{ij} + Z_{ij} + U_{ij} + I_{ij}$ in (5.6), where now $U_{ij} = \sum_{(k,l)\in N_{ij}^{1}\setminus\{(i,j)\}} I_{kl}$ (the previous $Z_{ij}$) and $Z_{ij} = \sum_{(k,l)\in N_{ij}^{2}\setminus N_{ij}^{1}} I_{kl}$, so that $W_{ij} = \sum_{(k,l)\notin N_{ij}^{2}} I_{kl}$ is smaller than in the Poisson case. Then $W_{ij}$ and $(I_{ij}, U_{ij})$ are independent, making $\delta_1 = \delta_2 = \delta_3 = 0$ in (5.8)-(5.10), and

$\mathbb{E}I_{ij}\,\mathbb{E}(I_{ij} + U_{ij} + Z_{ij}) + \mathbb{E}(I_{ij}Z_{ij}) = 41p^2$ .

Now the canonical approximating compound Poisson distribution $\mathrm{CP}(\lambda,\mu)$ has

$\lambda\mu_l = l^{-1} n\,\mathbb{E}\{I_{11}\,\mathbf{1}[U_{11} = l-1]\}$ ,  $l \ge 1$ .   (5.26)

Since $U_{11}$ is a function of the configuration of 1s on a $4\times4$ square, its distribution is easily calculable algebraically as a function of $q$. In particular, $0 \le U_{11} \le 8$ a.s., and $(l+1)\mu_{l+1}/\mu_l = O(q)$ for each $1 \le l \le 8$, so that condition (5.17) is satisfied for all $t$ large enough; furthermore,

$\lambda\mu_1 = np\,\mathbb{P}[U_{11} = 0 \mid I_{11} = 1] \ge np(1 - 4(q^2+q^3))$ ;

$2\lambda\mu_2 = np\,\mathbb{P}[U_{11} = 1 \mid I_{11} = 1] \le 4np(q^2+q^3)$ .

Hence, from the Compound Poisson Estimate 1 (5.19), it follows that, for all $t$ sufficiently large,

$d_{TV}(\mathscr{L}(W), \mathrm{CP}(\lambda,\mu)) \le \dfrac{41p}{1-8(q^2+q^3)}\left\{\dfrac{1}{4np(1-8(q^2+q^3))} + \log[2np(1-8(q^2+q^3))]\right\}$   (5.27)

$\qquad = O(p(t)\log\{np(t)\})$ .
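The conditional probabilities entering the canonical parameters can also be estimated by simulation. The sketch below (illustrative parameter values, not taken from the text) collects the value of $U_{11}$ at each site where the indicator equals 1 — legitimate by stationarity — and checks, up to Monte Carlo error, that $\mathbb{P}[U_{11}=0\mid I_{11}=1]$ is at least about $1-4(q^2+q^3)$ and $\mathbb{P}[U_{11}=1\mid I_{11}=1]$ at most about $4(q^2+q^3)$.

```python
import numpy as np

# Estimate P[U = 0 | I = 1] and P[U = 1 | I = 1] on the torus, where U is the
# number of occupied 2x2 squares among the eight neighbours of an occupied one.
# t, q and reps are illustrative assumptions.
rng = np.random.default_rng(2)
t, q, reps = 30, 0.15, 10000
delta = 4 * (q**2 + q**3)

Us = []
for _ in range(reps):
    V = rng.random((t, t)) < q
    I = (V & np.roll(V, -1, 0) & np.roll(V, -1, 1)
           & np.roll(np.roll(V, -1, 0), -1, 1))
    U = sum(np.roll(np.roll(I, dk, 0), dl, 1)
            for dk in (-1, 0, 1) for dl in (-1, 0, 1)) - I
    Us.extend(U[I].tolist())          # U-values at occupied sites

Us = np.array(Us)
phat0 = np.mean(Us == 0)   # should be at least about 1 - delta
phat1 = np.mean(Us == 1)   # should be at most about delta
print(len(Us), phat0, 1 - delta, phat1, delta)
```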

This is very much better than (5.23), and is only inferior to (5.22) for very large values of $n$: for instance, if $p(t) = t^{-\beta}$ for any $0 < \beta < 2$, then the order obtained from (5.27) is $O(t^{-\beta}\log t)$, almost the optimal $O(t^{-\beta})$, and much better than the order $O(p(t)^{1/2}) = O(t^{-\beta/2})$ obtained for Poisson approximation. Squares of any size $r \times r$ are treated in this way in Barbour, Chryssaphinou and Roos (1996). If the Compound Poisson Estimate 2 is used instead, the result is less explicit than that of (5.27), and it is necessary to bound the probability $\mathbb{P}[W \le y\,\mathbb{E}W]$ for $y < 1$. However, we can use Janson's inequality (3.4) to do this, with $\delta = 4(q^2+q^3)$, yielding a bound of $\exp\{-np[1-y+y\log y]\}$ for all $t$ large enough that $4(q^2(t)+q^3(t)) \le 1/2$. Hence the Compound Poisson Estimate 2 gives a rate

$d_{TV}(\mathscr{L}(W), \mathrm{CP}(\lambda,\mu)) = O(p + e^{-\alpha np})$   (5.28)

(5.28)

for some $\alpha > 0$, which is of the ideal order $O(p)$ as soon as $np$ is at all large, and which, combined with (5.27), shows that

$d_{TV}(\mathscr{L}(W), \mathrm{CP}(\lambda,\mu)) = O(p\log\log(1/p))$ as $t \to \infty$, under all circumstances.   (5.29)

A. D. Barbour

Note that, in (5.27)-(5.29), the compound Poisson distribution $\mathrm{CP}(\lambda,\mu)$ can be replaced by $\mathrm{CP}(\lambda',\mu')$, where $\mu_1' = 1 - 4(q^2+q^3)$ and $\mu_2' = 4(q^2+q^3)$, and $\lambda' = np(1 + 4(q^2+q^3))^{-1}$ is chosen to satisfy $\lambda' m_1' = \lambda m_1$. This is because the difference involved in replacing $e_0$ and $e_1$ in the Compound Poisson Estimates 1 and 2 by $e_0'$ and $e_1'$ as in (5.14) is only of order $O(p)$, since, by the Bonferroni inequalities,

$|\mathbb{P}[U_{11} = 0 \mid I_{11} = 1] - \{1 - 4(q^2+q^3)\}| \le 28p$

and

$|\mathbb{P}[U_{11} = 1 \mid I_{11} = 1] - 4(q^2+q^3)| \le 56p$ ,

from which it follows that $d_W(Q', Q) = O(p)$. The distribution $\mu'$ satisfies (5.17) for all $q$ such that $q^2 + q^3 \le 1/12$.

The setting of Example 5.2 can be seen as a simplification of that discussed in the Poisson process context in Example 4.4. Compound Poisson approximation for the number of $k$-subsets covered by a square $C$ of side $c$, when $m$ points are distributed uniformly at random on a square of side 1, is shown in Barbour and Månsson (2000) to be accurate to order $O(m^{-1}\lambda + e^{-\alpha\lambda})$, with $\lambda = m(mc^2)^{k-1}$. The analogy is obtained by taking $k = 4$, and replacing $m$ by $t^2q$ and $c^2$ by $t^{-2}$, in which notation this error is $O(q^{k-1} + e^{-\alpha np})$, for some $\alpha > 0$, not quite as small as in (5.28). In this case, the bounds (5.18) cannot be used in general, since (5.17) is not satisfied. Further examples of Stein's method used in compound Poisson approximation are to be found in Eichelsbacher and Roos (1999) and in Erhardsson (1999); in the latter paper, Stein's method is combined with regenerative theory to give a very general treatment of compound Poisson approximation to the number of hits on a rare set in a Markov chain.
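Sampling from a compound Poisson law such as $\mathrm{CP}(\lambda',\mu')$ is straightforward: draw $N \sim \mathrm{Po}(\lambda')$ and add $N$ independent cluster sizes from $\mu'$. The sketch below uses the two-point cluster law with masses $1-\delta$ and $\delta$ on sizes 1 and 2, with $\delta = 4(q^2+q^3)$; the numerical parameter values are illustrative assumptions.

```python
import numpy as np

# Sample CP(lambda', mu') with the two-point cluster law mu'_1 = 1 - delta,
# mu'_2 = delta, lambda' = np/(1 + delta), so that the mean is n*p.
# q and n are illustrative choices.
rng = np.random.default_rng(3)
q, n = 0.2, 2500
p = q ** 4
delta = 4 * (q**2 + q**3)
lam = n * p / (1 + delta)           # lambda' chosen so that lambda' m_1' = n p
m1 = 1 * (1 - delta) + 2 * delta    # mean cluster size m_1' = 1 + delta

reps = 200000
N = rng.poisson(lam, size=reps)
# total of N i.i.d. cluster sizes in {1, 2}: N ones plus Binomial(N, delta) extras
W = N + rng.binomial(N, delta)
print(W.mean(), lam * m1)           # empirical vs exact mean, both near n*p
```

Representing the sum of $N$ sizes in $\{1,2\}$ as $N + \mathrm{Bin}(N,\delta)$ avoids an explicit loop over clusters.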

Acknowledgement
This work was partially supported by Schweizer Nationalfonds Grant 20-50686.97.

References
Aldous, D. J. (1989a). Stein's method in a two-dimensional coverage problem. Statist. Probab. Lett. 8, 307-314.
Aldous, D. J. (1989b). Probability Approximations via the Poisson Clumping Heuristic. Springer, New York.
Alm, S. E. (1983). On the distribution of the scan statistic of a Poisson process. In Probability and Mathematical Statistics. Essays in Honour of Carl-Gustav Esseen (Eds., A. Gut and L. Holst), pp. 1-10. Department of Mathematics, Uppsala University.
Alon, N., P. Erdős and J. H. Spencer (1992). The Probabilistic Method. Wiley, New York.
Arratia, R., L. Goldstein and L. Gordon (1989). Two moments suffice for Poisson approximations: the Chen-Stein method. Ann. Probab. 17, 9-25.

Topics in Poisson approximation

Arratia, R., L. Goldstein and L. Gordon (1990). Poisson approximation and the Chen-Stein method. Statist. Sci. 5, 403-434.
Arratia, R., L. Gordon and M. S. Waterman (1990). The Erdős-Rényi law in distribution, for coin tossing and sequence matching. Ann. Statist. 18, 539-570.
Barbour, A. D. (1982). Poisson convergence and random graphs. Math. Proc. Camb. Philos. Soc. 92, 349-359.
Barbour, A. D. (1988). Stein's method and Poisson process convergence. J. Appl. Probab. 25A (Special Volume), 175-184.
Barbour, A. D. and T. C. Brown (1992a). The Stein-Chen method, point processes and compensators. Ann. Probab. 20, 1504-1527.
Barbour, A. D. and T. C. Brown (1992b). Stein's method and point process approximation. Stoch. Proc. Appl. 43, 9-31.
Barbour, A. D. and T. C. Brown (1996). Approximate versions of Melamed's theorem. J. Appl. Probab. 33, 472-489.
Barbour, A. D., T. C. Brown and A. Xia (1998). Point processes in time and Stein's method. Stochastics 65, 127-151.
Barbour, A. D., L. H. Y. Chen and W.-L. Loh (1992). Compound Poisson approximation for nonnegative random variables via Stein's method. Ann. Probab. 20, 1843-1866.
Barbour, A. D., O. Chryssaphinou and M. Roos (1996). Compound Poisson approximation in systems reliability. Nav. Res. Log. 43, 251-264.
Barbour, A. D. and G. K. Eagleson (1983). Poisson approximation for some statistics based on exchangeable trials. Adv. Appl. Probab. 15, 585-600.
Barbour, A. D. and P. E. Greenwood (1993). Rates of Poisson approximation to finite range random fields. Ann. Appl. Probab. 3, 91-102.
Barbour, A. D. and P. Hall (1984). On the rate of Poisson convergence. Math. Proc. Camb. Philos. Soc. 95, 473-480.
Barbour, A. D., L. Holst and S. Janson (1992). Poisson Approximation. Oxford University Press.
Barbour, A. D. and M. Månsson (2000). Compound Poisson approximation and the clustering of random points. Adv. Appl. Probab. 32, 19-38.
Barbour, A. D. and S. Utev (1998). Solving the Stein equation in compound Poisson approximation. Adv. Appl. Probab. 30, 449-475.
Barbour, A. D. and S. Utev (1999). Compound Poisson approximation in total variation. Stoch. Proc. Appl. 82, 89-125.
Bollobás, B. (1985). Random Graphs. Academic Press.
Bonferroni, C. E. (1936). Teorie statistiche delle classi e calcolo delle probabilità. Pubbl. Ist. Sup. Sci. Econ. Comm. Firenze 8, 1-62.
von Bortkewitsch, L. (1898). Das Gesetz der kleinen Zahlen. Teubner Verlag, Leipzig.
Brown, T. C. (1983). Some Poisson approximations using compensators. Ann. Probab. 11, 726-744.
Le Cam, L. (1960). An approximation theorem for the Poisson binomial distribution. Pac. J. Math. 10, 1181-1197.
Čekanavičius, V. (1997). Asymptotic expansions in the exponent: a compound Poisson approach. Adv. Appl. Probab. 29, 374-387.
Chen, L. H. Y. (1975a). Poisson approximation for dependent trials. Ann. Probab. 3, 534-545.
Chen, L. H. Y. (1975b). An approximation theorem for sums of certain randomly selected indicators. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 33, 69-74.
Chryssaphinou, O. and S. Papastavridis (1988). A limit theorem for the number of non-overlapping occurrences of a pattern in a sequence of independent trials. J. Appl. Probab. 25, 428-431.
Deheuvels, P. and D. Pfeifer (1988). On a relationship between Uspensky's theorem and Poisson approximations. Ann. Inst. Statist. Math. 40, 671-681.
Dembo, A. and S. Karlin (1992). Poisson approximations for r-scan processes. Ann. Appl. Probab. 2, 329-357.
Eichelsbacher, P. and M. Roos (1999). Compound Poisson approximation for dissociated random variables via Stein's method. Comb. Probab. Comp. 8, 335-346.


Erhardsson, T. (1999). Compound Poisson approximation for Markov chains. Ann. Probab. 27, 565-596.
Freedman, D. (1974). The Poisson approximation for dependent events. Ann. Probab. 2, 256-269.
Geske, M. X., A. P. Godbole, A. A. Schaffner, A. M. Skolnick and G. L. Wallstrom (1995). Compound Poisson approximations for word patterns under Markovian hypotheses. J. Appl. Probab. 32, 877-892.
Glaz, J., J. Naus, M. Roos and S. Wallenstein (1994). Poisson approximations for distribution and moments of ordered m-spacings. J. Appl. Probab. 31A, 271-281.
Godbole, A. P. (1993). Approximate reliabilities of m-consecutive-k-out-of-n: failure systems. Statist. Sin. 3, 321-328.
Hald, A. (1990). A History of Probability and Statistics and their Applications before 1750. Wiley, New York.
Hodges, J. L. and L. Le Cam (1960). The Poisson approximation to the Poisson binomial distribution. Ann. Math. Statist. 31, 737-740.
Hwang, H.-K. (1999). Asymptotics of Poisson approximation to random discrete distributions: an analytic approach. Adv. Appl. Probab. 31, 448-491.
Janson, S. (1990). Poisson approximation for large deviations. Rand. Struct. Algo. 1, 221-230.
Janson, S. (1998). New versions of Suen's correlation inequality. Rand. Struct. Algo. 13, 467-483.
Karoński, M. and A. Ruciński (1987). Poisson convergence and semi-induced properties of random graphs. Math. Proc. Camb. Philos. Soc. 101, 291-300.
Keilson, J. (1979). Markov Chain Models - Rarity and Exponentiality. Springer, New York.
Kerstan, J. (1964). Verallgemeinerung eines Satzes von Prochorov und Le Cam. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 2, 173-179.
Knox, G. (1964). Epidemiology of childhood leukaemia in Northumberland and Durham. Brit. J. Prev. Soc. Med. 18, 17-24.
Kounias, S. (1995). Poisson approximation and Bonferroni bounds for the probability of the union of events. Int. J. Math. Statist. Sci. 4, 43-52.
Månsson, M. (1999). Poisson approximation in connection with clustering of random points. Ann. Appl. Probab. 9, 465-492.
McGinley, W. G. and R. Sibson (1975). Dissociated random variables. Math. Proc. Camb. Philos. Soc. 77, 185-188.
Michel, R. (1988). An improved error bound for the compound Poisson approximation of a nearly homogeneous portfolio. ASTIN Bulletin 17, 165-169.
De Moivre, A. (1712). De Mensura Sortis, seu, de Probabilitate Eventuum in Ludis a Casu Fortuito Pendentibus. Philos. Trans. 27, 213-264.
Neuhauser, C. (1994). A Poisson approximation theorem for sequence comparisons with insertions and deletions. Ann. Statist. 22, 1603-1629.
Perelson, A. S. and F. W. Wiegel (1979). A calculation of the number of IgG molecules required per cell to fix complement. J. Theor. Bio. 79, 317-332.
Poisson, S. D. (1837). Recherches sur la probabilité des jugements en matière criminelle et en matière civile, précédées des règles générales du calcul des probabilités. Paris.
Prohorov, Ju. V. (1953). Asymptotic behaviour of the binomial distribution. Uspekhi Matematicheskikh Nauk 8(3), 135-143.
Reinert, G. and S. Schbath (1998). Compound Poisson approximation and Poisson process approximations for occurrences of multiple words in Markov chains. J. Comp. Bio. 5, 223-253.
Roos, B. (1999). On the rate of multivariate Poisson convergence. J. Multivariate Anal. 69, 120-134.
Roos, M. (1994a). Stein's method for compound Poisson approximation: the local approach. Ann. Appl. Probab. 4, 1177-1187.
Roos, M. (1994b). Stein-Chen method for compound Poisson approximation: the coupling approach. Probab. Theory Math. Statist., Proceedings of the Sixth Vilnius Conference, 645-660.
Roos, M. (1996). An extension of Janson's inequality. Rand. Struct. Algo. 8, 213-227.


Schbath, S. (1995). Compound Poisson approximation of word counts in DNA sequences. ESAIM: Probab. Statist. (http://www.emath.fr/ps/) 1, 1-16.
Serfling, R. J. (1975). A general Poisson approximation theorem. Ann. Probab. 3, 726-731.
Shorgin, S. Ya. (1977). Approximation of a generalized binomial distribution. Theor. Probab. Appl. 22, 846-850.
Silverman, B. W. and T. C. Brown (1978). Short distances, flat triangles and Poisson limits. J. Appl. Probab. 15, 815-825.
Smith, R. L. (1988). Extreme value theory for dependent sequences via the Stein-Chen method of Poisson approximation. Stoch. Proc. Appl. 30, 317-327.
Spencer, J. H. (1995). Modern probabilistic methods in combinatorics. In Surveys in Combinatorics (Ed., P. Rowlinson), LMS Lecture Note Series 218, 215-231.
Stein, C. (1970). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability 2, 583-602.
Stein, C. (1986). Approximate Computation of Expectations. Institute of Mathematical Statistics Lecture Notes - Monograph Series, Vol. 7. Hayward, California.
Stein, C. (1992). A way of using auxiliary randomization. In Probability Theory (Eds., L. H. Y. Chen et al.), de Gruyter, Berlin.
Suen, W. C. S. (1990). A correlation inequality and its application to a Poisson limit theorem for non-overlapping balanced subgraphs of a random graph. Rand. Struct. Algo. 1, 231-242.
Uspensky, J. V. (1931). On Ch. Jordan's series for probability. Ann. Math. 32, 306-312.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.

Some Elements on Lévy Processes

Jean Bertoin

1. Introduction

Lévy processes, i.e., processes in continuous time with independent and stationary increments, are named after Paul Lévy, who made the connection with infinitely divisible laws, characterized their distributions (Lévy-Khintchine formula) and described their structure (Lévy-Itô decomposition). Since the pioneering work of the early 1930s, Lévy processes have been intensively studied by both theoretical and applied probabilists. From the theoretical point of view, they form an important class of Markov processes which can be efficiently studied by the combination of Fourier analysis, potential theory, and of course classical probabilistic methods. They are also natural examples of semimartingales for which stochastic calculus applies. From an applied point of view, they were first considered as models for storage processes, insurance risk, queues, ...; see Prabhu (1981) and Bingham (1975). More recently, Lévy processes (or so-called Lévy flights) appeared in physics, in connection with problems in turbulence, laser cooling, ...; see Shlesinger et al. (1995). Lévy processes now also play an important role in mathematical finance, thanks to the fact that many infinitely divisible laws have heavy tails (in contrast with the Gaussian tails in models based on Brownian motion); see e.g., Barndorff-Nielsen (1998) and the references therein.

The purpose of this text is to provide a brief survey of some of the most salient features of the theory. We will first present the fundamental aspects of Lévy processes (Markov property and infinite divisibility) and describe their probabilistic structure. Then we will dwell on techniques used to derive the distribution of certain important functionals of Lévy processes. The final section is devoted to some remarkable sample path properties such as transience or recurrence, information on the rate of growth, and certain geometric properties of the range. This covers only a small amount of what is known on the topic. We refer to Bertoin (1996, 1999), Fristedt (1974) and Sato (1999), and the references therein for a much more complete account, including proofs of the many results which are merely stated here.


2. Basic aspects of Lévy processes

Throughout this text, we will be dealing with a probability space $\Omega$ endowed with a probability measure $P$ and a filtration $(\mathscr{F}_t)_{t\ge0}$ that fulfills the standard conditions. That is, $\mathscr{F}_0$ (and a fortiori each $\mathscr{F}_t$) is $P$-complete, and the filtration is right-continuous, i.e.,

$\mathscr{F}_t = \mathscr{F}_{t+} := \bigcap_{\varepsilon>0} \mathscr{F}_{t+\varepsilon}$ .

Consider a random process $X = (X_t,\ t\ge0)$ with values in $\mathbb{R}^d$ that is $(\mathscr{F}_t)$-adapted. One says that $X$ has independent and stationary increments if for every $t, s \ge 0$, $X_{t+s} - X_t$ is independent of $\mathscr{F}_t$ and has the same distribution as $X_s$ (note that this forces $X_0 = 0$ a.s.). Moreover, when its sample paths are right-continuous and possess limits to the left, one calls $X$ a Lévy process. The purpose of this section is to present fundamental properties of Lévy processes, namely the Markov property and the infinite divisibility. We will then specify the structure of Lévy processes.
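As a concrete illustration of the definition, the sketch below simulates a path with independent and stationary increments on a time grid, combining a drift, a Brownian part and occasional jumps. The parameter values are illustrative, and the jump mechanism is only a crude Bernoulli approximation to a Poisson number of jumps per step.

```python
import numpy as np

# Minimal path simulation of a Levy process on a time grid: drift + Brownian
# part + compound-Poisson-style jumps.  All parameters are illustrative.
rng = np.random.default_rng(4)
T, steps = 1.0, 1000
dt = T / steps
a, sigma, rate = 0.5, 1.0, 5.0       # drift, diffusion coefficient, jump rate

increments = (a * dt
              + sigma * np.sqrt(dt) * rng.standard_normal(steps)
              # at most one jump per step, probability ~ rate*dt (a crude
              # approximation to the Poisson number of jumps per step)
              + np.where(rng.random(steps) < rate * dt,
                         rng.standard_normal(steps), 0.0))
X = np.concatenate([[0.0], np.cumsum(increments)])
print(X[-1])   # X_T; increments over disjoint intervals are i.i.d. by design
```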

2.1. The Markov property


The independence and stationarity of the increments of a Lévy process can be viewed as a kind of Markov property; more precisely, as a spatially homogeneous Markov property. Specifically, fix an arbitrary $t > 0$ and set $X'_s = X_{t+s}$ for $s \ge 0$. Then conditionally on $X_t = x$, the process $X' = (X'_s,\ s \ge 0)$ is independent of $\mathscr{F}_t$ and its distribution is the same as that of the shifted process $x + X = (x + X_s,\ s \ge 0)$. It is easy to see that this Markov property also holds for simple stopping times, i.e., stopping times that only take a finite number of values. Then, approaching a general stopping time with a decreasing sequence of simple stopping times and making use of the right-continuity of the sample paths, one obtains the strong Markov property. To give a formal statement, it is convenient to denote for every $x \in \mathbb{R}^d$ by $P_x$ the distribution of the Lévy process shifted by $x$, namely $x + X$.

PROPOSITION 1. (Markov property) Let $T$ be an $(\mathscr{F}_t)$-stopping time, $T < \infty$ a.s., and set $X'_t = X_{T+t}$ for every $t \ge 0$. Then conditionally on $X_T = x$, $X' = (X'_t,\ t \ge 0)$ is independent of $\mathscr{F}_T$ and has the law $P_x$.

In practice, the Markov property is often applied to investigate distributions related to first passage times, $T_B = \inf\{t > 0 : X_t \in B\}$, where $B \subseteq \mathbb{R}^d$ is a Borel set. It is easy to prove that $T_B$ is a stopping time when $B$ is either open or closed (see Corollary 1.8 in Bertoin (1996)); this remains true when $B$ is merely an analytic set, but the proof requires a much more sophisticated argument. This can be viewed as the starting point of the deep connections between general potential theory and Markov processes; see Blumenthal and Getoor (1968), Hawkes (1979) and Chapter II in Bertoin (1996).


We then introduce two families of linear operators that are naturally associated with the Markov property. First, the semigroup $(P_t,\ t \ge 0)$ is defined by

$P_t f(x) = \mathbb{E}_x(f(X_t)) = \mathbb{E}(f(X_t + x))$ ,  $x \in \mathbb{R}^d$ ,

where $f : \mathbb{R}^d \to \mathbb{R}$ is a measurable function, which is bounded or nonnegative. Plainly, $P_t$ is a convolution operator. Moreover, the Markov property and the tower property of conditional expectations yield the semigroup property, namely $P_t P_s = P_{t+s}$. Let us investigate the regularity of the semigroup.

PROPOSITION 2. (Feller property) The semigroup $(P_t,\ t \ge 0)$ fulfills the Feller property. That is, if $\mathscr{C}_0$ stands for the space of continuous real-valued functions on $\mathbb{R}^d$ that have limit 0 at infinity, then for every $f \in \mathscr{C}_0$, we have $P_t f \in \mathscr{C}_0$ and

$\lim_{t\to0+} P_t f = f$

in the sense of uniform convergence.

PROOF. First, as $f$ is continuous and bounded, it follows from dominated convergence that $P_t f(x) = \mathbb{E}(f(X_t + x)) \to \mathbb{E}(f(X_t + x_0)) = P_t f(x_0)$ as $x \to x_0$. The same argument shows that $\lim_{|x|\to\infty} P_t f(x) = 0$, so that $P_t f \in \mathscr{C}_0$. Second, fix $\varepsilon > 0$ and suppose for simplicity that $|f(x)| \le 1$ for all $x$. We know that $f$ is uniformly continuous, so we may find $\eta > 0$ such that $|f(x) - f(y)| \le \varepsilon/3$ whenever $|x - y| \le \eta$. On the other hand, $X_t$ converges to 0 in probability as $t \to 0+$, so there is $t_\varepsilon > 0$ such that $P(|X_t| > \eta) \le \varepsilon/3$ for every $t \le t_\varepsilon$. It follows that for every $t \le t_\varepsilon$

$|P_t f(x) - f(x)| \le \mathbb{E}(|f(X_t + x) - f(x)|) \le P(|X_t| \le \eta)\,\varepsilon/3 + 2P(|X_t| \ge \eta) \le \varepsilon$ ,

which shows that $P_t f$ converges uniformly to $f$ as $t \to 0+$. □
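For Brownian motion, $P_t f(x) = \mathbb{E}f(x + \sqrt{t}\,Z)$ with $Z$ standard normal, and the uniform convergence in Proposition 2 can be observed numerically. The test function and time points in the sketch below are illustrative choices.

```python
import numpy as np

# Numerical illustration of the Feller property for Brownian motion:
# with f(x) = exp(-x^2) in C_0, evaluate P_t f(x) = E f(x + sqrt(t) Z) by
# Gauss-Hermite quadrature and watch sup_x |P_t f - f| decrease as t -> 0+.
def f(x):
    return np.exp(-x ** 2)

z, w = np.polynomial.hermite_e.hermegauss(60)  # nodes/weights for N(0,1)
w = w / w.sum()                                # normalize weights to sum 1

def Ptf(x, t):
    # P_t f(x) = E f(x + sqrt(t) Z), Z standard normal
    return np.sum(f(x[:, None] + np.sqrt(t) * z[None, :]) * w[None, :], axis=1)

xs = np.linspace(-6, 6, 2001)
sups = [np.max(np.abs(Ptf(xs, t) - f(xs))) for t in (1.0, 0.1, 0.01, 0.001)]
print(sups)   # decreasing towards 0
```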


We next turn our attention to a more stringent property. One says that an operator $\mathscr{A}$ defined on the domain $L^\infty(\mathbb{R}^d)$ has the strong Feller property if $\mathscr{A}f$ is a continuous function for every $f \in L^\infty(\mathbb{R}^d)$.

PROPOSITION 3. (Strong Feller property) For each fixed $t > 0$, the operator $P_t$ fulfills the strong Feller property if and only if the law of $X_t$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^d$.

PROOF. Assume that the law of $X_t$ has a density with respect to the Lebesgue measure, $P(X_t \in dx) = p_t(x)\,dx$, so

$P_t f(x) = \int_{\mathbb{R}^d} f(x+y)\,p_t(y)\,dy$ .

As $p_t \in L^1(dx)$ and the convolution with an integrable function maps $L^\infty(\mathbb{R}^d)$ to the space of continuous functions, the strong Feller property is proven. Conversely, suppose that the strong Feller property holds for $P_t$ and consider a Borel set $N$ with zero Lebesgue measure. Then write $f = \mathbf{1}_N$ and note by the Tonelli theorem that

$\int_{\mathbb{R}^d} P_t f(x)\,dx = \mathbb{E}\left(\int_{\mathbb{R}^d} f(X_t+x)\,dx\right) = \int_{\mathbb{R}^d} f(x)\,dx$ .

Plainly, the right-hand side is zero since $N$ is a null set. So $P_t f(x) = 0$ for almost every $x \in \mathbb{R}^d$, and thus for all $x \in \mathbb{R}^d$ by continuity. Specifying this for $x = 0$ shows that $P(X_t \in N) = 0$ for all sets $N$ of zero Lebesgue measure; and by the Radon-Nikodym theorem, the distribution of $X_t$ is absolutely continuous. □

We refer to Hawkes (1979) for a deeper study of the strong Feller property for Lévy processes. See also Sato (1999) and the references therein for results on Lévy processes with absolutely continuous semigroups. Second, the resolvent operators $(U^q,\ q > 0)$ are defined by

$U^q f(x) = \int_0^\infty e^{-qt} P_t f(x)\,dt = \mathbb{E}\left(\int_0^\infty e^{-qt} f(X_t + x)\,dt\right)$ .

Again, $U^q$ is a convolution operator. The semigroup property of $(P_t,\ t \ge 0)$ easily yields the resolvent equation

$U^q - U^r + (q-r)U^q U^r = 0$

for every $q, r > 0$. It is straightforward to see that $U^q$ maps $\mathscr{C}_0$ into $\mathscr{C}_0$. Hawkes (1979) established that $U^q$ has the strong Feller property for some $q > 0$ if and only if the distribution of $X_\tau$ is absolutely continuous with respect to the Lebesgue measure, where $\tau$ stands for an independent random variable with an exponential distribution with parameter $q$. In that case, we say that the resolvent kernel is absolutely continuous, and more precisely, if $P(X_\tau \in dx) = q u^q(x)\,dx$, then

$U^q f(x) = \int_{\mathbb{R}^d} f(x+y)\,u^q(y)\,dy$ .


This condition of absolute continuity has an important role in the study of the cone of excessive functions for a Lévy process; see Hawkes (1979) or Section 1.3 in Bertoin (1996).

2.2. Infinitely divisible laws and the Lévy-Khintchine formula


The independence and stationarity property of the increments and the elementary decomposition

$X_1 = X_{1/n} + (X_{2/n} - X_{1/n}) + \cdots + (X_{n/n} - X_{(n-1)/n})$

for every integer $n$ entail that the one-dimensional distributions of a Lévy process are infinitely divisible (i.e., can be expressed as the sum of $n$ i.i.d. variables for every positive integer $n$). Conversely, it can be shown that any infinitely divisible distribution can be viewed as the law of some Lévy process evaluated at time 1. It is well known that the characteristic function of a variable $X_1$ which has an infinitely divisible law can be expressed in the form

$\mathbb{E}(e^{i\langle\lambda,X_1\rangle}) = e^{-\Psi(\lambda)}$ ,  $\lambda \in \mathbb{R}^d$ ,

where $\langle\cdot,\cdot\rangle$ denotes the Euclidean scalar product in $\mathbb{R}^d$ and $\Psi : \mathbb{R}^d \to \mathbb{C}$ is a continuous function with $\Psi(0) = 0$, which is known as the characteristic exponent of $X$. The fundamental result about infinitely divisible distributions is the Lévy-Khintchine formula, which gives the generic form of characteristic exponents. We will see in the next section that it also provides the key to understanding the probabilistic structure of Lévy processes.

THEOREM 4. (Lévy-Khintchine formula) A function $\Psi : \mathbb{R}^d \to \mathbb{C}$ is the characteristic exponent of an infinitely divisible distribution if and only if it can be expressed in the form

$\Psi(\lambda) = i\langle a,\lambda\rangle + \tfrac{1}{2}Q(\lambda) + \int_{\mathbb{R}^d} \left(1 - e^{i\langle\lambda,x\rangle} + i\langle\lambda,x\rangle\mathbf{1}_{\{|x|<1\}}\right)\Pi(dx)$ ,

where $a \in \mathbb{R}^d$, $Q$ is a positive semi-definite quadratic form on $\mathbb{R}^d$, and $\Pi$ is a measure on $\mathbb{R}^d\setminus\{0\}$ with $\int(1 \wedge |x|^2)\,\Pi(dx) < \infty$, called the Lévy measure. Moreover, $a$, $Q$ and $\Pi$ are then uniquely determined by $\Psi$.

Let us briefly review some classical examples of infinitely divisible distributions on $\mathbb{R}$ and the corresponding Lévy processes. The Poisson law with parameter $c > 0$ is the distribution on $\mathbb{Z}_+$ with mass function $c^n e^{-c}/n!$. Its characteristic function is
$\sum_{n=0}^{\infty} \frac{c^n e^{-c}}{n!} e^{i\lambda n} = \exp\{-c(1 - e^{i\lambda})\}$ ,


so it is infinitely divisible with Lévy measure $c\delta_1$, where $\delta_1$ stands for the Dirac point mass at 1. The associated Lévy process is the Poisson process with intensity $c$.

The Gaussian law $\mathscr{N}(0,1)$ has characteristic function

$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i\lambda x} e^{-x^2/2}\,dx = \exp\{-\lambda^2/2\}$ ,

so it is infinitely divisible with Lévy measure zero; the associated Lévy process is the standard Brownian motion.

The Cauchy law has characteristic function

$\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{e^{i\lambda x}}{1 + x^2}\,dx = \exp\{-|\lambda|\} = \exp\left\{-\frac{1}{\pi}\int_{-\infty}^{\infty} (1 - e^{i\lambda x})\,x^{-2}\,dx\right\}$ .

So it is infinitely divisible with Lévy measure $\pi^{-1} x^{-2}\,dx$; the associated Lévy process is the standard Cauchy process.

The gamma law with parameter $c > 0$ has characteristic function

$\frac{1}{\Gamma(c)} \int_0^{\infty} e^{i\lambda x} x^{c-1} e^{-x}\,dx = (1 - i\lambda)^{-c} = \exp\left\{-c\int_0^{\infty} (1 - e^{i\lambda x})\,x^{-1} e^{-x}\,dx\right\}$ .

It is infinitely divisible with Lévy measure $c\,\mathbf{1}_{\{x>0\}}\,x^{-1} e^{-x}\,dx$; the associated Lévy process is the gamma process.

Strictly stable laws are extensions of Gaussian laws that appear in particular in limit theorems for sums of i.i.d. variables with infinite variance; see Zolotarev (1986) or Samorodnitsky and Taqqu (1994). They depend on three parameters, the most important of which is the index $\alpha \in\, ]0,2]$. For $\alpha = 2$, stable laws are Gaussian laws, and for $\alpha = 1$, they are simple transforms of Cauchy laws. Thus we shall focus on the case $\alpha \in\, ]0,1[\,\cup\,]1,2[$. The two other parameters are the skewness $\beta \in [-1,1]$ and the scale $\gamma > 0$. The characteristic function of the stable $(\alpha,\beta,\gamma)$ distribution is given by

$\exp\{-\gamma|\lambda|^{\alpha}(1 - i\beta\,\mathrm{sgn}(\lambda)\tan(\pi\alpha/2))\}$ .

It can be proved that the Lévy measure $\Pi$ is absolutely continuous with respect to the Lebesgue measure, with density

$\Pi(dx) = c^{+} x^{-\alpha-1}\,dx$ if $x > 0$,  $\Pi(dx) = c^{-} |x|^{-\alpha-1}\,dx$ if $x < 0$,

where $c^{+}$ and $c^{-}$ are two nonnegative real numbers such that $\beta = (c^{+} - c^{-})/(c^{+} + c^{-})$. The associated Lévy process is called a stable Lévy process with index $\alpha$ and skewness $\beta$. There are of course many other important infinitely divisible distributions, including log-gamma laws (see e.g., Shanbhag and Sreehari (1977)), hyperbolic


and generalized inverse Gaussian laws (see Barndorff-Nielsen and Halgreen (1977) and Seshadri (1993)), self-decomposable distributions (see e.g., Shanbhag and Sreehari (1977), Sato (1999) and the references therein for important properties in that field), the so-called Bondesson distributions (cf. Bondesson (1981)), Student distributions (cf. Grosswald (1976)), ....

Making use, once again, of the independence and stationarity of the increments, it is immediately seen that when $t \ge 0$ is a rational number, the characteristic exponent of $X_t$ is simply $t\Psi$. It then follows from the right-continuity of the sample paths that the same holds more generally for every real number. In other words, one has

$\mathbb{E}(e^{i\langle\lambda,X_t\rangle}) = e^{-t\Psi(\lambda)}$ ,  $\lambda \in \mathbb{R}^d$,  $t \ge 0$ .

The characteristic exponent thus specifies the law of $X_t$ for each fixed $t$. By the independence and stationarity of the increments of $X$, the law of the Lévy process is therefore completely determined by $\Psi$. In particular, the semigroup and the resolvent operators can be evaluated using the characteristic exponent and the Fourier transform, which provides a very efficient analytic approach to their study. More precisely, let $\mathscr{F}(g)$ stand for the Fourier transform of an integrable function $g$,

$\mathscr{F}(g)(\xi) = \int_{\mathbb{R}^d} e^{i\langle x,\xi\rangle}\,g(x)\,dx$ .

The Fourier transform of $P_t f$ is then given by

$\mathscr{F}(P_t f)(\xi) = \int_{\mathbb{R}^d} e^{i\langle x,\xi\rangle}\,\mathbb{E}(f(X_t+x))\,dx = \mathbb{E}\left(\int_{\mathbb{R}^d} e^{i\langle x,\xi\rangle} f(X_t+x)\,dx\right) = \mathbb{E}\left(\int_{\mathbb{R}^d} e^{i\langle y - X_t,\xi\rangle} f(y)\,dy\right) = \mathbb{E}\left(e^{-i\langle X_t,\xi\rangle}\right) \int_{\mathbb{R}^d} e^{i\langle y,\xi\rangle} f(y)\,dy$ .

In conclusion, we obtain the simple expression

$\mathscr{F}(P_t f)(\xi) = e^{-t\Psi(-\xi)}\,\mathscr{F}(f)(\xi)$ ,  $\xi \in \mathbb{R}^d$ .   (1)
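The identity $\mathbb{E}(e^{i\langle\lambda,X_t\rangle}) = e^{-t\Psi(\lambda)}$ underlying this computation is easy to check by Monte Carlo. The sketch below does so for a Poisson process with intensity $c$, whose characteristic exponent is $\Psi(\lambda) = c(1 - e^{i\lambda})$; the parameter values are illustrative.

```python
import numpy as np

# Monte Carlo check of E exp(i*lam*X_t) = exp(-t*Psi(lam)) for a Poisson
# process with intensity c: X_t ~ Po(c*t), Psi(lam) = c*(1 - e^{i*lam}).
# c, t, lam and the sample size are illustrative assumptions.
rng = np.random.default_rng(5)
c, t, lam = 3.0, 2.0, 0.7
Xt = rng.poisson(c * t, size=400000)

empirical = np.mean(np.exp(1j * lam * Xt))
exact = np.exp(-t * c * (1 - np.exp(1j * lam)))
print(abs(empirical - exact))   # small Monte Carlo error
```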

Similarly, the Fourier transform of the resolvent is given by

$\mathscr{F}(U^q f)(\xi) = \dfrac{\mathscr{F}(f)(\xi)}{q + \Psi(-\xi)}$ ,  $\xi \in \mathbb{R}^d$ .   (2)

As a first simple application of these identities, we observe that if the characteristic exponent satisfies $\Re(\Psi(\xi)) \ge c|\xi|^{\varepsilon}$ for all $|\xi|$ sufficiently large, for some $c, \varepsilon > 0$, then

t24

J. Bertoin

$\int_{\mathbb{R}^d} |\xi|^n e^{-t\Re\Psi(-\xi)}\,d\xi < \infty$  for every $t > 0$ and $n \in \mathbb{N}$ .

By the Riemann-Lebesgue theorem, we see that the distribution of $X_t$ is absolutely continuous with a density in $\mathscr{C}^\infty$. In other words, the semigroup $P_t$ is a convolution operator with respect to a probability density in $\mathscr{C}^\infty$. In dimension $d = 1$, there are some important special cases where one can use the Laplace transform instead of the Fourier transform to study the distribution of a Lévy process, which is often very convenient. Specifically, it can be checked that for each fixed $\lambda > 0$

$\mathbb{E}(e^{-\lambda X_1}) < \infty \iff \int_{]-\infty,-1]} e^{-\lambda x}\,\Pi(dx) < \infty$ .   (3)

In that case, it is seen from the Lévy–Khintchine formula that the characteristic exponent can be extended to the upper complex half-plane. We set for every λ ≥ 0

ψ(λ) = Ψ(iλ) = −aλ − (1/2)Qλ² + ∫ (1 − e^{−λx} − λx 1_{{|x|<1}}) Π(dx).

One calls ψ : [0,∞[ → ℝ the Laplace exponent, and one has for every t ≥ 0

E(e^{−λX_t}) = exp{−tψ(λ)},   λ ≥ 0.

Condition (3) holds in particular when the Lévy process takes values in [0,∞[ (i.e., when it has increasing paths). One then says that X is a subordinator; it can be checked that necessarily a ≤ 0, Q = 0 and Π is a measure on ]0,∞[ with ∫_{]0,∞[} (1 ∧ x) Π(dx) < ∞, and the Laplace exponent can be expressed in the simpler form

ψ(λ) = dλ + ∫_{]0,∞[} (1 − e^{−λx}) Π(dx),

where d ≥ 0 is called the drift coefficient. See Section III.1 in Bertoin (1996) for the complete argument and Bertoin (1999) for many results on this important family of Lévy processes. More generally, it follows from (3) that the Laplace transform exists whenever the Lévy measure gives no mass to the negative half-line. In that case, X has no negative jumps (see the forthcoming Theorem 5) and one sometimes says that X is spectrally positive. We refer to Chapter VII in Bertoin (1996) for details.
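For a concrete illustration, take the gamma subordinator, a standard example with zero drift and Lévy measure Π(dx) = x^{−1}e^{−x} dx on ]0,∞[, for which Frullani's integral gives ψ(λ) = log(1+λ) and X_t is Gamma(t,1)-distributed. The relation E(e^{−λX_t}) = e^{−tψ(λ)} is then easy to check by simulation (parameter values arbitrary):

```python
import numpy as np

# Laplace-transform check E(exp(-lam*X_t)) = exp(-t*psi(lam)) for the gamma
# subordinator: psi(lam) = log(1+lam) and X_t ~ Gamma(shape=t, scale=1).
rng = np.random.default_rng(1)
t, lam = 2.5, 0.8

X_t = rng.gamma(shape=t, scale=1.0, size=300_000)
mc = np.mean(np.exp(-lam * X_t))
exact = np.exp(-t * np.log1p(lam))     # = (1+lam)^(-t)

print(mc, exact)
```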

2.3. Structure of Lévy processes


The purpose of this subsection is to specify the structure of Lévy processes, which essentially amounts to having a probabilistic understanding of the Lévy–Khintchine formula. The analysis relies on the study of the jump process of X.

Some elements on Lévy processes

125

Let us denote by X_{t−} = lim_{s→t−} X_s the left-limit of X at time t > 0, and by

ΔX_t = X_t − X_{t−}

the (possible) jump of X at time t. For any Borel set B ⊆ ℝ^d, write

N_t^B = card{s ∈ ]0,t] : ΔX_s ∈ B}

for the number of jumps accomplished by X before time t that take values in B. Provided that the distance of B to the origin is positive, this number is finite for every t > 0. The independence and stationarity of the increments of X entail that the counting process N^B = (N_t^B, t ≥ 0) has independent and stationary increments in the filtration (ℱ_t); note also that by construction its sample paths are nondecreasing and right-continuous and that they only increase by jumps of size 1. It is well known that these properties characterize Poisson processes, i.e., N^B is a Poisson process. On the other hand, if B₁, ..., B_n, ... are disjoint Borel sets, then the counting processes N^{B₁}, ..., N^{B_n}, ... are Poisson processes in the same filtration (ℱ_t), and it is obvious that no two of them ever jump at the same time. By a standard result on Poisson processes, this implies that N^{B₁}, ..., N^{B_n}, ... are independent. Then denote the parameter of the Poisson process N^B by Λ(B), and consider a countable partition B₁, ..., B_n, ... of B (i.e., the Borel sets B₁, ..., B_n, ... are disjoint and their union is B). Then

N^B = N^{B₁} + ⋯ + N^{B_n} + ⋯,

and since N^{B₁}, ..., N^{B_n}, ... are independent, the right-hand side is a Poisson process with parameter Λ(B₁) + ⋯ + Λ(B_n) + ⋯. This shows that Λ is a Borel measure on ℝ^d − {0} that gives a finite mass to the complement of any neighborhood of the origin. Following a terminology introduced by Itô, one says that the jump process of a Lévy process is a Poisson point process with characteristic measure Λ. To have a complete knowledge of the law of the jump process, all that is needed now is to identify the characteristic measure Λ, and this is the point where the Lévy–Khintchine formula has a crucial role.

THEOREM 5. (Structure of the jumps) The jump process ΔX = (ΔX_t, t ≥ 0) of a Lévy process X is a Poisson point process valued in ℝ^d, whose characteristic measure is the Lévy measure Π.

This means that for every Borel set B at a positive distance from the origin, the counting process N^B is a Poisson process with intensity Π(B), and to disjoint Borel sets correspond independent Poisson processes. We refer to Section I.1 in Bertoin (1996) for a proof of this result; let us just examine the simple case of the so-called compound Poisson processes. Consider a finite measure Λ on ℝ^d that gives no mass to the origin, and let Δ = (Δ_t, t ≥ 0) be a Poisson point process with characteristic measure Λ. Because Λ has a finite mass, Δ has only finitely many points on every compact time-interval, a.s., and therefore the process Y_t = Σ_{0<s≤t} Δ_s is well-defined. It is a right-continuous step process and by construction its jump process is ΔY = Δ. It is immediate that Y is a Lévy process, and to specify its law, we just have to calculate its characteristic function. By the well-known exponential formula for Poisson processes, one has

E(e^{i⟨λ,Y_1⟩}) = E(exp{i Σ_{0<s≤1} ⟨λ,Δ_s⟩}) = exp{−∫_{ℝ^d} (1 − e^{i⟨λ,x⟩}) Λ(dx)}.
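The compound Poisson case is also easy to simulate directly. In the sketch below the jump rate c and the standard Gaussian jump law are arbitrary choices; for this Λ the exponential formula gives E(e^{iλY_t}) = exp{−tc(1 − e^{−λ²/2})}:

```python
import numpy as np

# Simulate a compound Poisson process Y_t: N_t ~ Poisson(c*t) jumps, each
# jump i.i.d. N(0,1), and compare the empirical characteristic function with
# the exponential formula exp{-t*c*(1 - e^{-lam^2/2})}.
rng = np.random.default_rng(2)
c, t, lam, n = 2.0, 1.5, 1.0, 200_000

N = rng.poisson(c * t, size=n)              # number of jumps on [0, t]
Y_t = np.sqrt(N) * rng.standard_normal(n)   # sum of N i.i.d. N(0,1) ~ N(0, N)

mc = np.mean(np.exp(1j * lam * Y_t))
exact = np.exp(-t * c * (1 - np.exp(-lam**2 / 2)))

print(mc.real, exact)    # imaginary part ~ 0 by symmetry
```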

We recognize a special case of the Lévy–Khintchine formula; in particular, we can identify the characteristic measure Λ of the jump process with the Lévy measure. This simple example sheds light on the probabilistic meaning of the Lévy–Khintchine formula and on the structure of Lévy processes. The characteristic exponent Ψ of a general Lévy process can be split into four parts,

Ψ = Ψ^{(0)} + Ψ^{(1)} + Ψ^{(2)} + Ψ^{(3)},

where

Ψ^{(0)}(λ) = i⟨a,λ⟩,   Ψ^{(1)}(λ) = (1/2)⟨λ,Qλ⟩,

Ψ^{(2)}(λ) = ∫ (1 − e^{i⟨λ,x⟩}) 1_{{|x|≥1}} Π(dx),

Ψ^{(3)}(λ) = ∫ (1 − e^{i⟨λ,x⟩} + i⟨λ,x⟩) 1_{{|x|<1}} Π(dx).

Clearly, each Ψ^{(i)} is a characteristic exponent. More precisely, Ψ^{(0)} corresponds to a deterministic linear process (i.e., a constant drift), Ψ^{(1)} to the so-called Brownian component of the Lévy process, which is a linear transform of a d-dimensional Brownian motion, and we just saw that Ψ^{(2)} can be viewed as the characteristic exponent of a compound Poisson process with Lévy measure 1_{{|x|≥1}}Π(dx). In the case when

∫_{ℝ^d} |x| 1_{{|x|<1}} Π(dx) < ∞, (4)

one can re-write Ψ^{(3)} as

Ψ^{(3)}(λ) = i⟨λ,a'⟩ + ∫ (1 − e^{i⟨λ,x⟩}) 1_{{|x|<1}} Π(dx),   with a' = ∫ x 1_{{|x|<1}} Π(dx).

Then again one can use Poisson point processes to construct a Lévy process with characteristic exponent Ψ^{(3)}. Specifically, one first considers a Poisson point process Δ^{(3)} valued in ℝ^d with characteristic measure 1_{{|x|<1}}Π(dx); the hypothesis (4) ensures that the series Σ_{0<s≤t} |Δ_s^{(3)}| converges a.s. for every t ≥ 0, and this enables us to set

Y_t^{(3)} = −a't + Σ_{0<s≤t} Δ_s^{(3)}.

Then standard arguments based on the exponential formula for Poisson processes show that Y^{(3)} is a Lévy process with characteristic exponent Ψ^{(3)}. This approach no longer works when the hypothesis (4) fails, because then the series Σ_{0<s≤t} |Δ_s^{(3)}| diverges a.s. for every t > 0. Nonetheless, one can still construct a Lévy process with exponent Ψ^{(3)} by approximation, using compensated sums of Poisson point processes. More precisely, we set for every η ∈ ]0,1[

Y_t^{(3,η)} = Σ_{0<s≤t} Δ_s^{(3)} 1_{{|Δ_s^{(3)}|>η}} − t ∫ x 1_{{η<|x|<1}} Π(dx),

and then check that Y^{(3,η)} converges as η → 0+ to a Lévy process Y^{(3)} with characteristic exponent Ψ^{(3)}. Because the characteristic exponent of the sum of independent Lévy processes is just the sum of the characteristic exponents, it follows from our analysis that a general Lévy process can be decomposed as the sum of four independent Lévy processes,

X = Y^{(0)} + Y^{(1)} + Y^{(2)} + Y^{(3)},

where Y^{(0)} is a constant drift, Y^{(1)} is a linear transform of a Brownian motion (in particular, Y^{(0)} + Y^{(1)} is a continuous process), Y^{(2)} is a compound Poisson process with jumps of size greater than or equal to 1, and Y^{(3)} is a pure jump process with jumps of size less than 1, obtained as the limit of compensated compound Poisson processes. This is known as the Lévy–Itô decomposition of Lévy processes.

3. Distribution of some functionals

The purpose of this section is to present various techniques to determine the distribution of some functionals of a Lévy process evaluated at certain random times. We will first consider integrals with respect to time of a function of the Lévy process, then the so-called age and residual lifetime processes in connection with renewal theory in continuous time, and finally we will review some important identities arising from fluctuation theory, which involve extrema of the Lévy process.

3.1. Integral functionals


We are concerned here with functionals of a Lévy process X that can be expressed in the form

∫₀^∞ f(X_s, s) ds (5)

for some measurable function f.


Linear functionals

The simplest example in that field is ∫₀¹ X_s ds. Indeed, we can approach this integral by the Riemann sums

(1/n) Σ_{k=1}^n X_{k/n} = Σ_{k=1}^n ((n−k+1)/n) (X_{k/n} − X_{(k−1)/n}).

Then, using the independence and stationarity of the increments, one gets that the Fourier transform of the latter quantity evaluated at λ ∈ ℝ^d is

exp{−(1/n) Σ_{k=1}^n Ψ(λ(n−k+1)/n)}.

Taking the limit as n → ∞ gives

E(exp{i ∫₀¹ ⟨λ,X_s⟩ ds}) = exp{−∫₀¹ Ψ(λt) dt}. (6)

More generally, a similar argument shows that if f(x,s) = x h(s) for some real-valued function h ∈ L¹([0,1]), one has

E(exp{i ∫₀¹ ⟨λ,X_s⟩ h(s) ds}) = exp{−∫₀¹ Ψ(λγ(t)) dt}, (7)

where γ(t) = ∫_t^1 h(s) ds.

Calculation of moments

The study of the law of integral functionals is much more delicate in the non-linear case, i.e., when the function f(·,s) in (5) is not linear. A first technique to tackle this problem is provided by the calculation of moments; see Fitzsimmons and Pitman (1999) for a recent survey. This applies in particular to exponential integrals of the type

I = ∫₀^∞ exp{−qs + i⟨λ,X_s⟩} ds,

where q > 0 and λ ∈ ℝ^d. An immediate variation of this technique also applies to Laplace transforms, provided that the Lévy process is real-valued and has exponential moments (cf. Urbanik (1992, 1995), Carmona et al. (1997) and also Carmona et al. (1994) for an alternative approach; see also Paulsen (1993) for a closely related work). The latter has interesting applications in mathematical finance. Specifically, in order to calculate E(I^n) for an integer n ≥ 1, we first re-write this quantity as the multiple integral

E(I^n) = n! E(∫₀^∞ dt₁ exp{−qt₁ + i⟨λ,X_{t₁}⟩} ∫_{t₁}^∞ dt₂ exp{−qt₂ + i⟨λ,X_{t₂}⟩} × ⋯ × ∫_{t_{n−1}}^∞ dt_n exp{−qt_n + i⟨λ,X_{t_n}⟩}).

Next we express the last integral as

∫_{t_{n−1}}^∞ dt_n exp{−qt_n + i⟨λ,X_{t_n}⟩} = exp{−qt_{n−1} + i⟨λ,X_{t_{n−1}}⟩} ∫₀^∞ exp{−qs + i⟨λ,X'_s⟩} ds,

where X'_s = X_{t_{n−1}+s} − X_{t_{n−1}}. By the Markov property, X' is independent of ℱ_{t_{n−1}} and has the same law as X. Moreover, one has by Fubini's theorem

E(∫₀^∞ exp{−qs + i⟨λ,X_s⟩} ds) = ∫₀^∞ e^{−qs} e^{−sΨ(λ)} ds = 1/(q + Ψ(λ)).

By induction, we obtain the simple formula

E(I^n) = n! / ∏_{k=1}^n (kq + Ψ(kλ)). (8)
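Formula (8) can be tested numerically for Brownian motion, where Ψ(λ) = λ²/2: with q = λ = 1 it predicts E(I) = 1/(1 + 1/2) = 2/3 and E(I²) = 2/((1 + 1/2)(2 + 2)) = 1/3. A rough Monte Carlo sketch (truncation horizon and step size are arbitrary, so the match is only approximate):

```python
import math
import numpy as np

# Monte Carlo check of E(I^n) = n! / prod_{k=1..n} (k*q + Psi(k*lam)), where
# I = Int_0^infty exp(-q*s + i*lam*X_s) ds, for standard Brownian motion.
rng = np.random.default_rng(5)
q, lam, T, dt, n_paths = 1.0, 1.0, 12.0, 0.02, 10_000

m = int(T / dt)
s = dt * np.arange(1, m + 1)
B = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, m)), axis=1)
I = np.sum(np.exp(-q * s + 1j * lam * B), axis=1) * dt   # truncated at T

def moment(n):
    ks = np.arange(1, n + 1)
    return math.factorial(n) / np.prod(ks * q + (ks * lam) ** 2 / 2)

print(I.mean(), moment(1))        # predicted 2/3
print((I**2).mean(), moment(2))   # predicted 1/3
```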

More generally, a closely related method yields a formula for the moments of integrals of the type

∫₀^∞ e^{−qt} f(X_t) dt

in terms of f and the resolvent operator U^q; see Eq. (3.4) in Fitzsimmons and Getoor (1992). However, in practice this formula becomes intractable when n is large, and it enables the explicit determination of the law of the functional in only very few special cases. See Fitzsimmons and Getoor (1992, 1995), Getoor and Sharpe (1994) and Marchal (1998) for results to that end.

Time substitution

To study the distribution of increasing additive functionals of the type

A_f(t) = ∫₀^t f(X_s) ds, (9)

where f : ℝ^d → [0,∞[ is a measurable function, it is sometimes interesting to consider first the inverse functional σ(t) = inf{s > 0 : A_f(s) > t} and to use σ as a time substitution. That is, one introduces the time-changed process

Z_t = X_{σ(t)},   t ≥ 0.
By a standard Markov theory argument, Z = (Z_t, t ≥ 0) is Markovian. Lamperti has pointed out that in certain cases, this time substitution yields well-known Markov processes. Specifically, in dimension d = 1, the case f(x) = e^{αx} yields the so-called semi-stable Markov processes (see Lamperti (1972)), and the case f(x) = 1_{{x>0}} x^{−1} the so-called continuous state branching processes (see Lamperti (1967)). We refer to Carmona et al. (1994) for an application of the first case to the study of exponential functionals, and briefly present an application of the second to the Stieltjes transform of subordinators (see Bertoin (1997) for the complete argument and further related results). Suppose X is a subordinator with Laplace exponent ψ. The Stieltjes transform of X is defined as

𝒮(a) = ∫₀^∞ dt/(a + X_t),   a > 0.

Using the time substitution of Lamperti, one can view 𝒮(a) as the time when the continuous state branching process Z associated with X explodes. The distribution of the latter is known explicitly, see Grey (1974), which enables us to describe the law of 𝒮(a) as follows. First, 𝒮(a) = ∞ a.s. for every a > 0 if ∫_{0+} dλ/ψ(λ) = ∞. Otherwise

P(𝒮(a) > t) = exp{−aγ(t)},

where γ : ]0,∞[ → ]0,∞[ is the inverse mapping of

x ↦ ∫₀^x dλ/ψ(λ),   x ∈ ]0,∞[.

Feynman–Kac formula

The Feynman–Kac formula is another useful tool to obtain information about the law of additive functionals of the type (9), by providing a relation involving their Laplace transforms. It is closely related to the moments method; see Fitzsimmons and Pitman (1999) for the connection and many interesting applications. Specifically, let g : ℝ^d → [0,∞[ be another measurable function, and set for q > 0 and x ∈ ℝ^d

V_q g(x) = E_x(∫₀^∞ e^{−qt} g(X_t) exp{−A_f(t)} dt).

Observe that if we can compute V_q 1(x) for all q > 0 with f replaced by λf, then we know E_x(exp{−λA_f(t)}) for all t, λ > 0 by inverting the Laplace transform in the variable q, and thus we have determined the law of A_f(t). The function g is useful when one wishes to calculate conditional expectations given X_t. The Feynman–Kac formula states that

U^q(f · V_q g)(x) = U^q g(x) − V_q g(x),   x ∈ ℝ^d, (10)

where U^q is the q-resolvent operator of X. Indeed, an application of the Markov property shows that the left-hand side evaluated at x ∈ ℝ^d is given by

E_x(∫₀^∞ dt e^{−qt} f(X_t) E_{X_t}(∫₀^∞ ds e^{−qs} g(X_s) exp{−A_f(s)}))

= E_x(∫₀^∞ dt f(X_t) ∫_t^∞ ds e^{−qs} g(X_s) exp{A_f(t) − A_f(s)})

= E_x(∫₀^∞ ds e^{−qs} g(X_s) ∫₀^s dt f(X_t) exp{A_f(t) − A_f(s)})

= E_x(∫₀^∞ ds e^{−qs} g(X_s)(1 − exp{−A_f(s)})),

where we used Fubini's theorem and the elementary identity ∫₀^s f(X_t)e^{A_f(t)} dt = e^{A_f(s)} − 1. We can rewrite the last displayed quantity as U^q g(x) − V_q g(x), and (10) is proven. In certain cases, it turns out to be useful to combine the Feynman–Kac formula with the Fourier transform. For instance, suppose X is real-valued and take f(x) = x². Recall that the Fourier transform of the q-resolvent is given by (2), and that for a nice function h, the Fourier transform of the product fh coincides with the negative of the second derivative of the Fourier transform of h. It follows that the Fourier transform y(ξ) = ℱ(V_q g)(ξ) solves the Sturm–Liouville equation

y''(ξ) = y(ξ)(q + Ψ(−ξ)) − ℱ(g)(ξ).

The interested reader is referred to Section V.2 in Bertoin (1996) for a striking application of this method to the Hilbert transform.
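A quick numerical sanity check of (10): for a constant function f ≡ ℓ one has A_f(t) = ℓt, so V_q g = U^{q+ℓ}g and (10) reduces to the resolvent equation U^{q+ℓ}g + ℓU^q U^{q+ℓ}g = U^q g. For one-dimensional Brownian motion the resolvent has the explicit kernel u^q(x,y) = e^{−√(2q)|x−y|}/√(2q), so the identity can be verified on a grid (the Gaussian test function g and all parameters are arbitrary):

```python
import numpy as np

# Check U^{q+l} g + l * U^q U^{q+l} g = U^q g (formula (10) with constant
# f = l) for Brownian motion, whose q-resolvent kernel is
# u^q(x,y) = exp(-sqrt(2q)*|x-y|) / sqrt(2q).
dx = 0.05
x = np.arange(-30, 30 + dx / 2, dx)
q, l = 1.0, 0.7

def resolvent(r, h):
    K = np.exp(-np.sqrt(2 * r) * np.abs(x[:, None] - x[None, :])) / np.sqrt(2 * r)
    return K @ h * dx

g = np.exp(-x**2)                    # arbitrary test function
Vg = resolvent(q + l, g)             # V_q g = U^{q+l} g
lhs = Vg + l * resolvent(q, Vg)
rhs = resolvent(q, g)

print(np.max(np.abs(lhs - rhs)))     # ~ 0 up to discretization error
```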

3.2. Renewal theory


Suppose in this subsection that X is a subordinator (i.e., an increasing Lévy process) but not a compound Poisson process, and denote the closure of its range by

ℛ = cl{X_t, t ≥ 0}.

One often calls ℛ a regenerative set. Renewal theory is concerned with the age and residual lifetime processes, which are defined by

A_t = inf{s > 0 : t − s ∈ ℛ}   and   R_t = inf{s > 0 : t + s ∈ ℛ}.

These quantities can be expressed in terms of X and its first passage times T_t = inf{s ≥ 0 : X_s > t} as follows:

A_t = t − X_{T_t−},   R_t = X_{T_t} − t.

In words, t − A_t is the value taken by X immediately before exceeding the level t, and R_t + t the value taken by X at the instant when it first exceeds t. It is easy to check that both the age and the residual lifetime processes are Markovian.


We shall first express the joint distribution of A_t and R_t in terms of the Lévy measure Π and the so-called renewal measure

U(dx) = ∫₀^∞ P(X_t ∈ dx) dt.

Recall that the canonical decomposition of the increasing process X as the sum of its continuous part and its pure jump part is given by

X_t = dt + Σ_{s≤t} ΔX_s, (11)

where d ≥ 0 is the drift coefficient and ΔX = (ΔX_s, s ≥ 0) a Poisson point process with characteristic measure Π. The joint distribution of the age and residual lifetime depends crucially on whether the drift coefficient is zero or positive; the following result is due to Neveu and Kesten, cf. Kesten (1969).

THEOREM 6. (Distribution of the age and residual lifetime) Let f : [0,∞[ × [0,∞[ → [0,∞[ be a measurable function that is zero on the axes, i.e., f(0,·) ≡ f(·,0) ≡ 0. We have for every t > 0

E(f(A_t, R_t)) = ∫_{[0,t]} U(dx) ∫_{]t−x,∞[} Π(dy) f(t−x, y−t+x).

If d = 0, then

P(A_t = 0 or R_t = 0) = 0.

If d > 0, then the renewal measure is absolutely continuous and possesses a continuous everywhere positive density u : [0,∞[ → ]0,∞[ with u(0) = 1/d, and moreover

P(A_t = 0) = P(R_t = 0) = P(A_t = R_t = 0) = u(t)/u(0).


The first part of the statement follows easily from the decomposition (11), the structure of the jump process described in Theorem 5, and the compensation formula for Poisson processes. The second part is much more difficult; we refer to Section III.2 in Bertoin (1996) for details. Two important applications of Theorem 6 are the following classical limit theorems. The first specifies in particular the stationary distribution of the age and residual lifetime processes.

THEOREM 7. (Renewal theorem) Suppose that X has finite expectation,

E(X₁) = d + ∫₀^∞ Π(]x,∞[) dx = d + ∫_{]0,∞[} x Π(dx) =: μ ∈ ]0,∞[.

Then the pair (A_t, R_t) converges in distribution as t → ∞ to the pair (VZ, (1−V)Z), where the variables V and Z are independent, V is uniformly distributed on [0,1], and

P(Z ∈ dz) = μ^{−1}(d δ₀(dz) + zΠ(dz)),   z ≥ 0,

where δ₀ stands for the Dirac point mass at 0. In particular, the probability measure

μ^{−1}(d δ₀(dx) + Π(]x,∞[) dx)
on [0,∞[ is the stationary law for both the age and the residual lifetime processes. The second limit theorem determines the asymptotic behavior in distribution of the age and residual lifetime processes in the non-stationary case.

THEOREM 8. (Dynkin–Lamperti theorem) Suppose that E(A_t) ~ αt as t → ∞ for some α ∈ ]0,1[. Then for 0 < u < 1 and v > 0, one has

lim_{t→∞} P(t^{−1}A_t ∈ du, t^{−1}R_t ∈ dv) = ((1−α) sin(πα)/π) (1−u)^{−α}(u+v)^{α−2} du dv.

In particular

lim_{t→∞} P(A_t ≤ tx) = (sin(πα)/π) ∫₀^x u^{α−1}(1−u)^{−α} du

and

lim_{t→∞} P(R_t > ty) = (sin(πα)/π) ∫_y^∞ s^{α−1}(1+s)^{−1} ds.

We refer to Section 3 in Bertoin (1999) for a proof, and for results on the pathwise asymptotic behavior of the age process.
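Theorem 7 can be illustrated by simulation for a subordinator with drift d and compound Poisson jumps at rate c with Exp(1) sizes (so Π(]x,∞[) = ce^{−x} and μ = d + c; all numerical values below are arbitrary): in the limit, P(A_t = 0) = d/(d+c) and, conditionally on being positive, A_t is Exp(1)-distributed:

```python
import numpy as np

# Age A_t of the subordinator X_s = d*s + compound Poisson (rate c, Exp(1)
# jumps) at a large time t.  Theorem 7 predicts P(A_t = 0) -> d/(d+c) and a
# stationary density proportional to Pi(]x,oo[) = c*exp(-x) on ]0,oo[.
rng = np.random.default_rng(6)
d, c, t, n = 1.0, 2.0, 50.0, 5_000

ages = np.empty(n)
for i in range(n):
    X = 0.0
    while True:
        w = rng.exponential(1 / c)         # wait until the next jump
        if X + d * w >= t:                 # level t crossed continuously
            ages[i] = 0.0
            break
        X += d * w                         # drift up to the jump time
        jump = rng.exponential(1.0)
        if X + jump >= t:                  # level t crossed by a jump
            ages[i] = t - X                # A_t = t - X_{T_t-}
            break
        X += jump

p0 = np.mean(ages == 0.0)
print(p0, d / (d + c))                     # atom at 0, ~ 1/3
print(ages[ages > 0].mean())               # ~ 1 (Exp(1) conditional law)
```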

3.3. Fluctuation theory


Fluctuation theory for Lévy processes is the continuous-time version of the theory developed for random walks by Spitzer, Feller, Borovkov and others. It is concerned with the behavior of a real-valued Lévy process jointly with its supremum

S_t = sup_{0≤s≤t} X_s,   t ≥ 0.

One of the main results in that field is given by the fluctuation identities. They specify the joint distribution of S_τ, X_τ, g_τ and τ, where τ is a random time which is independent of X and follows an exponential distribution, say with parameter q > 0, and g_τ is the largest location before τ where the Lévy process reaches its supremum,

g_τ = sup{t < τ : X_t = S_t}.

For the sake of brevity, we shall merely present some key results and refer to Chapter VI in Bertoin (1996) for proofs of the statements and applications of this important theory, which relies heavily on the observation that the so-called reflected process S − X has the strong Markov property.


THEOREM 9. (Fluctuation identities)

(i) The pairs (g_τ, S_τ) and (τ − g_τ, S_τ − X_τ) are independent.

(ii) For every α, β > 0,

E(exp{−αg_τ − βS_τ}) = exp(−∫₀^∞ dt t^{−1} e^{−qt} ∫_{[0,∞[} (1 − e^{−αt−βx}) P(X_t ∈ dx))

and

E(exp{−α(τ−g_τ) − β(S_τ−X_τ)}) = exp(−∫₀^∞ dt t^{−1} e^{−qt} ∫_{]−∞,0]} (1 − e^{−αt+βx}) P(X_t ∈ dx)).

The identities in (ii) are Lévy–Khintchine formulas. The feature that the variables S_τ and S_τ − X_τ have infinitely divisible distributions, and the obvious decomposition X_τ = S_τ − (S_τ − X_τ) as the difference of two independent nonnegative variables, yield the noteworthy factorization

q/(q + Ψ(λ)) = Ψ_q^+(λ) Ψ_q^−(λ),   λ ∈ ℝ, (12)

where Ψ_q^+ (respectively, Ψ_q^−) is the characteristic function of the infinitely divisible nonnegative variable S_τ (respectively, nonpositive variable X_τ − S_τ). One refers to (12) as the Wiener–Hopf factorization of the Lévy process. Fluctuation identities can be viewed as a remarkable and sophisticated application of Markovian techniques (and in particular excursion theory); see Section VI.2 in Bertoin (1996). We stress that combinatorial methods also yield important contributions to fluctuation theory; here are the main two, which are due to Sparre Andersen and Wendel, respectively.

PROPOSITION 10. (Combinatorial fluctuation identities)

(i) For each fixed t > 0, the variables

sup{s ≤ t : X_s = S_s}   and   ∫₀^t 1_{{X_s>0}} ds

have the same distribution.

(ii) For each fixed s ∈ ]0,1[, the variables

inf{x : ∫₀^1 1_{{X_t≤x}} dt > s}   and   sup_{0≤t≤s} X_t + inf_{0≤t≤1−s} (X_{t+s} − X_s)

have the same distribution.


There exist many more interesting identities in that field; see for instance Doney (1993) and Exercises VI.1 and VI.8 in Bertoin (1996). Renewal theory, which was briefly introduced in the preceding subsection, also has an important role in fluctuation theory. More precisely, the so-called ladder time set, which consists of the times when the Lévy process coincides with its supremum, can be viewed as the range of some subordinator. Analogously, the ladder height set, which is defined as the set of the values taken by the supremum of the Lévy process, is again the range of another subordinator. These features yield, in particular, important limit theorems and can be used to derive interesting information on the path behavior of X; see Sections VI.3–5 in Bertoin (1996). Roughly, the identities of fluctuation theory for general Lévy processes describe nice connections between certain random variables or processes. Unfortunately, they are usually too involved to give a completely explicit description of the laws of the variables one is interested in. In particular, the factors of the Wiener–Hopf identity (12) are not explicitly known in general. Nonetheless, there are at least two important special cases for which this can be done: first when the Lévy process is completely asymmetric (i.e., it has either only positive jumps, or only negative jumps), and second when the Lévy process is stable (thanks to the scaling property, cf. Doney (1987)). For instance, if X has only negative jumps, its Laplace transform can be expressed in the form

E(e^{qX_t}) = exp{tψ(q)},   t, q ≥ 0,

for some convex function ψ : [0,∞[ → ℝ which is known as the Laplace exponent. Except in the trivial case when −X is a subordinator, which we implicitly exclude, it holds that lim_{q→∞} ψ(q) = ∞. We denote by Φ : [0,∞[ → [0,∞[ the continuous inverse function of ψ, i.e., ψ(Φ(q)) = q for all q ≥ 0. Then it can be shown that the first passage process of X, T_t = inf{s : X_s > t}, is a subordinator with Laplace exponent Φ, and the fluctuation identities in Theorem 9 take the much more explicit form

E(exp{−αg_τ − βS_τ}) = Φ(q)/(Φ(α+q) + β),

E(exp{−α(τ−g_τ) − β(S_τ−X_τ)}) = q(Φ(α+q) − β)/(Φ(q)(α + q − ψ(β))).

Many natural problems can also be solved in these special situations, though no solutions are known in the general case. One particularly interesting example is the so-called two-sided exit problem, which consists of determining distributions related to the first exit time of the Lévy process from a finite interval. See Chapters VII and VIII in Bertoin (1996) and the references therein.
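For the simplest spectrally negative example, Brownian motion (no jumps at all), one has ψ(q) = q²/2 and Φ(q) = √(2q), and the first passage time T₁ above level 1 is distributed as 1/Z² with Z standard normal (a classical consequence of the reflection principle). The subordination relation E(e^{−qT_t}) = e^{−tΦ(q)} can then be checked directly; the bisection step below merely illustrates computing Φ from ψ numerically:

```python
import numpy as np

# Spectrally negative example: standard Brownian motion, psi(q) = q^2/2.
# Compute Phi = psi^{-1} by bisection and check E(exp(-q*T_1)) = exp(-Phi(q)),
# using the fact that T_1 (first passage above 1) ~ 1/Z^2, Z ~ N(0,1).
def phi(q, psi=lambda l: l * l / 2, hi=100.0):
    lo = 0.0
    for _ in range(200):                  # bisection for psi(mid) = q
        mid = (lo + hi) / 2
        if psi(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rng = np.random.default_rng(7)
q = 1.0
T = 1.0 / rng.standard_normal(400_000) ** 2   # first passage time at level 1
mc = np.mean(np.exp(-q * T))
exact = np.exp(-phi(q))                        # phi(1) = sqrt(2)

print(mc, exact)
```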

4. Some sample path properties


In this chapter, we shall review some important path properties of Lévy processes. We will first discuss, qualitatively, the asymptotic behavior as time goes to infinity, then turn our attention to the local rate of growth at a given (possibly random) time. Finally, we describe some important geometric properties of the range of a Lévy process.

4.1. Recurrence and transience


The notions of transience and recurrence provide a classical connection between the probabilistic theory of Markov processes and abstract potential theory. As we will see here, the connection is especially simple for Lévy processes; and the combination of probabilistic and analytic arguments makes the theory remarkably powerful. The interested reader is referred to Sections I.4 and VI.3 in Bertoin (1996) and to the crucial paper of Port and Stone (1971) for the complete proofs of the results stated in this subsection. Consider again a Lévy process X with values in ℝ^d. From the probabilistic point of view, one says that X is recurrent if

liminf_{t→∞} |X_t| = 0 a.s.,

and that X is transient if

lim_{t→∞} |X_t| = ∞ a.s.

From the point of view of potential theory, one first introduces the potential of a Borel set B ⊆ ℝ^d as the expected time spent by the Lévy process in B,

U(B) = ∫₀^∞ P(X_t ∈ B) dt = E(∫₀^∞ 1_{{X_t∈B}} dt).

The analytic characterization of transience and recurrence is given by the following.

PROPOSITION 11. For ε > 0, let B_ε stand for the open ball in ℝ^d centered at the origin with radius ε. If U(B_ε) < ∞ for some ε > 0, then U is a Radon measure and the Lévy process is transient. Otherwise the Lévy process is recurrent.

PROOF. Suppose that U(B_ε) < ∞ for some ε > 0, pick an arbitrary x ∈ ℝ^d and write T = inf{t ≥ 0 : |X_t − x| < ε/2} for the first entrance time of the Lévy process into the open ball centered at x with radius ε/2. By the right-continuity of the paths, we have |X_T − x| ≤ ε/2 provided that T < ∞, and it follows that the total time spent by X in the ball x + B_{ε/2} is bounded from above by the total time spent by X'_t = X_{T+t} − X_T in B_ε. We deduce from the Markov property that

U(x + B_{ε/2}) ≤ U(B_ε).

Next pick an arbitrary compact set K ⊂ ℝ^d. One can cover K with a finite number n of balls with radius ε/2 centered, say, at x₁, ..., x_n ∈ ℝ^d. We deduce that U(K) ≤ nU(B_ε) < ∞, and thus the potential measure is a Radon measure.


We then check that X must be transient. Fix an arbitrarily large r > 0, write S₁ = inf{t ≥ 0 : |X_t| > 2r} for the first exit-time from B_{2r}, T₁ = inf{t ≥ S₁ : |X_t| < r} for the first return time to B_r after S₁, S₂ for the first exit-time from B_{2r} after T₁, T₂ for the first return time to B_r after S₂, and so on. It will be convenient to use the convention ∞ − ∞ = 0 in the sequel. For a fixed index n, write X'_t = X_{T_n+t} − X_{T_n} provided that T_n < ∞, and note that since |X_{T_n}| ≤ r, the first exit time of X' from B_r occurs no later than S_{n+1} − T_n. We deduce from the strong Markov property that

E(S_{n+1} − T_n) ≥ c P(T_n < ∞), (13)

where c > 0 is the expected value of the first exit time of X from B_r. On the other hand, the total time spent by X in B_{2r} is bounded from below by S₁ + (S₂ − T₁) + (S₃ − T₂) + ⋯. If X were not transient, the probability that T_n < ∞ would be bounded from below by a positive constant, and it would follow from (13) that E(S₁ + (S₂ − T₁) + (S₃ − T₂) + ⋯) = ∞. Thus we would have U(B_{2r}) = ∞, which is impossible as U is a Radon measure. Finally, we refer to Section I.4 in Bertoin (1996) for the proof of the assertion that X is recurrent if U(B_ε) = ∞ for all ε > 0. □

We next state a deep result due to Spitzer, Port and Stone, which specifies whether a Lévy process is transient or recurrent in terms of its characteristic exponent.

THEOREM 12. (Criterion for recurrence or transience) If the integral

∫_{{|ξ|<r}} ℜ(1/Ψ(ξ)) dξ

converges for some r > 0, then the Lévy process is transient. Otherwise, it is recurrent.

The direct part of the statement (if the integral converges then X is transient) follows easily from the Fourier analysis of the potential measure. The converse is a much harder result; we refer to Port and Stone (1971) for the complete argument. For instance, it follows from Theorem 12 that a real-valued stable process with index α is transient for α < 1 and recurrent for α ≥ 1. In a different direction, a 'truly' d-dimensional Lévy process (in the sense that X does not stay on a subspace of dimension d − 1) is transient whenever d ≥ 3. In dimension d = 1, the following handy criterion is available (see Exercise I.10 in Bertoin (1996) for hints of the proof).

PROPOSITION 13. (Chung and Fuchs test) Let X be a real-valued Lévy process with finite mean E(X₁) = μ ∈ ℝ. Then X is transient if μ ≠ 0 and recurrent if μ = 0.

Still in dimension d = 1, a finer analysis of the asymptotic path behavior of Lévy processes is known. One says that the Lévy process drifts to ∞ if


lim_{t→∞} X_t = ∞ a.s., drifts to −∞ if lim_{t→∞} X_t = −∞ a.s., and oscillates if limsup_{t→∞} X_t = ∞ and liminf_{t→∞} X_t = −∞ a.s. Note that a transient Lévy process may oscillate.

THEOREM 14. (Spitzer–Rogozin criterion) If the integral

∫^∞ P(X_t ≤ 0) t^{−1} dt

converges, then X drifts to ∞. If the integral

∫^∞ P(X_t > 0) t^{−1} dt

converges, then X drifts to −∞. Finally, if none of the integrals above converges, then X oscillates.

We refer to Section VI.3 in Bertoin (1996) for a proof. For instance, one can easily check that stable Lévy processes oscillate, except when X or −X is a subordinator. A possible drawback of the Spitzer–Rogozin criterion is that it is given in terms of the probabilities P(X_t ≥ 0), which are usually not known explicitly. In the case when the mean μ ∈ [−∞,∞] of X₁ is well-defined, the strong law of large numbers and the test of Chung and Fuchs entail that X oscillates if μ = 0, drifts to ∞ if μ > 0, and drifts to −∞ if μ < 0. Erickson (1973) has been able to treat the remaining case when the mean does not exist.

PROPOSITION 15. (Erickson's test) Suppose that E(X₁⁺) = E(X₁⁻) = ∞ and set for every x > 0

Π⁺(x) = Π(]x,∞[),   Π⁻(x) = Π(]−∞,−x[),   I^±(x) = ∫₀^x Π^±(y) dy.

If the integral

∫^∞ Π⁺(x) d(x/I⁻(x))

converges then X drifts to −∞, and if the integral

∫^∞ Π⁻(x) d(x/I⁺(x))

converges then X drifts to ∞. Finally, if both integrals diverge, then X oscillates.

4.2. Sample path regularity


We now turn our attention to the regularity of the sample paths, in the sense of Hölder continuity. More precisely, recall that for α > 0, a function f : [0,∞[ → ℝ^d is said to be α-Hölder continuous at t > 0 if there exists a polynomial P such that

|f(t+h) − P(h)| = O(|h|^α)   as h → 0.

It is easy to prove that when the Lévy process has a non-trivial Brownian component in the sense of Section 2.3, this Brownian component governs the local regularity: at each fixed t > 0, the path of the Lévy process is α-Hölder continuous at t with probability one if α < 1/2 and zero if α ≥ 1/2. See e.g., Theorem IV.6 in Gihman and Skorohod (1975). So, let us henceforth focus on the case when the Lévy process has no Brownian component. One of the simplest and most useful results about the Hölder regularity of its sample paths has been proven by Blumenthal and Getoor (1961); the statement below is a refinement observed by Jaffard (1999).

THEOREM 16. (Hölder continuity) Suppose that the Lévy process X has no Brownian component, and introduce the so-called upper-index

β = inf{α > 0 : lim_{r→0+} r^α Π({y : |y| > r}) = 0}.

Then β ∈ [0,2]; and for each fixed t > 0, the path of X is α-Hölder continuous at t with probability one whenever 1/α > β. Conversely, if 1/α < β, then with probability one there exists no t > 0 at which X is α-Hölder continuous.

PROOF. We shall just prove the second assertion and refer to Blumenthal and Getoor (1961) and Pruitt (1981) for the first one. The argument relies on the following easy fact: if f : [0,∞[ → ℝ^d is a right-continuous function having limits to the left, then f cannot be α-Hölder continuous at t > 0 whenever

α > liminf_{h→0+} log|Δf(t−h)| / log h,

where Δf(s) = f(s) − f(s−) stands for the jump of f at s. So let 1/α < β, and recall from the Lévy–Itô decomposition that the jump process Δ = (Δ_t, t ≥ 0) is a Poisson point process with characteristic measure Π. Fix an arbitrarily small ε > 0 and consider the Poisson point process |Δ|^{1/α} ∧ ε. It is immediate that its characteristic measure ν_ε is given by

ν_ε(]r,∞[) = Π({y : |y| > r^α}) if 0 < r < ε,   ν_ε(]r,∞[) = 0 if r ≥ ε.

The assumption that 1/α is less than the upper-index β ensures that

∫₀^1 exp(∫_t^1 ν_ε(]r,∞[) dr) dt = ∞. (14)

Next associate to every jump time s of Δ the open interval ]s, s + (|Δ_s|^{1/α} ∧ a)[. According to a result due to Shepp (see e.g., Theorem 7.2 in Bertoin (1999)), (14)

ensures that for every ε > 0, the random intervals ]s, s + (|Δ_s|^{1/α} ∧ ε)[ cover ]0, ∞[ with probability one. As an easy consequence, it holds for all t > 0 that

$$\alpha > \liminf_{h\to 0+} \frac{\log |\Delta_{t-h}|}{\log h}\,,$$

and therefore X cannot be α-Hölder continuous at t. □

Of course, when 1/α > β, the event of probability one

A_t = {X is α-Hölder continuous at t}

depends on the real number t, and it can be shown that there exist exceptional random instants at which the path is less regular than at a fixed time. More precisely, Jaffard (1999) has been able to compute the so-called spectrum of singularities of the sample paths of fairly general Lévy processes. Roughly, his result entails that for every α < 1/β, the Hausdorff dimension of

{t > 0 : X is not α′-Hölder continuous at t for any α′ > α}

is αβ. It is also natural to consider the instants at which the path of X is as regular as possible. Only a few results are known in this field; perhaps the most interesting is the following, which is due to Perkins (1983).

THEOREM 17. (Slow points for stable processes) Suppose X is a stable process with index α ∈ ]0, 2]. Then there is a positive and finite number k such that

$$\inf_{t>0}\ \limsup_{h\to 0}\ |h|^{-1/\alpha}\,|X_{t+h} - X_t| = k \qquad \text{a.s.}$$

It is easy to check that for each fixed t > 0, limsup_{h→0+} h^{−1/α}|X_{t+h} − X_t| = ∞ a.s. (see e.g., Theorem VIII.5 in Bertoin (1996)), so Theorem 17 points out the existence of exceptional times, called slow points, at which the stable process grows more slowly than usual. The problem of calculating the Hausdorff dimension of the slow points of a stable process seems still open, except of course in the Brownian case (see Perkins (1983)). There is a great variety of further results describing the regularity (or the irregularity) of the paths of Lévy processes, such as the rate of escape, the p-variation, analogues of Chung's law of the iterated logarithm, and so on. We refer to Bertoin (1996), Fristedt (1974), Pruitt (1981) and Taylor (1973), and the references therein, for many more properties in that field.
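For orientation, the upper index can be computed explicitly in the stable case; the following display is an added illustration, not part of the original text.

```latex
% Stable case (dimension one): a strictly stable process of index
% \alpha \in ]0,2[ with no Brownian part has L\'evy measure of the form
% \Pi(\mathrm{d}y) = c_{\pm}\,|y|^{-1-\alpha}\,\mathrm{d}y, whence
\Pi(\{y : |y| > r\}) = c\, r^{-\alpha} \qquad (r > 0),
% so r^{\gamma}\,\Pi(\{y : |y| > r\}) \to 0 as r \to 0+ exactly when \gamma > \alpha, and
\beta \;=\; \inf\bigl\{\gamma > 0 : \lim_{r \to 0+} r^{\gamma}\,\Pi(\{y : |y| > r\}) = 0\bigr\} \;=\; \alpha .
% By Theorem 16, at each fixed t the path is \gamma-H\"older continuous a.s.
% for \gamma < 1/\alpha, and a.s. nowhere \gamma-H\"older continuous for \gamma > 1/\alpha.
```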

4.3. Some geometric properties of the range


This section is devoted to the presentation of a few important geometric properties of the range of a Lévy process valued in R^d,

$$\mathcal{R} = \{X_t : t \ge 0\} .$$

Some elements on Lévy processes

141

A first natural problem is to evaluate the size of this random set, that is, to calculate its fractal dimensions. The Hausdorff and packing dimensions have been determined by Pruitt (1969) and Taylor (1986), respectively.

THEOREM 18. (Fractal dimensions of the range) Introduce the indices

$$\gamma = \sup\Bigl\{\alpha \ge 0 : \lim_{r\to 0+} r^{-\alpha}\int_0^1 P(|X_t| < r)\,\mathrm{d}t = 0\Bigr\} ,$$
$$\gamma' = \sup\Bigl\{\alpha \ge 0 : \liminf_{r\to 0+} r^{-\alpha}\int_0^1 P(|X_t| < r)\,\mathrm{d}t = 0\Bigr\} .$$

Then 0 ≤ γ ≤ γ′ ≤ 2 and, with probability one, the Hausdorff dimension of ℛ is γ and its packing dimension is γ′. □

Again, there is a great variety of results in this field, and Theorem 18 just provides two typical examples. We refer in particular to Fristedt and Pruitt (1971) for much more precise information on the exact Hausdorff measure of the range of subordinators, and to Kahane (1985) for the case of stable Lévy processes. Another natural question, which lies at the heart of the deep connection between potential theory and Markov processes, is to characterize the deterministic sets that intersect ℛ. More precisely, call a Borel set B ⊆ R^d polar for the Lévy process X if P_x(B ∩ ℛ = ∅) = 1 for every starting point x ∈ R^d. Hawkes (1979) has given the following answer.

THEOREM 19. (Characterization of polar sets) Suppose that X has an absolutely continuous resolvent. A Borel set B ⊆ R^d is polar for X if and only if

$$\int_{\mathbb{R}^d} \Re\left(\frac{1}{1+\Psi(\lambda)}\right) |\hat{\mu}(\lambda)|^2\,\mathrm{d}\lambda = \infty$$

for every probability measure μ on B, where μ̂ stands for the Fourier transform of μ and Ψ for the characteristic exponent of X.

More tractable criteria are available in special cases. For instance, if X is a stable process in R^d with index α < d, then a Borel set B is polar for X if and only if it has zero Riesz capacity of order (d − α). By an application of Frostman's theorem (cf. Theorem 10.2 in Kahane (1985)), this entails that Borel sets with Hausdorff dimension less than d − α are polar for X, whereas those with Hausdorff dimension greater than d − α are not. Last but not least, it is also interesting to investigate the self-intersections of the range of a Lévy process. For a fixed integer k ≥ 2, one says that the range possesses a point of multiplicity k if there are 0 < t₁ < ⋯ < t_k such that X_{t₁} = ⋯ = X_{t_k}.
In the Brownian case, it is well known from the work of Dvoretzky, Erdős and Kakutani that planar Brownian motion has multiple points of arbitrary multiplicity, that Brownian motion in three-dimensional space has double points but no triple points, and finally that in dimension 4 and higher, the range


of a Brownian motion has no multiple points. The following remarkable result for Lévy processes was conjectured by Hendricks and Taylor, and proved by Evans (1987) and Fitzsimmons and Salisbury (1989).

THEOREM 20. (Multiple points in the range) Assume that the 1-resolvent is absolutely continuous with density u¹, and that liminf ess_{x→0} u¹(x) > 0. Then the range of X possesses points of multiplicity k ≥ 2 if and only if

$$\int_{|x|\le 1} \bigl(u^1(x)\bigr)^k\,\mathrm{d}x < \infty .$$

The odd-looking hypothesis liminf ess_{x→0} u¹(x) > 0 essentially ensures that no projection of X on a linear subspace is a subordinator (otherwise it is clear that the range cannot have multiple points). For instance, we see that a truly d-dimensional stable process with index α has points of multiplicity k if and only if kα > (k − 1)d.
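As a quick sanity check (added here; not in the original), taking α = 2 in the criterion kα > (k − 1)d recovers the Brownian facts recalled earlier:

```latex
\begin{align*}
d = 2   &: \quad 2k > 2(k-1) \text{ for all } k \ge 2
          && \Rightarrow \text{multiple points of every order},\\
d = 3   &: \quad 2k > 3(k-1) \iff k < 3
          && \Rightarrow \text{double but no triple points},\\
d \ge 4 &: \quad 2k > d(k-1) \text{ fails already for } k = 2
          && \Rightarrow \text{no multiple points}.
\end{align*}
```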

References
Barndorff-Nielsen, O. (1998). Probability and statistics: self-decomposability, finance and turbulence. In Probability towards 2000 (Eds., L. Accardi and C. C. Heyde), Lecture Notes in Statistics 120, 47–57.
Barndorff-Nielsen, O. and C. Halgreen (1977). Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Z. Wahrscheinlichkeitstheorie verw. Gebiete 38, 309–311.
Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge.
Bertoin, J. (1997). Cauchy's principal value of local times of Lévy processes with no negative jumps via continuous branching processes. Electronic Journal of Probability 2, http://www.math.washington.edu/~ejpecp/EjpVol2/paper6.abs.html
Bertoin, J. (1999). Subordinators: examples and applications. École d'été de probabilités de St-Flour XXVII. Lecture Notes in Maths 1717, Springer, Berlin.
Bingham, N. H. (1975). Fluctuation theory in continuous time. Adv. Appl. Probab. 7, 705–766.
Blumenthal, R. M. and R. K. Getoor (1961). Sample functions of stochastic processes with stationary independent increments. J. Math. Mech. 10, 493–516.
Blumenthal, R. M. and R. K. Getoor (1968). Markov Processes and Potential Theory. Academic Press, New York.
Bondesson, L. (1981). Classes of infinitely divisible distributions and densities. Z. Wahrscheinlichkeitstheorie verw. Gebiete 57, 39–71 [Correction and addendum: ibid. 59, 277].
Carmona, P., F. Petit and M. Yor (1994). Sur les fonctionnelles exponentielles de certains processus de Lévy. Stochastics and Stochastics Reports 47, 71–101.
Carmona, P., F. Petit and M. Yor (1997). On the distribution and asymptotic results for exponential functionals of Lévy processes. In Exponential Functionals and Principal Values Related to Brownian Motion (Ed., M. Yor). Biblioteca de la Revista Matemática Iberoamericana.
Doney, R. A. (1987). On the Wiener–Hopf factorisation and the distribution of extrema for certain stable processes. Ann. Probab. 15, 1352–1362.
Doney, R. A. (1993). A path decomposition for Lévy processes. Stoch. Process. Appl. 47, 167–181.
Erickson, K. B. (1973). The strong law of large numbers when the mean is undefined. Trans. Amer. Math. Soc. 185, 371–381.
Evans, S. N. (1987). Multiple points in the sample path of a Lévy process. Probab. Theory Relat. Fields 76, 359–367.


Fitzsimmons, P. J. and R. K. Getoor (1992). On the distribution of the Hilbert transform of the local time of a symmetric Lévy process. Ann. Probab. 20, 1484–1497.
Fitzsimmons, P. J. and R. K. Getoor (1995). Occupation time distributions for Lévy bridges and excursions. Stoch. Process. Appl. 58, 73–89.
Fitzsimmons, P. J. and J. Pitman (1999). Kac's moment formula and the Feynman–Kac formula for additive functionals. Stoch. Process. Appl. 79, 117–134.
Fitzsimmons, P. J. and T. S. Salisbury (1989). Capacity and energy for multiparameter Markov processes. Ann. Inst. Henri Poincaré 25, 325–350.
Fristedt, B. E. (1974). Sample functions of stochastic processes with stationary, independent increments. In Advances in Probability 3, 241–396. Dekker, New York.
Fristedt, B. E. and W. E. Pruitt (1971). Lower functions for increasing random walks and subordinators. Z. Wahrscheinlichkeitstheorie verw. Gebiete 18, 167–182.
Getoor, R. K. and M. J. Sharpe (1994). On the arc sine laws for Lévy processes. J. Appl. Prob. 31, 76–89.
Gihman, I. I. and A. V. Skorohod (1975). The Theory of Random Processes 2. Springer, Berlin.
Grey, D. R. (1974). Asymptotic behaviour of continuous time continuous state-space branching processes. J. Appl. Probab. 11, 669–677.
Grosswald, E. (1976). The Student t-distribution of any degree of freedom is infinitely divisible. Z. Wahrscheinlichkeitstheorie verw. Gebiete 36, 103–109.
Hawkes, J. (1979). Potential theory of Lévy processes. Proc. London Math. Soc. 38, 335–352.
Jaffard, S. (1999). The multifractal nature of Lévy processes. Probab. Theory Relat. Fields 114, 207–227.
Kahane, J. P. (1985). Some Random Series of Functions, Second edition. Cambridge University Press, Cambridge.
Kahane, J. P. (1985). Ensembles aléatoires et dimensions. In Recent Progress in Fourier Analysis (Eds., I. Peral and J. L. Rubio de Francia). North-Holland, Amsterdam.
Kesten, H. (1969). Hitting probabilities of single points for processes with stationary independent increments. Memoirs Amer. Math. Soc. 93.
Lamperti, J. W. (1967). Continuous state branching processes. Bull. Amer. Math. Soc. 73, 382–386.
Lamperti, J. W. (1972). Semistable Markov processes. Z. Wahrscheinlichkeitstheorie verw. Gebiete 22, 205–225.
Marchal, P. (1998). Distribution of the occupation time for a Lévy process at passage times at 0. Stoch. Process. Appl. 74, 123–131.
Paulsen, J. (1993). Risk theory in a stochastic economic environment. Stoch. Process. Appl. 46, 327–361.
Perkins, E. (1983). On the Hausdorff dimension of the Brownian slow points. Z. Wahrscheinlichkeitstheorie verw. Gebiete 64, 369–399.
Port, S. C. and C. J. Stone (1971). Infinitely divisible processes and their potential theory I and II. Ann. Inst. Fourier 21-2, 157–275; 21-4, 179–265.
Prabhu, N. U. (1981). Stochastic Storage Processes: Queues, Insurance Risk and Dams. Springer, Berlin.
Pruitt, W. E. (1969). The Hausdorff dimension of the range of a process with stationary independent increments. J. Math. Mech. 19, 371–378.
Pruitt, W. E. (1981). The growth of random walks and Lévy processes. Ann. Probab. 9, 948–956.
Samorodnitsky, G. and M. S. Taqqu (1994). Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance. Chapman and Hall, London.
Shanbhag, D. N. and M. Sreehari (1977). On certain self-decomposable distributions. Z. Wahrscheinlichkeitstheorie verw. Gebiete 38, 217–222.
Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge.
Seshadri, V. (1993). The Inverse Gaussian Distribution: Statistical Theory and Applications. Lecture Notes in Statistics 137. Springer-Verlag, New York.
Shlesinger, M. F., G. M. Zaslavsky and U. Frisch (Eds.) (1995). Lévy Flights and Related Topics in Physics. Lecture Notes in Physics 450.


Taylor, S. J. (1973). Sample path properties of processes with stationary independent increments. In Stochastic Analysis (Eds., D. G. Kendall and E. F. Harding), pp. 387–414. Wiley, London.
Taylor, S. J. (1986). The use of packing measure in the analysis of random sets. In Stochastic Processes and their Applications (Eds., K. Itô and T. Hida), Proceedings Nagoya 1985, Lecture Notes in Maths 1203, 214–222. Springer, Berlin.
Urbanik, K. (1992). Functionals on transient stochastic processes with independent increments. Studia Math. 103, 299–315.
Urbanik, K. (1995). Infinite divisibility of some functionals on stochastic processes. Probab. Math. Stat. 15, 493–513.
Zolotarev, V. M. (1986). One-dimensional Stable Distributions. Amer. Math. Soc., Providence.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Iterated Random Maps and Some Classes of Markov Processes

Rabi Bhattacharya and Edward C. Waymire

After i.i.d. sequences of random variables, one is hard pressed to identify a structure with a more pervasive role in the theory and applications of probability than that of discrete-parameter Markov processes. While the historical development is rich in concepts and techniques, efforts to complete the law of large numbers and central limit theory for Markov processes continue down fascinating pathways. In this article we present a contemporary survey of some of the main historical developments on these benchmark problems, up through some state-of-the-art theory and methods.

1. Introduction

The Markov property refers to a natural type of statistical dependence in a sequence of random variables {X₀, X₁, …, X_n, …}, each taking values in a state space S equipped with a sigma-field 𝒮. It is typically expressed as the conditional independence of the future process {X_{m+1}, X_{m+2}, …} from the past {X_{m−1}, …, X₀} given the present X_m, for each m = 1, 2, …. The notion of "conditional distribution" required by this definition contains some technical difficulties that are easily avoided by restricting the state space generality to a Borel subset S of a Polish space with Borel sigma-field 𝒮. Such a restriction does not severely restrict the scope of applications and will be adopted throughout this paper. One obvious advantage of this topological restriction on the state space S is that it makes available Doob's (1953) construction of regular conditional probabilities for defining transition probabilities. Namely, one may write

$$P(X_{m+1} \in B_1, \ldots, X_{m+k} \in B_k \mid X_m) = \int_{B_1}\!\cdots\!\int_{B_k} p_{m+1}(X_m, \mathrm{d}x_1)\, p_{m+2}(x_1, \mathrm{d}x_2) \cdots p_{m+k}(x_{k-1}, \mathrm{d}x_k), \quad B_i \in \mathcal{S} , \tag{1}$$

146

R. Bhattacharya and E. C. Waymire

where each transition probability p_m(x, dy) satisfies: (i) x → p_m(x, B) is a nonnegative Borel-measurable function on S for each B ∈ 𝒮; (ii) B → p_m(x, B) is a probability measure on 𝒮 for each x ∈ S. The special case that p_m(x, dy) = p(x, dy) for each m = 0, 1, 2, … is referred to as time-homogeneous transition probabilities. Unless explicitly mentioned to the contrary, we assume time-homogeneity for the transition probabilities throughout. A second important consequence for us of the assumed topological structure on S is that any such sequence {X₀, X₁, …, X_n, …} having the Markov property with time-homogeneous transition probabilities may be represented as arising from a sequence of iterated i.i.d. maps. Although the representation is not unique, it is thus no surprise that many of the important Markov process models in applied probability come already naturally expressed in the form of iterated maps. So, before we explain the underlying theory any further, let us first consider some concrete illustrations. In each example one finds a sequence of i.i.d. random maps α_n, n ≥ 1, on S into S, defined on some probability space (Ω, ℱ, P), such that
$$X_0 = x, \quad X_1 = \alpha_1 x, \ \ldots, \ X_n = \alpha_n \cdots \alpha_1 x, \qquad n \ge 1 . \tag{2}$$

Here α_n(ω) is a map on S, for each ω ∈ Ω, whose value at x ∈ S is denoted α_n x under the usual probability convention of suppressing ω. Also α_n ⋯ α₁ denotes the n-fold composition of the maps α_n, …, α₁. One may easily check that {X₀, X₁, …} has the Markov property with transition probabilities

$$p(x, B) = P(\alpha_1 x \in B), \qquad x \in S,\ B \in \mathcal{S} . \tag{3}$$

Monte Carlo simulations of the processes occurring in several of the examples below can be understood and explained by application of the theory given in the remainder of the paper. We will return to these examples in the final section of this paper.

EXAMPLE 1. (Two-state to Real-valued Markov Chains) Let S = {0, 1} and, for simplicity, assume that 0 < p₀₀ < p₁₀ < 1. Here p_ij = p(i, {j}) = P(X₁ = j | X₀ = i). Let Γ = {γ₁, γ₂, γ₃} denote the collection of maps on S defined by

$$\gamma_1 x = 0; \quad \gamma_2 x = 1; \quad \gamma_3 x = 1 - x, \qquad x \in S = \{0, 1\} . \tag{4}$$

Define a probability distribution Q on Γ by

$$Q(\{\gamma_1\}) = p_{00}; \quad Q(\{\gamma_2\}) = 1 - p_{10}; \quad Q(\{\gamma_3\}) = p_{10} - p_{00} . \tag{5}$$

Now if α is a random map on S with distribution Q, then

$$p_{ij} = P(\alpha i = j), \qquad i, j \in S = \{0, 1\} . \tag{6}$$
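The construction in (4)–(6) can be checked empirically. The following is a minimal sketch (ours, not the authors'); the function names and the particular values p₀₀ = 0.3, p₁₀ = 0.7 are illustrative assumptions.

```python
import random

def make_map_sampler(p00, p10, seed=None):
    """Draw i.i.d. random maps alpha with the distribution Q of (5).

    Assumes 0 < p00 < p10 < 1, so all three weights are positive;
    the maps gamma_1, gamma_2, gamma_3 are those of (4).
    """
    gammas = (lambda x: 0, lambda x: 1, lambda x: 1 - x)
    weights = (p00, 1.0 - p10, p10 - p00)  # Q({gamma_1}), Q({gamma_2}), Q({gamma_3})
    rng = random.Random(seed)
    return lambda: rng.choices(gammas, weights=weights)[0]

# Empirical check of (6): p_ij = P(alpha i = j).
draw = make_map_sampler(p00=0.3, p10=0.7, seed=1)
n = 200_000
p00_hat = sum(draw()(0) == 0 for _ in range(n)) / n  # close to p00 = 0.3
p10_hat = sum(draw()(1) == 0 for _ in range(n)) / n  # close to p10 = 0.7
```

Note that α0 = 0 only under γ₁, while α1 = 0 under either γ₁ or γ₃, which is exactly how the weights in (5) were chosen.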

More generally, suppose that p(x, dy) is a transition probability on S = R, equipped with the Borel sigma-field 𝒮 = ℬ(R). Let F_x(y) denote the distribution function defined by


$$F_x(y) = p(x, (-\infty, y]), \qquad y \in \mathbb{R} . \tag{7}$$

Define an inverse function by

$$F_x^{-1}(u) = \inf\{y : F_x(y) \ge u\}, \qquad u \in (0, 1) . \tag{8}$$

As is well known, if U is a random variable uniformly distributed on (0, 1), then F_x^{-1}(U) has distribution function F_x. Thus, defining a random map α by

$$\alpha x = F_x^{-1}(U), \qquad x \in S = \mathbb{R} , \tag{9}$$

one has

$$p(x, B) = P(\alpha x \in B), \qquad B \in \mathcal{B}(\mathbb{R}) . \tag{10}$$
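The inversion construction (8)–(10) can be sketched in code. The kernel below is a hypothetical choice made for illustration only: p(x, ·) is taken to be the exponential law with mean 1 + |x|, for which the quantile function is available in closed form.

```python
import math
import random

def F_inv(x, u):
    """Quantile function F_x^{-1}(u) of (8) for a hypothetical kernel:
    p(x, .) is the exponential law with mean 1 + |x|."""
    return -(1.0 + abs(x)) * math.log(1.0 - u)

def iterate(x0, n, rng):
    """X_n = alpha_n ... alpha_1 x0, each alpha applied as in (9)."""
    x = x0
    for _ in range(n):
        x = F_inv(x, rng.random())
    return x

rng = random.Random(0)
path = [iterate(0.0, 50, rng) for _ in range(5)]  # five independent runs of length 50
```

A single step from x = 0 is exponential with mean 1, which gives a quick distributional sanity check on the sampler.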

In order that (10), or (3), makes sense, and for x ↦ p(x, B) to be measurable for each B ∈ 𝒮, a technical condition is imposed on the random maps: α : (ω, x) ↦ α(ω)x must be measurable on (Ω × S, ℱ ⊗ 𝒮).

EXAMPLE 2. (Temporally Discretized Diffusion) The Euler method from numerical analysis applied to a stochastic differential equation

$$\mathrm{d}X(t) = \mu(X(t), t)\,\mathrm{d}t + \sigma(X(t), t)\,\mathrm{d}W(t), \qquad t > 0 , \tag{11}$$

is a temporal discretization of the form (e.g., see Kloeden and Platen (1992))


$$Y_{n+1} = Y_n + \mu(Y_n, \tau_n)(\tau_{n+1} - \tau_n) + \sigma(Y_n, \tau_n)\bigl(W(\tau_{n+1}) - W(\tau_n)\bigr), \qquad n = 0, 1, 2, \ldots \tag{12}$$

where 0 = τ₀ < τ₁ < ⋯ < τ_n < ⋯ is a given partition of time, and Y_n = X(τ_n). In the case of the Ornstein–Uhlenbeck process defined by μ(x, t) = −μx, μ > 0, σ²(x, t) = σ² > 0, one has for τ_n = nε

$$Y_{n+1} = (1 - \varepsilon\mu)\,Y_n + \sqrt{\varepsilon}\,\sigma Z_{n+1} \tag{13}$$

where {Z_n}_{n=1}^∞ is an i.i.d. sequence of standard normal random variables. Note that by taking the time increment ε sufficiently small, one may assume 0 < 1 − εμ < 1. This is a simple linear one-dimensional case of the following more general model.

EXAMPLE 3. (Linear/Nonlinear Autoregressive Models) The general kth order autoregressive process {X_n}_{n=0}^∞ is defined by

$$X_{n+1} = h(X_{n+1-k}, \ldots, X_n) + \eta_{n+1}, \qquad n = k - 1, k, \ldots \tag{14}$$

where h : R^k → R is a Borel-measurable function, {η_n}_{n≥1} is an i.i.d. sequence of real-valued random variables, and the initial data (X₀, X₁, …, X_{k−1}) has an arbitrarily prescribed distribution, independent of {η_n}_{n≥1}. The asymptotic properties of {X_n}_{n≥0} may be obtained from the naturally associated Markov process {Y_n}_{n≥0} on S = R^k, defined by Y_n = (X_n, X_{n+1}, …, X_{n+k−1}) and satisfying


$$Y_{n+1} = f(Y_n) + \epsilon_{n+1} \tag{15}$$

where Y₀ has an arbitrary distribution in R^k independent of {η_n}_{n≥1}, and f : R^k → R^k, {ε_n}_{n≥1} are defined by

$$f(y_1, \ldots, y_k) = (y_2, \ldots, y_k, h(y)), \qquad \epsilon_n = (0, \ldots, 0, \eta_n) . \tag{16}$$
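The iterated-map form (15)–(16) can be simulated directly. The sketch below is ours; the particular linear h and its coefficients are illustrative assumptions chosen so that the chain is stable.

```python
import random

def ar_step(y, h, rng):
    """One application of the random map in (15)-(16):
    (y_1, ..., y_k) -> (y_2, ..., y_k, h(y) + eta),
    with eta standard normal (an illustrative noise choice)."""
    return y[1:] + (h(*y) + rng.gauss(0.0, 1.0),)

def simulate(h, y0, n, seed=0):
    """Return the scalar trajectory X_k, X_{k+1}, ... of length n."""
    rng = random.Random(seed)
    y = tuple(y0)
    out = []
    for _ in range(n):
        y = ar_step(y, h, rng)
        out.append(y[-1])
    return out

# A stable second-order (k = 2) example: X_{n+1} = 0.5 X_{n-1} + 0.25 X_n + eta.
xs = simulate(lambda x1, x2: 0.5 * x1 + 0.25 * x2, (0.0, 0.0), 10_000)
```

Since both roots of the associated characteristic polynomial lie inside the unit disc for these coefficients, the simulated path remains bounded in distribution.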

The pth order autoregressive model AR(p) from the classical theory of time series is the special case given by taking h linear: AR(p): k = p,

$$h(x_1, \ldots, x_p) = \beta_0 x_1 + \cdots + \beta_{p-1} x_p ,$$

where β₀, …, β_{p−1} are given real constants. In particular, therefore, f is a linear transformation with matrix representation

$$f = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \beta_0 & \beta_1 & \beta_2 & \cdots & \beta_{p-1} \end{pmatrix} \tag{17}$$

EXAMPLE 4. (ARMA Models) The autoregressive moving average model ARMA(p, q) from classical time series is of the form:
$$X_{n+p} = \sum_{j=0}^{p-1} \beta_j X_{n+j} + \sum_{i=1}^{q} a_i\, \eta_{n+p-i} + \eta_{n+p}, \qquad n = 0, 1, \ldots , \tag{18}$$

where p, q are positive integers, β_j, a_i, j = 0, …, p − 1, i = 1, …, q, are real constants, {η_n}_{n=p−q}^∞ is an i.i.d. sequence of random variables, and the initial data (X₀, X₁, …, X_{p−1}) has an arbitrarily prescribed distribution, independent of {η_n}_{n≥1}. The naturally associated Markov process

{Y_n = (X_n, …, X_{n+p−1}, η_{n+p−q}, …, η_{n+p−1})}_{n≥0}

on S = R^{p+q} in this case is defined by

$$Y_{n+1} = f(Y_n) + \epsilon_{n+1} , \tag{19}$$

where Y₀ has an arbitrary distribution independent of {η_n}_{n≥1}, and f : R^{p+q} → R^{p+q}, {ε_n}_{n≥1} are defined by

$$f(y_1, \ldots, y_{p+q}) = \Bigl(y_2, y_3, \ldots, y_p,\ \sum_{j=1}^{p} \beta_{j-1}\, y_j + \sum_{j=1}^{q} a_{q-j+1}\, y_{p+j},\ y_{p+2}, \ldots, y_{p+q}, 0\Bigr)$$

and

$$\epsilon_n = (0, \ldots, 0, \eta_{n+p-1}, 0, \ldots, 0, \eta_{n+p-1}) ,$$

where η_{n+p−1} appears in the pth and (p+q)th coordinates.
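The scalar recursion (18) underlying this state-space representation can be simulated directly. The following sketch is ours, with illustrative (stable) coefficients and standard normal noise as an assumed choice.

```python
import random

def arma_path(beta, a, n, seed=0):
    """Simulate (18): X_{n+p} = sum_j beta_j X_{n+j}
                                + sum_i a_i eta_{n+p-i} + eta_{n+p},
    with zero initial data and i.i.d. standard normal eta."""
    p, q = len(beta), len(a)
    rng = random.Random(seed)
    x = [0.0] * p                                   # X_0, ..., X_{p-1}
    eta = [rng.gauss(0.0, 1.0) for _ in range(q)]   # eta_{p-q}, ..., eta_{p-1}
    for _ in range(n):
        e_new = rng.gauss(0.0, 1.0)                 # eta_{n+p}
        x_new = (sum(b * xv for b, xv in zip(beta, x[-p:]))
                 + sum(a[i - 1] * eta[-i] for i in range(1, q + 1))
                 + e_new)
        x.append(x_new)
        eta.append(e_new)
    return x

# An ARMA(2, 1) example with roots inside the unit disc:
xs = arma_path(beta=[0.3, 0.4], a=[0.5], n=5_000)
```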


In particular, therefore, f is a linear transformation with matrix representation

$$f = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 & 0 & \cdots & 0 \\
\vdots & & & \ddots & \vdots & \vdots & & & \vdots \\
\beta_0 & \beta_1 & \beta_2 & \cdots & \beta_{p-1} & a_q & a_{q-1} & \cdots & a_1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & \vdots & \vdots & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0
\end{pmatrix} \tag{20}$$

Figure 1 provides a scatter plot of the asymptotic distribution of (X_n, Y_n), n = 0, 1, 2, …, in the case of the second order (p = 2) autoregressive model with map h(x, y) = x + y started from the origin, and η_n standard normal.

EXAMPLE 5. (Queuing Lengths) The waiting time process {W₀, W₁, …} for random arrivals in a G/G/1 queue to begin service is a popular example of a Markov evolution represented by a random walk on the positive half-line of the form

$$W_{j+1} = (W_j + Y_{j+1})^+, \qquad W_0 = 0,\ j = 0, 1, 2, \ldots , \tag{21}$$

where {Y₁, Y₂, …} is an i.i.d. sequence with EY₁ < 0, representing the differences between the service time of the jth customer and the interarrival time of the (j + 1)st customer, for j = 1, 2, …. A simulation of the existence of a steady-state distribution is given in Figure 2 for the case of exponential service and interarrival times, with means 1/2 and 1, respectively.
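The Lindley recursion (21) is immediate to simulate; the sketch below (ours) uses the exponential parameters of Figure 2.

```python
import random

def waiting_times(n, service_mean=0.5, interarrival_mean=1.0, seed=0):
    """Iterate (21): W_{j+1} = (W_j + Y_{j+1})^+, with
    Y = service time minus interarrival time, both exponential."""
    rng = random.Random(seed)
    w, out = 0.0, [0.0]
    for _ in range(n):
        y = (rng.expovariate(1.0 / service_mean)
             - rng.expovariate(1.0 / interarrival_mean))
        w = max(w + y, 0.0)
        out.append(w)
    return out

ws = waiting_times(100_000)
frac_empty = sum(w == 0.0 for w in ws) / len(ws)  # fraction of customers who do not wait
```

For these M/M/1 parameters the traffic intensity is 1/2, so roughly half of the customers should find the server free, which gives a quick check on the simulation.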

Fig. 1. Second order autoregressive model steady state distribution, h(x, y) = x + y, (x₀, y₀) = (0, 0).


Fig. 2. Waiting time steady state distribution under mean-one interarrival and mean one-half exponential service times.

EXAMPLE 6. (Gibbs Sampler) Consider a random field of ±1-values distributed over a finite two-dimensional lattice L = {(i, j) : 1 ≤ i, j ≤ M} according to a Gibbs distribution π on the phase space S = {−1, +1}^L, of the form
$$\pi(\{s_{ij} : i, j \in L\}) = c_\beta \exp\Bigl(\beta \sideset{}{'}\sum_{(i,j),(m,n)} s_{ij}\, s_{mn}\Bigr) \tag{22}$$

where Σ′ denotes a sum over neighbor sites (i, j) and (m, n) in L, i.e., either i = m and |j − n| = 1, or j = n and |i − m| = 1, and β is a non-negative real parameter inversely proportional to temperature. Gibbs samplers refer to algorithms which generate at least approximate sample realizations from π, even though one may not be able to compute the normalization constant c_β, by constructing a Markov chain with π as its unique invariant probability and then sampling from the Markov chain after "sufficient time" has passed. Given a sample configuration s and a fixed site (m, n) ∈ L, let s^{mn+} denote the element of S which is the same as s at all (i, j) except possibly for (m, n), where it is assigned the value +1. Similarly, s^{mn−} denotes the configuration which is the same as s at all (i, j) except possibly for (m, n), where it is assigned the value −1. In one time step a configuration s is changed to a configuration s^{mn+}, for some (m, n) ∈ L, with probability p(s, {s^{mn+}}) = M^{−2} π₊(mn|s), or to s^{mn−}, for some (m, n) ∈ L, with probability p(s, {s^{mn−}}) = M^{−2} π₋(mn|s), where
$$\pi_+(mn|s) = \frac{\pi(\{s^{mn+}\})}{\pi(\{s^{mn+}\}) + \pi(\{s^{mn-}\})}\,, \qquad \pi_-(mn|s) = 1 - \pi_+(mn|s) . \tag{23}$$

All other transitions occur with probability zero. Note that the normalization constant c_β cancels in the definition of these transition probabilities and is


therefore not needed to generate transitions. One may also easily check the time-reversibility condition p(s, {s^{mn}}) π({s}) = p(s^{mn}, {s}) π({s^{mn}}), and therefore π is an invariant probability for this transition law. From here one may construct i.i.d. iterated maps to represent the transitions. Specifically, for (m, n) ∈ L and u ∈ (0, 1), define the map f_{mn,u} on S by

$$\bigl(f_{mn,u}(s)\bigr)_{ij} = \begin{cases} s_{ij} & \text{if } (i,j) \ne (m,n) \\ +1 & \text{if } (i,j) = (m,n) \text{ and } 0 < u < \pi_+(mn|s) \\ -1 & \text{if } (i,j) = (m,n) \text{ and } \pi_+(mn|s) \le u < 1 \end{cases} \tag{24}$$
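The heat-bath map (24), and the Propp–Wilson coupling-from-the-past scheme described next, can be sketched as follows. This is our illustration, not the authors' code: the grid size M, the inverse temperature BETA, and the free (rather than periodic) boundary are simplifying assumptions.

```python
import math
import random

BETA, M = 0.1, 6  # small grid, high temperature (illustrative choices)

def pi_plus(s, m, n):
    """Heat-bath probability pi_+(mn|s) of (23) for the Ising weights (22),
    with free boundary on an M x M grid."""
    field = sum(s[i][j] for i, j in ((m - 1, n), (m + 1, n), (m, n - 1), (m, n + 1))
                if 0 <= i < M and 0 <= j < M)
    return 1.0 / (1.0 + math.exp(-2.0 * BETA * field))

def apply_map(s, site, u):
    """The monotone map f_{mn,u} of (24)."""
    m, n = site
    s = [row[:] for row in s]
    s[m][n] = 1 if u < pi_plus(s, m, n) else -1
    return s

def cftp(seed=0):
    """Propp-Wilson coupling from the past: run the SAME maps from the
    all-plus and all-minus states, from ever-earlier times, until the
    two coupled chains coalesce at time 0 (an exact sample from pi)."""
    rng = random.Random(seed)
    moves = []
    while True:
        # Prepend a fresh block of earlier random (site, u) pairs;
        # the later moves are reused, as CFTP requires.
        moves = [((rng.randrange(M), rng.randrange(M)), rng.random())
                 for _ in range(len(moves) + M * M)] + moves
        hi = [[1] * M for _ in range(M)]
        lo = [[-1] * M for _ in range(M)]
        for site, u in moves:
            hi = apply_map(hi, site, u)
            lo = apply_map(lo, site, u)
        if hi == lo:
            return hi
```

Because β ≥ 0 makes each f_{mn,u} monotone, coalescence of the top and bottom configurations forces coalescence of all intermediate ones, which is what makes checking only s⁺ and s⁻ sufficient.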

Let {ε_n}_{n≥1} be i.i.d. uniform on L, and let {U_n}_{n≥1} be i.i.d. uniform on (0, 1), independent of {ε_n}_{n≥1}. Then the Markov process is represented by the i.i.d. random maps α_n = f_{ε_n, U_n}, n ≥ 1. Observe that since β ≥ 0 the maps are monotonically increasing on S; here monotonicity is with respect to the partial order on S defined by s ≤ s′ if and only if s_{mn} ≤ s′_{mn} for all (m, n) ∈ L. As a result, Propp and Wilson (1996) observed that one may use the backward iteration to get an exact sample from π by stopping the Markov process at the first (random) time T such that
$$\alpha_1 \cdots \alpha_T\, s^+ = \alpha_1 \cdots \alpha_T\, s^- ,$$

where s⁺ is the configuration of all plusses and s⁻ is the configuration with all minuses. In Figure 3 we provide an exact sampling from π in the high-temperature case with β = +0.1 on a 20 × 20 grid with periodic boundary conditions. This temperature is judged large with respect to the famous critical (inverse) temperature of Onsager, sinh(2β_c) = 1 for the 2-d Ising model, i.e., β_c = ln(1 + √2)/2 ≈ 0.4407. An exact sample from a 4200 × 4200 grid with periodic boundary

Fig. 3. Sample realization of the 2-d Ising distribution, β = 0.2.


conditions at criticality can be viewed in the editor's supplement to the article by Diaconis and Freedman (1999).

EXAMPLE 7. (Parametric Dynamical Systems) Many deterministic dynamical systems x_{n+1} = f(x_n), n ≥ 0, for example given by the quadratic maps f(x) = bx(1 − x), x ∈ R, or the Hénon model f(x, y) = (a − by − x², x), (x, y) ∈ R², depend on prescribed parameters which define different parametric regimes for behaviors ranging from periodic stable motions to chaos; e.g., see Devaney (1989). One may view Examples 3 and 4 as introducing noise into such a deterministic system, or alternatively, one may consider the corresponding evolution defined by i.i.d. random parameters b₁, b₂, …, or (a₁, b₁), (a₂, b₂), …. This also leads to new and interesting behavior of the dynamical system. For example, the deterministic quadratic map on S = (0, 1) is known to have a unique stable fixed point at 1 − 1/b for 1 < b ≤ 3. However, as illustrated in Figure 4, random selection between two parameters in this region, namely b₁ = 1.3, b₂ = 2, leads to an invariant distribution with the Cantor-like distribution function supported on the interval (approximately) (0.2, 0.5) depicted there.

Fig. 4. Iterated quadratic maps steady state distribution, b₁ = 1.3, b₂ = 2, p₁ = p₂ = 0.5.

EXAMPLE 8. (An Example from Economics) Consider a one-good economy starting with an initial stock of X₀ = x > 0 units of this good, which is used to produce an output Y₁ in the first period of time. The first-period output Y₁ is a random function of the initial input x, with value f_r(x) with probability p_r > 0, 1 ≤ r ≤ N, where Σ_{r=1}^N p_r = 1 and the so-called production functions f_r, 1 ≤ r ≤ N, are smooth, strictly increasing and concave: f_r′(x) > 0 and f_r″(x) < 0 for x > 0, with f_r(0) = 0, f_r′(0) > 1, f_r′(+∞) = 0, and f_j(x) > f_k(x) for all


x > 0 if j > k. A fraction 0 < β < 1 is consumed and the remainder (1 − β)Y₁ is invested for production in the next period. The total stock X₁ on hand for investment in period 1 is θX₀ + (1 − β)Y₁, where 0 < θ < 1 is the rate of depreciation of capital used in production. This process is repeated independently over successive periods of time. Thus the capital X_{n+1} on hand in the period n + 1 satisfies the iterative law
$$X_{n+1} = \theta X_n + (1 - \beta)\,\varphi_{n+1}(X_n), \qquad n \ge 0,\ X_0 = x , \tag{25}$$

where {φ_n}_{n=1}^∞ is the i.i.d. sequence of production functions distributed as

$$P(\varphi_1 = f_r) = p_r, \qquad 1 \le r \le N . \tag{26}$$
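The recursion (25)–(26) is straightforward to iterate; the sketch below (ours) uses the production functions and parameters of Figure 5.

```python
import random

def stock_path(n, theta=0.5, beta=0.5, x0=100.0, seed=0):
    """Iterate (25) with the two production functions of Figure 5:
    f1(x) = 8 x^{1/4} and f2(x) = 16 x^{1/2}, each chosen with probability 1/2."""
    rng = random.Random(seed)
    fs = (lambda x: 8.0 * x ** 0.25, lambda x: 16.0 * x ** 0.5)
    x, out = x0, []
    for _ in range(n):
        x = theta * x + (1.0 - beta) * rng.choice(fs)(x)
        out.append(x)
    return out

xs = stock_path(20_000)
```

With these parameters the two deterministic maps x ↦ θx + (1 − β)f_r(x) have attracting fixed points at 16 and 256, so the simulated stock settles into that range, consistent with the support shown in Figure 5.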

A simulation of the long-time distribution is given in Figure 5 for the case N = 2, f₁(x) = 8x^{1/4}, f₂(x) = 16x^{1/2}, p₁ = p₂ = 1/2, θ = β = 1/2, X₀ = 100.

Example 1 provides a key to the general representation theorem, which asserts that such a representation is possible whenever S is a Borel subset of a complete and separable metric space; a proof may be found in Kifer (1986) or Bhattacharya and Waymire (1990). The present article complements the excellent survey article by Diaconis and Freedman (1999) by exploiting certain splitting properties of the maps which occur with positive probability, rather than the contractive properties of the maps on average emphasized in this latter reference. We provide only a nominal review of the theory based on contractive maps. As noted earlier, each of the phenomena illustrated in connection with the above examples is explainable by the theory described in this survey. The organization of the remainder of this paper is as follows. In the next section we give a

Fig. 5. Steady state inventory distribution, f₁(x) = 8x^{1/4}, f₂(x) = 16x^{1/2}, p₁ = p₂ = 0.5.


more precise formulation of the central mathematical problems. In Section 3 we discuss two celebrated ideas in the theory of Markov chains introduced by Doeblin (1937, 1938) in two separate papers, namely minorization and coupling. Interestingly, Doeblin introduced coupling as a technique for obtaining convergence to a unique invariant probability for finite-state Markov chains, and then introduced minorization as an approach to general state spaces. While Doeblin was quite aware that one may apply his minorization to finite-state Markov chains, we show here that one may also apply coupling to general-state Markov chains under minorization. This provides the opportunity to introduce another extremely useful idea, namely backward iteration, which was exploited by Furstenberg (1963) for products of random matrices but has now become a standard technique in a variety of contexts. One may expect that the existence of an invariant probability as an asymptotic relative frequency of visits might most readily be computed under conditions of independent cycles of some reasonably finite length. This is essentially the theme of the next two Sections 4 and 5, where classic notions of recurrence and regeneration theory for Markov chains are introduced. In Sections 6 and 7 we introduce and generalize a now classic splitting notion, originally formulated by Dubins and Freedman (1966), to analyze randomly iterated monotone maps. The paper is concluded by a return to the examples provided in the introduction to illustrate the applications of the general theory.

2. Ergodic problem, rates of convergence, and the central limit theorem

The law of large numbers for the Markov process {X₀, X₁, …, X_n, …} refers to the existence of a unique invariant probability π such that for a bounded measurable function f : S → R one has, for any initial distribution of X₀,
$$\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} f(X_n) = \int_S f(x)\,\pi(\mathrm{d}x) . \tag{27}$$

By the ergodic theorem, if X₀ has distribution π, then (27) holds. It follows that (27) holds with X₀ = x for all x outside a π-null set. Taking f = 1_B, B ∈ 𝒮, provides the special case of convergence of the transition probabilities in the Cesàro mean,

\lim_{N\to\infty} \frac{1}{N} \sum_{n=0}^{N-1} p^{(n)}(x,B) = \pi(B)   (28)

for all x outside a π-null set. Naturally one may seek stronger forms of convergence, with estimates on the rates of convergence. Let 𝒜 denote a subcollection of 𝒮. Then the function d on the space 𝒫(S) of probability measures on (S, 𝒮) defined by

d(\mu,\nu) = \sup\{|\mu(A) - \nu(A)| : A \in \mathcal{A}\} ,   (29)


is a pseudo-metric which, depending on the choice of 𝒜, may be a complete metric on 𝒫(S). Important examples are the total variation metric, defined by 𝒜 = 𝒮, and, in the case S = Rᵏ, the Kolmogorov metric, defined by 𝒜 = {(−∞, a₁] × ⋯ × (−∞, a_k] : a_j ∈ R, 1 ≤ j ≤ k}. The central limit theorem requires broad classes of functions f in L²(S, π) for which the functional central limit theorem (FCLT) applies. That is, one seeks conditions such that the sequence of polygonal processes defined by

\tilde{Y}_n(t) = \frac{1}{\sqrt{n}} \sum_{m=0}^{[nt]} \bar{f}(X_m), \quad t \ge 0   (30)

(linearly interpolated between the points t = k/n) converges in distribution to a Brownian motion regardless of the initial distribution. The Billingsley-Ibragimov martingale central limit theorem (Billingsley (1968, Theorem 23.1)) plays a basic role in deriving functional central limit theorems for general ergodic Markov processes (e.g., see Gordin and Lifsic (1978), Bhattacharya (1982)). The martingale approach offers a tremendous computational advantage over classical mixing conditions in this context; in fact, mixing rates are virtually impossible to compute in general. Moreover, the martingale central limit theorem is applicable to each centered function belonging to the range of the generator of the Markov process, and the class of such functions is dense in the L²-space with respect to the invariant probability. When X₀ has distribution π, the FCLT holds for (30) if f̄ := f − E_π f belongs to the range of T − I. Here E_π f = ∫ f dπ, T is the transition operator and I the identity operator on L²(S, π). Also, importantly, the martingale approach provides an analytical expression for the variance parameter σ² of the limiting Brownian motion,

\sigma^2 = E_\pi g^2 - E_\pi (Tg)^2 ,

where g is the mean-zero solution of (T − I)g = f̄. After one has obtained some criterion for the ergodicity problem, the central limit theorem problem may, in the martingale approach, be viewed as a problem of identifying large subsets of the range of the generator. This latter problem may be approached through estimates on rates of convergence to a unique invariant probability; e.g., see Bhattacharya and Lee (1988).
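As a numerical sanity check of the variance formula above, the following sketch (a two-state chain of our own construction, not from the text) computes σ² = E_π g² − E_π(Tg)² and compares it with the equivalent autocovariance series E_π f̄² + 2 Σ_{n≥1} E_π[f̄ · Tⁿf̄]; the two agree to machine precision.

```python
import numpy as np

# Toy two-state chain (our own illustration): transition matrix P,
# invariant probability pi, and the centered function fbar = f - E_pi f.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p, q = 0.3, 0.4
pi = np.array([q, p]) / (p + q)

f = np.array([0.0, 1.0])
fbar = f - pi @ f

# g = sum_{n>=0} P^n fbar solves (I - T) g = fbar; the sign convention is
# immaterial for sigma^2 below, which is invariant under g -> -g.
g = np.zeros(2)
term = fbar.copy()
for _ in range(500):            # the series converges geometrically
    g += term
    term = P @ term

Tg = P @ g
sigma2_mart = pi @ g**2 - pi @ Tg**2

# Autocovariance series: E fbar^2 + 2 * sum_{n>=1} E[fbar * T^n fbar].
sigma2_acov = pi @ fbar**2
term = P @ fbar
for _ in range(500):
    sigma2_acov += 2 * (pi @ (fbar * term))
    term = P @ term

print(sigma2_mart, sigma2_acov)
```

For this chain f̄ happens to be an eigenfunction of T with eigenvalue 1 − p − q, so both series can also be summed in closed form.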

3. Doeblin's minorization and coupling via backward iteration


Two of the most powerful ideas in the modern theory of Markov processes were introduced in Doeblin (1937) and Doeblin (1938); namely minorization and coupling, respectively. Doeblin's minorization condition provides a powerful approach to check for the existence of a unique invariant probability, which also gives uniform exponential rates of convergence in total variation distance. In fact, Doeblin's minorization is also necessary for uniform exponential rates in total variation distance; see


Nummelin (1984, Theorem 6.15), Tierney (1994, Proposition 2), Meyn and Tweedie (1993, Theorem 16.2.3). Doeblin's minorization requires a probability measure ν on (S, 𝒮), a positive integer N, and a positive real number δ such that

p^{(N)}(x,B) \ge \delta\,\nu(B), \quad x \in S, \ B \in \mathcal{S} ,   (31)

where p^{(m)}(x, dy) denotes the m-step transition probability, defined inductively by

p^{(0)}(x, dy) = \delta_x(dy), \qquad p^{(m+1)}(x,B) = \int_S p(y,B)\, p^{(m)}(x, dy), \quad m \ge 0 .   (32)

In Bhattacharya and Waymire (1999) the authors adapt another powerful idea introduced by Doeblin (1938) for the ergodic problem of finite state Markov chains, namely coupling, to prove convergence for general state spaces. The proof also employs the backward iteration method of Furstenberg (1963), which has found wide-ranging application in the study of random iterated maps; e.g., see Diaconis and Freedman (1999) and references therein, as well as Bhattacharya and Lee (1988), Bhattacharya and Rao (1993), Bhattacharya and Waymire (1990). The backward iteration will also be exploited in Section 6 for the case of monotone increasing maps. The argument is repeated here to illustrate these ideas.

THEOREM 3.1. (Doeblin's Minorization). Under Doeblin's minorization condition there is a unique invariant probability π on (S, 𝒮). Moreover

|p^{(n)}(x,B) - \pi(B)| \le (1-\delta)^{[n/N]}, \quad n \ge 1 .


PROOF. By viewing the process {X_n}_{n≥0} at times 0, N, 2N, ..., one may regard p^{(N)}(x, dy) as a one-step transition probability. By this device one may restrict attention to the case N = 1. First observe that Doeblin's minorization is equivalent to the existence of a representation by i.i.d. maps α₁, α₂, ..., of the form (2), such that α_n is constant on S with probability δ. For clearly, if such a representation exists, then

p(x,B) = P(\alpha_1 x \in B) \ge P(\alpha_1 \in \Gamma_c,\ \alpha_1 x \in B) = \delta\, P(\alpha_1 x \in B \mid \alpha_1 \in \Gamma_c) = \delta\,\nu(B) ,

where Γ_c denotes the collection of constant maps on S. Conversely, if Doeblin's minorization holds, then define i.i.d. random maps as follows. At time step n, with probability δ and independently of α_k, k < n, select a random point distributed as ν and define α_n as the constant map with this value; otherwise, with probability 1 − δ, select α_n as a map such that α_n x is distributed as (1 − δ)^{-1}(p(x, dy) − δν(dy)). Then by construction

p(x, dy) = P(\alpha_n x \in dy), \quad x \in S .


Now, since a constant map occurs among the i.i.d. sequence α₁, α₂, ... with probability one, the a.s. limit of the backward iteration exists and is given by

\lim_{n\to\infty} \alpha_1 \cdots \alpha_n x = \alpha_1 \cdots \alpha_T x ,

where

T := \inf\{n \ge 1 : \alpha_n \in \Gamma_c\} .

One may readily check that the distribution π of this limit is an invariant probability. Now, with the minorized representation defined above, one has a natural coupling of the processes with transition probability p(x, dy) started in state x ∈ S and with initial distribution π, respectively, given by {(X_n, Y_n)}_{n≥0}, where X_n = α_n ⋯ α₁ x, Y_n = α_n ⋯ α₁ Y₀, n ≥ 1, with Y₀ distributed as π independently of the α's. In particular it follows that

|p^{(n)}(x,B) - \pi(B)| \le P(T > n) \le (1-\delta)^n, \quad x \in S, \ B \in \mathcal{S} . \qquad \square

4. Harris recurrence, small sets, local minorization, and regeneration


The notion of a small set A₀ provides a localized minorization condition, defined by a subset A₀ ∈ 𝒮 of the state space S, a probability measure ν on (A₀, A₀ ∩ 𝒮), a positive integer N, and a positive real number δ such that

p^{(N)}(x,B) \ge \delta\,\nu(B), \quad x \in A_0, \ B \in \mathcal{S} ;   (33)

see Meyn and Tweedie (1993) for a treatment of small sets. These are also the so-called C-sets in Orey (1971). We simply refer to (33) as local minorization on A₀, where we will assume further that this occurs on a recurrent set A₀, in the sense that

P_x(X_n \in A_0 \ \text{for some} \ n \ge 1) = 1, \quad x \in S .   (34)

Here P_x denotes the conditional probability P(· | X₀ = x). Local minorization on a recurrent set is equivalent to Harris' familiar notion of φ-recurrence (see Harris (1956)) whenever 𝒮 is countably generated; necessity is proved in Orey (1971) and sufficiency is straightforward to check. A formulation in terms of iterated maps may be obtained along the lines introduced by Athreya and Ney (1978) to identify the regeneration structure in Markov processes locally minorized on a recurrent set A₀. The localized process is defined by the restriction of the Markov process to the times of visits to A₀.

PROPOSITION 4.1. (Local Minorization). The local minorization condition (33) on a recurrent set A₀ is equivalent to the existence of a representation by i.i.d. maps α₁, α₂, ..., of the form (2), such that α_n is a constant map of A₀ into A₀ with probability δ.


PROOF. The idea for the proof in the case N = 1 is to first note that if such maps exist, then for x ∈ A₀, B ∈ 𝒮, one has

p(x,B) = P(\alpha_1 x \in B) \ge P(\alpha_1 \in \Gamma_c,\ \alpha_1 x \in B) = \delta\,\nu(B) ,

where Γ_c denotes the collection of constant maps of A₀ into A₀. Conversely, suppose that local minorization holds on a recurrent set A₀. Let α_n be a representation by i.i.d. maps, and define an alternative representation by i.i.d. maps β_n, n ≥ 1, constructed as follows. Toss a coin with probability δ of heads. If a head occurs, then select a random point in A₀ distributed as ν and define β₁x as this constant value for all x ∈ A₀; if a tail occurs, then for each x ∈ A₀ let β₁x be distributed as (1 − δ)^{-1}(p(x, dy) − δν(dy)). If x ∈ S − A₀, then define β₁x = α₁x. Now let β_n be an i.i.d. sequence of maps distributed as β₁. Then by such a construction

p(x, dy) = P(\beta_n x \in dy), \quad x \in S .

The detailed argument, including the construction of the probability space, will appear in Bhattacharya and Waymire (2001). □

The following theorem is a centerpiece of general Markov process theory based on various notions of Harris recurrence by Orey (1971), regeneration by Athreya and Ney (1978), or so-called Nummelin splitting by Nummelin (1978). A proof is given in Section 7 (see the Remark at the end of Section 7).

THEOREM 4.1. Assume the local minorization condition (33) on a recurrent set A₀. In addition assume that

\sup_{x \in A_0} E_x \tau_{A_0} < \infty ,

where E_x denotes expectation under X₀ = x, and τ_{A₀} = inf{n ≥ 1 : X_n ∈ A₀}. Then there is a unique invariant probability π on (S, 𝒮). Moreover, for x ∈ S,

\sup_{B \in \mathcal{S}} \Big| \frac{1}{n} \sum_{m=0}^{n-1} p^{(m)}(x,B) - \pi(B) \Big| \to 0 \quad \text{as } n \to \infty .

5. Liapounov functionals and Foster-Tweedie drift conditions


The finiteness of the expected time to renew a visit to A₀ may often be checked by the extensions, due to Tweedie, of Foster's drift conditions, or equivalently, stochastic Liapounov conditions; see Meyn and Tweedie (1993) for a thorough treatment. The basic notion is analogous to that of the Liapounov function found in the stability theory of ordinary differential equations. To be specific, one has the following useful criterion for the existence of, and convergence to, a unique invariant probability. For its statement, recall that the


transition operator T of a Markov process with transition probability p(x, dy) is defined by (Tf)(x) = ∫ f(y) p(x, dy) for all measurable f for which the integral exists for all x.

THEOREM 5.1. Suppose the local minorization condition (33) holds on some set A₀. In addition, assume that there exists a measurable nonnegative function V on S such that

TV(x) - V(x) \le -1 \quad \forall x \in A_0^c ,

\sup_{x \in A_0} \{TV(x) - V(x)\} < \infty .

Then there exists a unique invariant probability π and one has

\sup_{B \in \mathcal{S}} \Big| \frac{1}{n} \sum_{m=0}^{n-1} p^{(m)}(x,B) - \pi(B) \Big| \to 0 \quad \text{as } n \to \infty ,

for every x ∈ S. A proof of this result of Tweedie (1975, 1983) (which is a generalization of an earlier result of Foster) may be found in Meyn and Tweedie (1993), Section 11.3. The function V here is referred to as a stochastic Liapounov function. If, in addition to the hypothesis of Theorem 5.1, one assumes that the Markov process is aperiodic, then the convergence to π may be strengthened to

\sup_{B \in \mathcal{S}} |p^{(n)}(x,B) - \pi(B)| \to 0 \quad \text{as } n \to \infty ,

for every x ∈ S (see Meyn and Tweedie (1993), Theorem 13.10.1, pp. 309-310). Finally, one has the following result of Tweedie on so-called geometric Harris ergodicity.

THEOREM 5.2. Suppose the local minorization condition (33) holds on some set A₀, and that there exists a measurable function V ≥ 1 such that for some θ < 1 and M < ∞ one has

TV(x) \le \theta V(x) \quad \forall x \in A_0^c ,

TV(x) \le \theta V(x) + M \quad \forall x \in A_0 .

In addition, assume that the Markov process is aperiodic. Then the process is geometrically Harris ergodic, i.e., there exists a unique invariant probability π, and sup{|p^{(n)}(x,B) − π(B)| : B ∈ 𝒮} converges to zero geometrically fast as n → ∞, for every x ∈ S. For a proof see Meyn and Tweedie (1993), pp. 354-378.
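The geometric drift condition of Theorem 5.2 can be checked numerically in simple cases. The sketch below is our own AR(1) illustration, X_{n+1} = bX_n + W_{n+1} with W ~ N(0,1); with the hypothetical choices V(x) = |x| + 1, θ = |b| and M = E|W| + 1 − |b|, one has TV(x) = E|bx + W| + 1 ≤ θV(x) + M, which a Monte Carlo estimate confirms at a few sample points.

```python
import math
import random

random.seed(1)
b = 0.5
theta = abs(b)
M = math.sqrt(2 / math.pi) + 1 - abs(b)   # E|W| = sqrt(2/pi) for W ~ N(0,1)

def TV(x, n=200_000):
    """Monte Carlo estimate of (TV)(x) = E[V(b*x + W)] with V(y) = |y| + 1."""
    return sum(abs(b * x + random.gauss(0, 1)) + 1 for _ in range(n)) / n

drift_ok = all(
    TV(x) <= theta * (abs(x) + 1) + M + 0.01   # small Monte Carlo tolerance
    for x in (-10.0, -1.0, 0.0, 1.0, 10.0)
)
print(drift_ok)   # True
```

The inequality here is in fact an elementary consequence of the triangle inequality, so the simulation only illustrates how one would probe a drift condition when no closed form is available.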
6. Dubins-Freedman splitting for monotone maps and its generalization

Monotonicity in one form or another is often a useful structure for tracking properties of a sequence. One form of monotonicity which arises for spatial


distributions possessing certain types of attractive (positively correlated) interactions may be formulated as follows; see Daley (1968).

DEFINITION 6.1. Suppose that S is partially ordered by a topologically compatible partial order ≤; i.e., {(x,y) : x ≤ y} is closed in the product topology on S × S. Let ℳ denote the collection of continuous real-valued functions on S which are monotone, in the sense that f(x) ≤ f(y) whenever x ≤ y. If μ, ν are probability measures on (S, 𝒮), then μ ≤ ν if and only if ∫_S f dμ ≤ ∫_S f dν for all f ∈ ℳ.

This structure extends to Markov processes on a partially ordered state space and may be formulated as follows.

DEFINITION 6.2. A Markov process {X_n}_{n=0}^∞ is said to be monotone (or attractive) if and only if Tℳ ⊆ ℳ, where, for f ∈ ℳ, Tf : S → R is defined by

Tf(x) = E_x f(X_1), \quad x \in S .

An equivalent version of monotonicity of the process is that a partial order of initial distributions, say μ ≤ ν, is preserved by the evolution, i.e., for each n ≥ 0,

P_\mu(X_n \in \cdot) \le P_\nu(X_n \in \cdot) .


PROPOSITION 6.1. Suppose that {X_n}_{n=0}^∞ has a representation by iterations of i.i.d. increasing maps. Then the Markov process is monotone in the sense of Definition 6.2.

The following is an interesting general consequence of monotonicity of Markov processes which sheds some light on the uniqueness of invariant probabilities; see Daley (1968).

PROPOSITION 6.2. (Daley) Suppose that {X_n}_{n=0}^∞ is a monotone Markov process started with an invariant probability π. If f ∈ ℳ with E f²(X_n) < ∞, then

\gamma_n := \mathrm{cov}(f(X_0), f(X_n)), \quad n = 0, 1, 2, \ldots

is a decreasing sequence of non-negative numbers. Moreover, if π is unique then lim_{n→∞} γ_n = 0.

A representation by i.i.d. monotone increasing maps implies that the Markov process is a monotone Markov process. Conversely, if {X_n}_{n=0}^∞ is a monotone Markov process on S = R, then the construction in Example 1 of the introduction provides a representation by i.i.d. monotone increasing maps. In the presence of monotonicity, the backward iteration introduced in Section 3 provides an important additional tool, as follows. Let Q denote the distribution of α₁ on a space of maps (Γ, 𝒢), and Q^N the corresponding product measure on (Γ^N, 𝒢^N) for a positive integer N. Suppose that {α_n}_{n≥1} is an i.i.d. sequence of monotone increasing maps on an interval


S = J having a largest element b. Then Y_n(b) = α₁ ⋯ α_n b is a decreasing sequence in n with a.s. limit Ȳ, say. If Ȳ > −∞ a.s., then the distribution of Ȳ is an invariant probability for the Markov process X_n = α_n ⋯ α₁ X₀. If S has a smallest element a as well, i.e., S = [a,b], then Y_n(a) ↑ Y̲ a.s. as n → ∞, and Y̲ ≤ Ȳ. Thus Y̲ = Ȳ a.s. is a necessary and sufficient condition for uniqueness of the invariant probability in this case. Suppose this unique invariant probability has more than one point in its support. (This is the case if the maps do not have a common fixed point, a.s. (Q).) Then it is easy to check that the following splitting condition holds.

Dubins-Freedman Splitting Condition: There exist x₀ ∈ [a,b], N ≥ 1, and δ > 0 such that

P(\alpha_N \cdots \alpha_1 x \le x_0 \ \forall x \in [a,b]) \ge \delta ,

P(\alpha_N \cdots \alpha_1 x \ge x_0 \ \forall x \in [a,b]) \ge \delta .

Conversely, it is shown in Dubins and Freedman (1966) that if {α_n}_{n≥1} are i.i.d. monotone maps (either monotone increasing or monotone decreasing) and the above splitting condition holds, then the Markov process has a unique invariant probability π, and p^{(n)}(x, [a,y]) converges to π([a,y]), as n → ∞, exponentially fast, uniformly with respect to x and y.

We record a generalization of the Dubins and Freedman (1966) splitting condition for monotone maps given by Bhattacharya and Majumdar (1999). This condition is defined by a sub-collection 𝒜 of 𝒮, a positive integer N and a positive real number δ such that

P\big((\alpha_N \cdots \alpha_1)^{-1} A = S \ \text{or} \ \emptyset\big) \ge \delta \quad \text{for all } A \in \mathcal{A} .   (35)

We will refer to this condition as full splitting, and to the parameter N as a splitting scale. The sub-collection 𝒜 of 𝒮 will be called the splitting class of sets. Bhattacharya and Majumdar (1999) obtained the following theorem. The proof is based on the simple observation that, under the conditions of the theorem, the N-fold adjoint operator T^{*N}, which maps a distribution μ of X₀ to the distribution T^{*N}μ of X_N, is a contraction map on the (assumed) complete metric space (𝒫(S), d) and therefore has a unique fixed point.

THEOREM 6.1. Assume the full splitting condition (35). Assume that (𝒫(S), d) is a complete metric space under

(i) \quad d(\mu,\nu) = \sup_{B \in \mathcal{A}} |\mu(B) - \nu(B)|, \quad \mu, \nu \in \mathcal{P}(S) ,

where 𝒫(S) denotes the space of probability measures on (S, 𝒮). Also, with N as the splitting scale, assume that for each (γ₁, ..., γ_N) outside a Q^N-null set one has

(ii) \quad d\big(\mu \circ (\gamma_N \cdots \gamma_1)^{-1}, \ \nu \circ (\gamma_N \cdots \gamma_1)^{-1}\big) \le d(\mu,\nu), \quad \mu, \nu \in \mathcal{P}(S) .

Then there is a unique invariant probability π on (S, 𝒮). Moreover

\sup_{x \in S, \, B \in \mathcal{A}} |p^{(n)}(x,B) - \pi(B)| \le (1-\delta)^{[n/N]} .
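A minimal concrete instance of the splitting condition (our own construction, not from the text): the two increasing maps α(x) = x/2 and β(x) = (x + 1)/2 on [0, 1], chosen with probability 1/2 each. With x₀ = 1/2, N = 1 and δ = 1/2, α sends all of [0, 1] below x₀ and β sends all of [0, 1] above x₀, and the backward iteration from the two endpoints contracts at the exact rate 1/2 per step (the invariant probability here is the uniform distribution, generated by random binary digits).

```python
import random

random.seed(2)
n = 40
heads = [random.random() < 0.5 for _ in range(n)]   # the i.i.d. map choices

def backward(x):
    """Backward iteration alpha_1 ... alpha_n x: apply alpha_n first."""
    for h in reversed(heads):
        x = x / 2 if h else (x + 1) / 2
    return x

lo, hi = backward(0.0), backward(1.0)
gap = hi - lo
print(gap)   # exactly 2**-40: each increasing map contracts distances by 1/2
```

The gap is exact in floating point because every iterate is a dyadic rational with at most 40 fractional bits.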


7. Localized splitting
Let us begin by introducing a localized version of the splitting condition (35).

DEFINITION 7.1. The Markov process {X_n}_{n=0}^∞ on (S, 𝒮) is said to have a locally splitting representation by i.i.d. maps α₁, α₂, ... if there is a recurrent set A₀ ∈ 𝒮, a sub-collection 𝒜 of 𝒮, a positive integer N, and a positive real number δ such that for each A ∈ 𝒜₀ := A₀ ∩ 𝒜,

P\big(A_0 \cap (\alpha_N \cdots \alpha_1)^{-1} A = A_0 \ \text{or} \ \emptyset\big) \ge \delta .   (36)

The parameter N is referred to as the local splitting scale, and the collection 𝒜₀ is the local splitting class. One may easily check from Proposition 4.1 that local minorization on a recurrent set A₀ implies local splitting on A₀ with 𝒜 = 𝒮, i.e., with local splitting class 𝒜₀ = A₀ ∩ 𝒮. We will consider a localization of splitting which generalizes this framework; nonetheless, we will restrict the local splitting class to 𝒜₀ = A₀ ∩ 𝒮 for this generalization. Under this condition the space 𝒫(A₀) of probability measures on (A₀, A₀ ∩ 𝒮) is a complete metric space under

d_0(\mu,\nu) = \sup_{B \in \mathcal{A}_0} |\mu(B) - \nu(B)|, \quad \mu, \nu \in \mathcal{P}(A_0) .   (37)

THEOREM 7.1. Assume the local splitting condition (36) with local splitting class 𝒜₀ = A₀ ∩ 𝒮. If

\sup_{x \in A_0} E_x \tau_{A_0} < \infty ,

then there is a unique invariant probability π on (S, 𝒮), and

\lim_{n\to\infty} \Big| \frac{1}{n} \sum_{m=0}^{n-1} p^{(m)}(x,B) - \pi(B) \Big| = 0, \quad x \in S, \ B \in \mathcal{A} .

Let us sketch the main ideas for the case N = 1. Define

\tau_{A_0}^{(0)} := 0, \qquad \tau_{A_0}^{(m+1)} := \inf\{n \ge \tau_{A_0}^{(m)} + 1 : X_n \in A_0\}, \quad m = 0, 1, 2, \ldots .   (38)

Also we write τ_{A₀} ≡ τ_{A₀}^{(1)}. The process viewed only on its returns to A₀ will be denoted

\tilde{X}_n = X_{\tau_{A_0}^{(n)}}, \quad n = 0, 1, 2, \ldots .   (39)


It is well known that the process {X̃_n}_{n=0}^∞ is a Markov process on A₀ with transition probabilities (e.g., see Orey (1971))

p_{A_0}(x,B) = \sum_{n=1}^{\infty} P_x(X_n \in B, \ X_k \in A_0^c, \ 1 \le k < n), \quad x \in S, \ B \in A_0 \cap \mathcal{S} .   (40)

Observe that for each x ∈ S, the kernel B → p_{A₀}(x,B) defines a measure on the sigma-field 𝒮, whose probabilistic interpretation is the expected number of visits to B ∈ 𝒮 prior to revisiting A₀. In particular, under the assumption

\sup_{x \in A_0} E_x \tau_{A_0} < \infty ,   (41)

one sees that

p_{A_0}(x, S) = E_x \tau_{A_0} < \infty, \quad x \in A_0 .   (42)
One may check that if p(x, dy) satisfies the local minorization (33) on A₀, then p_{A₀}(x, dy), x ∈ A₀, will satisfy the Doeblin minorization (31) on A₀. In fact, one has the following analogous localization of splitting along similar lines; see Bhattacharya and Waymire (2000) for a proof.

LEMMA 7.1. Under the conditions of Theorem 7.1 the process {X̃_n}_{n=0}^∞, started at X̃₀ = x ∈ A₀, has a unique invariant probability π_{A₀} on (A₀, A₀ ∩ 𝒮). Moreover,

\sup_{x \in A_0, \, B \in \mathcal{A}_0} |p_{A_0}^{(n)}(x,B) - \pi_{A_0}(B)| \le (1-\delta)^{[n/N]}, \quad n \ge 1 .

With Lemma 7.1 the proof of the existence part of Theorem 7.1 is now easily completed by spreading π_{A₀} to π by defining

\pi(B) = c \int_{A_0} p_{A_0}(x,B)\, \pi_{A_0}(dx), \quad B \in \mathcal{S} ,   (43)

where, writing E_{π_{A₀}} for expectation under the initial distribution π_{A₀},

c^{-1} = E_{\pi_{A_0}} \tau_{A_0} > 0 .   (44)

Specifically, the existence is completed by virtue of the following straightforward lemma; see Bhattacharya and Waymire (2000).

LEMMA 7.2. Under the conditions of Theorem 7.1, (43) defines an invariant probability on (S, 𝒮) for p(x, dy).

Let us suppose that π is an arbitrary invariant probability. The uniqueness problem is a bit of an issue because of the nonstationarity of the process viewed as


it revisits A₀ under P_π; the nonstationarity is made transparent by considering the simple 3-state Markov chain on {1,2,3} with p = p_{2,1} = 1 − p_{2,3}, r = p_{1,2} = 1 − p_{1,3}, p_{3,2} = 1, A₀ = {1,3}. However, the following lemma is useful for extrapolating from stationarity under P_{π_{A₀}}; see Bhattacharya and Waymire (2000).

LEMMA 7.3. If π is any invariant probability for p(x, dy), then π = c π_{A₀}, c = π(A₀), on A₀ ∩ 𝒮. In particular,

P_\pi(E) \ge c\, P_{\pi_{A_0}}(E)

for all events E.

The idea for proving uniqueness, which we repeat here from Bhattacharya and Waymire (2000), is to use the ergodic theorem to show that, for bounded measurable functions f : S → R, ∫_S f dπ is determined by f and expected values with respect to π_{A₀}. In particular, first note that by ergodicity of the process {X_n}_{n=0}^∞ under P_π one has, P_π-a.s.,

\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} f(X_j) = \int f \, d\pi .   (45)

In particular, taking f = 1_{A₀}, one has P_π-a.s.

\lim_{n\to\infty} \frac{N_n}{n} = \pi(A_0) ,   (46)

where

N_n = \sum_{j=1}^{n} 1[X_j \in A_0]   (47)

denotes the number of visits to A₀ during [1, n]. Now, for arbitrary bounded measurable functions f : S → R we have, P_π-a.s.,

\frac{1}{n} \sum_{j=1}^{n} f(X_j) - \frac{1}{n} \sum_{m=1}^{N_n} Z_m \to 0   (48)

as n → ∞, where, for the times τ^{(m)} defined by (38),

Z_m := \sum_{j=\tau^{(m-1)}+1}^{\tau^{(m)}} f(X_j), \quad m \ge 1 .   (49)

It follows from (45) and (48) that the limit lim_{N→∞} (1/N) Σ_{m=1}^{N} Z_m exists P_π-a.s. and is given by (1/π(A₀)) ∫_S f dπ. But {Z_m}_{m=1}^∞ is a stationary process under P_{π_{A₀}}, and therefore the sequence {(1/N) Σ_{m=1}^{N} Z_m}_{N=1}^∞ will converge P_{π_{A₀}}-a.s. On the other hand, in view of Lemma 7.3, these facts imply that, P_{π_{A₀}}-a.s. and in L¹,

\lim_{N\to\infty} \frac{1}{N} \sum_{m=1}^{N} Z_m = \frac{1}{\pi(A_0)} \int_S f \, d\pi .   (50)


In particular, under P_{π_{A₀}}, one may take expectations in (50) to get

\frac{1}{\pi(A_0)} \int_S f \, d\pi = \lim_{N\to\infty} E_{\pi_{A_0}} \frac{1}{N} \sum_{m=1}^{N} Z_m = E_{\pi_{A_0}} Z_1 .   (51)

Taking f = 1 in (51) identifies π(A₀) as

\pi(A_0) = c = \frac{1}{E_{\pi_{A_0}} \tau_{A_0}} .   (52)

Thus we finally arrive at the unique determination of π via the formula

\int_S f \, d\pi = c\, E_{\pi_{A_0}} Z_1   (53)

for all bounded measurable functions f on S. □

REMARK. The hypothesis of local splitting in Theorem 7.1 is implied by the local minorization (33). Thus Theorem 7.1 contains Theorem 4.1 as a Corollary.

8. Examples revisited
Let us now see how the theory reviewed in the previous sections may be applied to the set of examples described in the introduction.

EXAMPLE 2. (Temporally Discretized Diffusion) This example is of the general form Y₀, Y_{n+1} = bY_n + W_{n+1}, n ≥ 0, where {W_n}_{n≥1} is an i.i.d. sequence of mean-zero Gaussian random variables with variance εσ², independent of Y₀, and, for ε sufficiently small, 0 < b = 1 − εμ < 1. Observe that by iteration

Y_n = b^n Y_0 + b^{n-1} W_1 + \cdots + b W_{n-1} + W_n \stackrel{\mathrm{dist}}{=} b^n Y_0 + W_1 + b W_2 + \cdots + b^{n-1} W_n \quad (n \ge 1) .

Note that the first sum is α_n ⋯ α₁ Y₀, while the last one is α₁ ⋯ α_n Y₀, with α_i y = by + W_i. Thus for each n ≥ 1 the distribution of Y_n − bⁿY₀ is Gaussian with mean zero and variance εσ² Σ_{k=0}^{n-1} b^{2k}. In the limit as n → ∞ one obtains a Gaussian limit distribution with mean zero and variance εσ²/(1 − b²) = σ²/(μ(2 − εμ)), which is approximately σ²/(2μ) for small ε.

EXAMPLE 3. (Linear/Nonlinear Autoregressive Models) In the case of the linear autoregressive model AR(p), the iteration along the lines of Example 2 yields the a.s. limit distribution of Σ_{n=0}^∞ fⁿε_{n+1} as the unique invariant probability if the eigenvalues of f all have modulus less than one, and if the common distribution G of η_n (n ≥ 1) satisfies

\int \log^+ |x| \, G(dx) < \infty \qquad (\log^+ y := \max\{0, \log y\}) .
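A simulation sketch of Example 2 (the values ε = 0.1, μ = σ = 1 are our own illustrative choices), comparing the sample variance of Y_n with the geometric-series value εσ²/(1 − b²):

```python
import random

random.seed(3)
eps, mu, sigma = 0.1, 1.0, 1.0
b = 1 - eps * mu
sd = (eps * sigma**2) ** 0.5       # standard deviation of each W_n

def sample_Y(n=150):
    """Run the iteration Y_{k+1} = b*Y_k + W_{k+1} from Y_0 = 0 for n steps."""
    y = 0.0
    for _ in range(n):
        y = b * y + random.gauss(0, sd)
    return y

samples = [sample_Y() for _ in range(20000)]
var_hat = sum(y * y for y in samples) / len(samples)   # the mean is zero
var_theory = eps * sigma**2 / (1 - b**2)
print(var_hat, var_theory)
```

After 150 steps the transient term b²ⁿ is negligible, so the sample variance should match the limiting value to within Monte Carlo error.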


In this case one also has convergence from any initial distribution. We now turn to nonlinear autoregressive models, i.e., (14) holds with h nonlinear. The following result may be found in Bhattacharya and Lee (1995). Assume that (i) h is bounded on compacts, (ii) the distribution of η_n has a density which is positive a.e. (w.r.t. Lebesgue measure), (iii) E|η₁| < ∞, and (iv) there exist constants c > 0, R > 0, a_i > 0 (1 ≤ i ≤ k) with Σ_{i=1}^{k} a_i < 1 such that

|h(y)| \le \sum_{i=1}^{k} a_i |y_i| + c, \quad \forall \, |y| \ge R .   (54)

Then the Markov process {Y_n}_{n≥0} defined by (15) is geometrically Harris ergodic in the sense of Theorem 5.2. For a proof of this one first shows that local minorization holds on A₀ = {|y| ≤ R}. Indeed, one shows that

\inf_{x, y \in A_0} p^{(k)}(x,y) > 0 .
To apply Theorem 5.2 one may now take the stochastic Liapounov function V to be V(y) = max{|y_i| : 1 ≤ i ≤ k} + 1 (see Bhattacharya and Lee (1995)).

EXAMPLE 4. (ARMA Models) This example may be dealt with as the (linear) case of Example 3, and precisely the same conditions on the matrix f given in (20) and on the distribution of the ε_n provide convergence to a unique invariant probability.

EXAMPLE 5. (Queuing Lengths) The waiting time process involves i.i.d. maps of the form

\alpha_n x = (x + Y_n)^+, \quad x \in S = [0, \infty) .

These maps are monotone increasing and continuous, and the state space S has a smallest point x = 0. Thus the backward iteration from 0 increases a.s. to X, say. If X < ∞ a.s., then its distribution π is an invariant probability. One may check by induction that the following identity holds for iterations of f_θ(x) = (x + θ)⁺, x ∈ S = [0, ∞):

f_{\theta_2} f_{\theta_3} \cdots f_{\theta_n}(0) = \max_{2 \le j \le n} (\theta_2 + \cdots + \theta_j)^+ \quad (n \ge 2) .   (55)

So, taking θ_j = Y_j, one has

X_n(0) = \max\{Y_2^+, (Y_2 + Y_3)^+, \ldots, (Y_2 + \cdots + Y_n)^+\} \quad (n \ge 2) .

Now, by the Strong Law of Large Numbers, Y₂ + ⋯ + Y_n → −∞ a.s. as n → ∞, and therefore (Y₂ + ⋯ + Y_n)⁺ = 0 for all sufficiently large n, a.s. It follows that the sequence X_n(0) does not change after a finite number of terms and, in particular, X < ∞ a.s. To see that π is in fact the unique invariant probability, one may use (55) to see that for n ≥ 2,


f_{\theta_2} f_{\theta_3} \cdots f_{\theta_n}(z) = \max\{\theta_2^+, (\theta_2 + \theta_3)^+, \ldots, (\theta_2 + \cdots + \theta_{n-1})^+, (\theta_2 + \cdots + \theta_n + z)^+\}

and therefore

X_n(z) = \max\{Y_2^+, (Y_2 + Y_3)^+, \ldots, (Y_2 + \cdots + Y_n + z)^+\} \quad (n \ge 2) .
Again by the Strong Law one sees that, for sufficiently large n, X_n(z) = X_{n-1}(0). So X_n(z) → X a.s. as n → ∞. Since the waiting time W_n(z) has the same distribution as X_n(z), and the latter converges in distribution to π, we obtain uniqueness of the invariant probability and convergence in distribution from any initial state z ∈ [0, ∞). Other approaches to treat this example are possible via local splitting, or even local minorization. For example, choosing a₀ > 0 such that

E Y_1 1[Y_1 > -a_0] \le \tfrac{1}{2} E Y_1 < 0 ,

one has, for x outside [0, a₀] and ε := −½ E Y₁, E_x W₁ − x ≤ −ε, while for all x ∈ [0, a₀]

E_x W_1 < \infty .
Thus V(x) = x serves as a Liapounov functional with A₀ = [0, a₀], and for all x ∈ A₀,

p(x,B) = P((x + Y_1)^+ \in B) \ge P(Y_1 \le -x)\,\delta_0(B) \ge P(Y_1 \le -a_0)\,\delta_0(B) ,

which is a minorization with ν = δ₀, the point mass at 0, and δ = P(Y₁ ≤ −a₀).
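The identity (55) is easy to check numerically. The sketch below (with an assumed negative-drift Gaussian input sequence) runs the forward waiting-time iteration W_{k+1} = (W_k + Y_{k+1})⁺ from 0 and compares it with the maximum of the partial sums of the Y's taken in reverse order, which is the backward-iteration form of (55):

```python
import random

random.seed(4)
Y = [random.gauss(-0.5, 1.0) for _ in range(200)]   # EY < 0: negative drift

# Forward Lindley iteration from 0.
w = 0.0
for y in Y:
    w = max(w + y, 0.0)

# Maximum of the reversed partial sums:
#   W_n(0) = max(0, Y_n, Y_n + Y_{n-1}, ..., Y_n + ... + Y_1).
s, m = 0.0, 0.0
for y in reversed(Y):
    s += y
    m = max(m, s)

print(w, m)   # the two agree up to floating-point rounding
```

Because the drift is negative, the reversed partial sums attain their maximum early, which is why the distribution of the backward iterates stabilizes after finitely many terms.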

EXAMPLE 6. (Gibbs Sampler) The Markov process in this example is defined by the i.i.d. maps α_n = f(ε_n, U_n), where {ε_n}_{n≥1} is an i.i.d. sequence of sites chosen uniformly from the lattice L, and {U_n}_{n≥1} is an i.i.d. sequence of uniformly distributed random variables. Consider the backward iterations starting from the smallest element a = s⁻, the configuration consisting entirely of minus values at each lattice site, and from the largest configuration b = s⁺, consisting entirely of plus values. Specifically,

Y_n(a) = \alpha_1 \cdots \alpha_n a, \qquad Y_n(b) = \alpha_1 \cdots \alpha_n b \quad (n \ge 1) .

Now, Y_n(a) is increasing and Y_n(b) is decreasing as n increases, and Y_n(a) ≤ Y_n(b) for all n. Since there are only finitely many sites, the probability that Y_n(a) < Y_n(b) for all n is zero, and one obtains an a.s. finite random time T such that Y_T(a) = Y_T(b).
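The sandwich argument can be illustrated on a toy monotone chain (our own construction, in the spirit of the example rather than the Gibbs sampler itself): random increasing maps on {0, ..., 9}, driven by a shared uniform variable and applied simultaneously to the bottom state and the top state, preserve the order and coalesce at an a.s. finite random time.

```python
import random

random.seed(5)

def phi(u, x):
    """An increasing random map on {0,...,9} driven by the uniform u."""
    return min(x + 1, 9) if u < 0.5 else max(x - 1, 0)

lo, hi, T = 0, 9, 0
while lo != hi:
    u = random.random()          # the SAME randomness for both copies
    lo, hi = phi(u, lo), phi(u, hi)
    T += 1

print(T)   # a finite coalescence time; afterwards the copies agree forever
```

Once the two extreme trajectories meet, every intermediate starting point is squeezed between them, which is the key monotonicity fact behind exact sampling schemes of the Propp and Wilson (1996) type.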


EXAMPLE 7. (Parametric Dynamical Systems) Although the quadratic maps F_b(x) = bx(1 − x), 0 ≤ x ≤ 1, are only piecewise monotone, for certain randomizations of the parameter b one may identify invariant subintervals of the state space on which the maps are monotone and where splittings may be found. For illustration, suppose b has the two possible values λ, μ with equal probabilities. If, for example, 1 < μ < λ ≤ 2, then each of the quadratic maps F_μ, F_λ has a unique attractive fixed point, 1 − 1/μ and 1 − 1/λ respectively, on S = (0,1). Moreover, each of the maps F_μ, F_λ is monotone on the interval [1 − 1/μ, 1 − 1/λ] and leaves this interval invariant. Since the two fixed points are attractive for the respective maps, one obtains splitting at any x₀ ∈ (1 − 1/μ, 1 − 1/λ) for a sufficiently large splitting scale. Thus there is a unique invariant probability on [1 − 1/μ, 1 − 1/λ]. Now, since starting from any x ∈ (0,1) the process will eventually reach the interval [1 − 1/μ, 1 − 1/λ] with certainty, this is the unique invariant probability for the process on the state space S = (0,1). More intricate results of this type are available in Bhattacharya and Rao (1993), Bhattacharya and Majumdar (1999), Athreya and Dai (2000), Bhattacharya and Waymire (2000).
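A simulation sketch of this example with the assumed parameter values μ = 1.5, λ = 2, for which the fixed points are 1/3 and 1/2 and the invariant interval is [1/3, 1/2]; starting from x = 0.9, the orbit enters the interval and then never leaves it.

```python
import random

random.seed(6)

def step(x):
    """One random quadratic map F_b(x) = b*x*(1-x) with b in {1.5, 2}."""
    b = 1.5 if random.random() < 0.5 else 2.0
    return b * x * (1 - x)

x = 0.9
tail = []
for k in range(200):
    x = step(x)
    if k >= 100:                 # discard a burn-in period
        tail.append(x)

inside = all(1/3 - 1e-9 <= y <= 1/2 + 1e-9 for y in tail)
print(inside)   # True
```

Both maps send (0, 1) into (0, 1/2], are increasing on [0, 1/2], and push points below 1/3 upward, which is why the tail of the orbit settles into [1/3, 1/2].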

EXAMPLE 8. (An Example from Economics) In view of (25) and (26) this example involves iterated random maps g_r(x) := βx + (1 − β)f_r(x), selected with probabilities p_r, r = 1, ..., N. In view of the monotonicity, strict concavity and other properties imposed on f_r, it follows that each g_r has a unique fixed point a_r, with a₁ < a₂ < ⋯ < a_N. One may check that the interval [a₁, a_N] is an invariant set for the process, and the process restricted to this state space satisfies splitting with respect to the splitting class of half-intervals and a splitting scale n₀ selected such that g₁^{(n₀)}(a_N) < g_N^{(n₀)}(a₁). Thus one obtains convergence to a unique invariant probability in the Kolmogorov metric if the process is restricted to the state space [a₁, a_N]. One may also show that, starting from any state x < a₁ or x > a_N, there is convergence to this unique invariant probability, no matter what the initial distribution.

Acknowledgements
This research was partially supported by grants from the National Science Foundation. The authors also wish to thank Jonathon Fassett for running the computer simulations.

References
Athreya, K. B. and J. J. Dai (2000). Random logistic maps I. J. Theor. Probab. (to appear).
Athreya, K. B. and P. Ney (1978). A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc. 245, 493-501.
Bhattacharya, R. N. and O. Lee (1988). Asymptotics of a class of Markov processes which are not in general irreducible. Ann. Probab. 16, 1333-1347. Correction, ibid. (1997), 25, 1541-1543.
Bhattacharya, R. N. and C. Lee (1995). On geometric ergodicity of nonlinear autoregressive models. Statist. and Probab. Letters 22, 311-315. Correction, ibid. (1997).
Bhattacharya, R. N. and M. Majumdar (1999). On a theorem of Dubins and Freedman. J. Theor. Probab. 12, 1165-1185.
Bhattacharya, R. N. and B. V. Rao (1993). Random iteration of two quadratic maps. In Stochastic Processes: A Festschrift in Honour of Gopinath Kallianpur (Eds., S. Cambanis, J. K. Ghosh, R. L. Karandikar and P. K. Sen). Springer-Verlag, New York.
Bhattacharya, R. N. and E. C. Waymire (1990). Stochastic Processes with Applications. Wiley, New York.
Bhattacharya, R. N. and E. C. Waymire (2000). An approach to the existence of unique invariant probabilities for Markov processes. In Colloquium on Limit Theorems in Probability and Statistics. János Bolyai Mathematical Society (to appear).
Bhattacharya, R. N. and E. C. Waymire (2001). Theory and Application of Stochastic Processes. Graduate Texts in Mathematics. Springer-Verlag, New York (in preparation).
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Collet, P. and J.-P. Eckmann (1980). Iterated Maps on the Interval as Dynamical Systems. Birkhäuser, Boston.
Dai, J. J. (1999). A result regarding convergence of random logistic maps. Statist. and Probab. Letters 47, 11-14.
Daley, D. J. (1968). Stochastically monotone Markov chains. Z. Wahrsch. verw. Gebiete 10, 305-317.
Devaney, R. L. (1989). An Introduction to Chaotic Dynamical Systems. Second edn. Addison-Wesley, New York.
Diaconis, P. and D. A. Freedman (1999). Iterated random functions. SIAM Review 41(1), 45-76.
Doeblin, W. (1937). Sur les propriétés asymptotiques de mouvements régis par certains types de chaînes simples. Bull. Math. Soc. Roum. Sci. 39(1), 57-115; (2), 3-61.
Doeblin, W. (1938). Exposé de la théorie des chaînes simples constantes de Markov à un nombre fini d'états. Rev. Math. Union Interbalkanique 2, 77-105.
Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
Dubins, L. E. and D. A. Freedman (1966). Invariant probabilities for certain Markov processes. Ann. Math. Statist. 37, 837-848.
Furstenberg, H. (1963). Non-commuting random products. Trans. Amer. Math. Soc. 108, 377-428.
Gordin, M. I. and B. A. Lifsic (1978). The central limit theorem for stationary ergodic Markov processes. Soviet Math. Doklady 19, 392-394.
Harris, T. E. (1956). The existence of stationary measures for certain Markov processes. In Proc. of the 3rd Berkeley Symp. on Math. Stat. and Probab. 2, Univ. of California Press, pp. 113-124.
Kifer, Yu. (1986). Ergodic Theory of Random Transformations. Birkhäuser, Boston.
Kloeden, P. E. and E. Platen (1992). Numerical Solutions of Stochastic Differential Equations. Springer-Verlag, New York.
Meyn, S. P. and R. L. Tweedie (1993). Markov Chains and Stochastic Stability. Springer-Verlag, New York.
Nummelin, E. (1978). A splitting technique for Harris recurrent Markov chains. Z. Wahrsch. verw. Gebiete 43, 309-318.
Nummelin, E. (1984). General Irreducible Markov Chains and Nonnegative Operators. Cambridge Univ. Press, Cambridge.
Orey, S. (1971). Limit Theorems for Markov Processes. Van Nostrand, New York.
Propp, J. and D. Wilson (1996). Exact sampling and coupled Markov chains. Random Structures and Algorithms 9, 223-252.
Tierney, L. (1994). Markov chains for exploring posterior distributions. Ann. of Statist. 22(4), 1701-1762.
Tweedie, R. L. (1975). Sufficient conditions for ergodicity and recurrence of Markov chains on a general state space. Stoch. Proc. Appl. 3, 385-403.


R. Bhattacharya and E. C. Waymire

Tweedie, R. L. (1983). Criteria for rates of convergence of Markov chains, with application to queueing and storage theory. In: Papers in Probability, Statistics and Analysis (Eds., J. F. C. Kingman and G. E. H. Reuter), pp. 260-276. Cambridge Univ. Press, Cambridge.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.

"7
/

Random Walk and Fluctuation Theory

N. H. Bingham

Part I. Random walk on Euclidean space
1. Introduction

The theory and applications of random walks are ubiquitous in the modern probability literature, and random walks form perhaps the simplest and most important examples of stochastic processes - random phenomena unfolding with time. The term 'random walk' can be traced back (at least) to Pólya (1921) ('zufällige Irrfahrt' in the German), but the context is much older. If we think of X_1, X_2, ... as the gains or losses of a gambler on successive plays of a gambling game, and the partial sums

   S_n := X_1 + \cdots + X_n \quad (S_0 := 0)

as his cumulative gain or loss to date (by time n), then the behaviour of the stochastic process (S_n)_{n=0}^∞ - the random walk with steps X_n - describes the evolution of the gambler's fortune. In the gambling context, analysis of such aspects as the duration of play (when the player has finite capital), probability of eventual ruin etc., goes back to Pascal and Huyghens in the 17th century. For a detailed account of such work, see the classic 19th century history Todhunter (1949); for a modern treatment, see the classic text of Feller (1968), XIV.

2. Simple random walk on ℤ

The simplest non-trivial case is to let X_1, X_2, ... represent the outcomes of a succession of independent tosses of a fair coin. Suppose the gambler bets on heads, and gains +1 for each head and (by symmetry) loses one on each tail. Then S_n represents his cumulative gain or loss on the first n plays; (S_n)_{n≥0} is called simple random walk (on the lattice, or on ℤ). Even this deceptively simple process contains many surprises if its behaviour is analysed in detail. Results include:



1. The walk returns to its starting position at time 2n with probability

   u_{2n} := P(S_{2n} = 0) = \binom{2n}{n} \Big/ 2^{2n} .

One has

   u_{2n} \sim 1/\sqrt{\pi n}

by Stirling's formula. The generating function is

   U(s) := \sum_{k=0}^{\infty} u_k s^k = \sum_{n=0}^{\infty} u_{2n} s^{2n} = 1/\sqrt{1 - s^2} ,

since u_{2n} = (-1)^n \binom{-1/2}{n} (Feller (1968), XI.3(b)).
2. The walk eventually reaches each integer with certainty (and so, reaches it infinitely often). In particular, if S_0 := 0 and

   T := \inf\{n : S_n = +1\} ,

then
   P(T < \infty) = 1 .

Thus a gambler whose strategy is to play till first ahead and then quit is certain to make (and keep) an eventual profit.
3. With T as above,

   ET = +\infty .

That is, the gambler above has infinite expected playing time before he realises his eventual terminal profit. In particular, the above strategy is unrealisable in practice, as it needs unlimited playing capital, and playing time, to deliver a profit with certainty.
4. The distribution of T is

   P(T = 2n - 1) = (-1)^{n-1} \binom{1/2}{n}

and its generating function is

   P(s) := E(s^T) := \sum_{k=0}^{\infty} s^k P(T = k) = (1 - \sqrt{1 - s^2})/s

(Feller (1968), XI.3; thus P'(1-) = +\infty yields ET = +\infty as above).
5. The distribution of the time spent positive (suitably defined) up to time 2n is given by



   P\big(\#\{j \le 2n : S_j > 0, \text{ or } S_j = 0 \text{ and } S_{j-1} > 0\} = 2k\big) = \binom{2k}{k} \binom{2n-2k}{n-k} \Big/ 2^{2n} \quad (k = 0, 1, \ldots, n)

(the Chung-Feller theorem: Chung and Feller (1949), Feller (1968), III.4). This distribution is called the discrete arc-sine law (see below).
6. In consequence, the limiting distribution function of the fraction of time spent positive is (2/\pi) \arcsin \sqrt{x} (0 < x < 1) (the arc-sine law: Feller (1968), III.4). This has density

   f(x) = \frac{1}{\pi \sqrt{x(1-x)}} \quad (0 < x < 1) .

This density is U-shaped, unbounded near x = 0 and x = 1, with its minimum at the central value x = 1/2. The interpretation is that a typical coin-tossing sequence is much more likely to be unbalanced - with one player ahead most of the time - than balanced, a result usually regarded as counter-intuitive when first encountered.
7. The number of paths (k, S_k) from (0, a) to (n, b) which touch or cross the x-axis is the number of paths from (0, -a) to (n, b). This is the reflection principle, proved by regarding the axis as a mirror and using symmetry (see Feller (1968), III.1, Grimmett and Stirzaker (1992), Section 5.3). This is the probabilistic version of Kelvin's method of images in electrostatics. It gives as a corollary the ballot theorem (III.4 below; Feller (1968), III.1, Grimmett and Stirzaker (1992), Section 5.11.7). For applications to barrier options in mathematical finance, see e.g. Bingham and Kiesel (1998), Section 6.3.2.
This basic process - simple random walk on ℤ - is so simple to describe that it may seem to lack depth. On the contrary, the extreme simplicity of structure means that very detailed questions concerning it may be analysed in extreme depth. Such questions include the following. (i) Long runs of heads (or tails) and their analysis. (ii) Range - number of points visited by time n. (iii) Local time - time spent at the origin (or a point x) by time n, etc. For an extensive recent treatment of simple random walk in depth, see Révész (1990), Part I. A similarly detailed treatment of simple random walk in ℤ^d is given in Révész (1990), Part II.
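The return probabilities in result 1 are easy to check numerically. A minimal sketch (our own illustration, not from the text), comparing the exact u_{2n} = \binom{2n}{n}/2^{2n} with its Stirling approximation 1/\sqrt{\pi n}:

```python
from math import comb, pi, sqrt

def u(two_n: int) -> float:
    """Exact return probability P(S_{2n} = 0) = C(2n, n) / 2^{2n}."""
    n = two_n // 2
    return comb(two_n, n) / 4 ** n

# Stirling's formula gives u_{2n} ~ 1 / sqrt(pi * n): the ratio tends to 1.
n = 500
exact = u(2 * n)
approx = 1 / sqrt(pi * n)
```

At n = 500 the ratio of exact to approximate value is already within a tenth of a percent of 1, consistent with the asymptotics quoted above.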
3. Recurrence and transience

The basic dichotomy in random walk concerns whether or not eventual return to the starting-point is certain. If so, the walk is called recurrent; if not, transient. For



recurrent random walks, return to the origin infinitely often (i.o.) is also certain (because lim_{n→∞} 1^n = 1); for transient random walks, this event has probability zero (because for p ∈ [0, 1), lim_{n→∞} p^n = 0). Thus the total occupation time for the starting point - and similarly, for all other points visited - for a recurrent random walk is infinite, while for a transient random walk it is finite. As the total number of time-points n = 0, 1, 2, ... is infinite, a transient random walk must necessarily have an infinite state-space. If we write u_n for the probability of return to the starting-point at time n, f_n for the probability of first return to the starting-point at time n (f_0 := 0), one has the convolution relation

   u_0 = 1 , \qquad u_n = \sum_{k=0}^{n} f_k u_{n-k} \quad (n \ge 1) .

Forming the generating functions

   U(s) := \sum_{n=0}^{\infty} u_n s^n , \qquad F(s) := \sum_{n=0}^{\infty} f_n s^n ,

this becomes U(s) = 1 + U(s)F(s), giving the Feller relation

   U(s) = 1/(1 - F(s)) .

Write

   f := \sum_{n=0}^{\infty} f_n ;

then f is the probability of eventual return to the starting-point, so f < 1 for transience, f = 1 for recurrence. Thus one has recurrence if u := U(1) = \sum u_n diverges, transience if \sum u_n converges.
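The convolution relation and the Feller relation can be verified numerically for simple random walk. In this sketch (our own, with an arbitrary truncation level), the first-return probabilities f_n are recovered from the return probabilities u_n by inverting the convolution:

```python
from math import comb

N = 200  # truncate at time N; only even indices are non-zero
u = [0.0] * (N + 1)
u[0] = 1.0
for n in range(2, N + 1, 2):
    u[n] = comb(n, n // 2) / 4 ** (n // 2)

# Invert u_n = sum_{k=0}^n f_k u_{n-k} (with f_0 = 0) for the f_n.
f = [0.0] * (N + 1)
for n in range(1, N + 1):
    f[n] = u[n] - sum(f[k] * u[n - k] for k in range(1, n))

# f_2 = 1/2, f_4 = 1/8, and the partial sums of f_n creep up towards
# f = 1 (recurrence), though slowly, as the tail decays like n^{-1/2}.
eventual_return = sum(f)
```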

4. Simple random walk on ℤ^d; Pólya's theorem


Suppose now we are in d-space ℝ^d, more particularly in the integer lattice ℤ^d. We start at the origin, (0, 0, ..., 0), and move to each of the 2d 'neighbouring' or 'adjacent' points, those with one coordinate ±1 and the rest zero, with equal probability 1/(2d); successive steps are independent, each with this distribution. The result is called simple random walk in d dimensions. It was observed by Pólya (1921) that recurrence or transience depends on the dimension d. For d = 1, we have

   u_{2n+1} = 0 , \qquad u_{2n} \sim 1/\sqrt{\pi n} ,



so u = \sum u_n diverges: simple random walk in one dimension is recurrent. For d = 2, for return to the origin at time 2n one must have equal numbers, say k, of positive and negative steps in the first coordinate, and equal numbers - then n - k - in the second coordinate. Thus

   u_{2n} = \frac{1}{4^{2n}} \sum_{k=0}^{n} \frac{(2n)!}{k!\,k!\,(n-k)!\,(n-k)!} = \frac{1}{4^{2n}} \binom{2n}{n}^2 \sim 1/(\pi n) \quad (n \to \infty)
by Stirling's formula. Thus u = \sum u_n diverges, and simple random walk in two dimensions is recurrent also. For three dimensions, we have similarly

   u_{2n} = \frac{1}{6^{2n}} \sum_{j,k} \frac{(2n)!}{j!\,j!\,k!\,k!\,(n-j-k)!\,(n-j-k)!} ,

the summation being over all j, k with j + k \le n. Then

   u_{2n} = \frac{1}{2^{2n}} \binom{2n}{n} \sum_{j,k} \left( \frac{1}{3^n} \cdot \frac{n!}{j!\,k!\,(n-j-k)!} \right)^2 .

The terms in the large brackets are those of a trinomial distribution, which sum to one. So the sum is majorised by the maximal term, which is attained for both j and k near (within ±1 of) n/3. Stirling's formula now shows that the sum is O(1/n). As above, the term outside the summation is O(1/\sqrt{n}), so u_{2n} = O(1/n^{3/2}). Thus u = \sum u_n < \infty: simple random walk in three dimensions is transient. The same argument applied in d dimensions, using the d-variate multinomial distribution, gives the sum as O(1/n^{(d-1)/2}), and so \sum u_n converges as before. This proves:

PÓLYA'S THEOREM. Simple random walk in d dimensions is recurrent for d = 1, 2, transient for d = 3, 4, ....

Geometrically, the result may be interpreted as saying that there is 'more room' - i.e. more ways to avoid returning to the origin - in higher dimensions. The probability p of eventual return to the origin in three dimensions, p < 1 by transience, has been calculated; its numerical value is p = 0.340537329544.... For details and references, see Doyle and Snell (1984), Section 7.5, or Spitzer (1964), Ch. II, Problems 10, 11.
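Pólya's dichotomy shows up clearly in simulation. The Monte Carlo sketch below (our own illustration; horizon and trial counts are arbitrary choices) estimates the probability of returning to the origin within a fixed time horizon:

```python
import random

def return_within(d: int, horizon: int, trials: int, seed: int = 0) -> float:
    """Fraction of d-dimensional simple random walks that revisit the
    origin within `horizon` steps."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(horizon):
            axis = rng.randrange(d)          # pick a coordinate direction
            pos[axis] += rng.choice((-1, 1))  # step +1 or -1 along it
            if not any(pos):                  # back at the origin
                hits += 1
                break
    return hits / trials

# For d = 1, 2 the estimate climbs towards 1 as the horizon grows; for
# d = 3 it stabilises near Polya's constant p = 0.3405...
```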
5. Random walks on ℝ^d

Consider, first, the case of random walk on ℤ^d. Starting at the origin, each d-tuple k := (k_1, ..., k_d) is reached in one step with probability μ_k, where the μ_k, being



probabilities, sum to one. Then if μ = (μ_k) is the probability distribution for each step, that for n steps is the convolution power μ^n (or μ^{n*}, in the alternative notation), defined inductively by μ^n := μ * μ^{n-1} (μ^0 := δ_0, the Dirac mass at the origin). If the starting position is x, the distribution after n steps is δ_x * μ^n. The transition probabilities

   p_n(x, y) := P(S_n = y \mid S_0 = x)

are translation-invariant:

   p_n(x, y) = p_n(0, y - x) .

Thus the transition probabilities are homogeneous with respect to the additive group structure of ℤ^d (we return to group-theoretic aspects in Part II below). The classic monograph of Spitzer (1964) deliberately restricts itself to this context, where the probabilistic structure is as unencumbered as possible by any other. One may discard the discreteness of ℤ^d, and work instead with ℝ^d. Here, since ℝ^d is uncountable, measure-theoretic restrictions arise, and probabilities need to be calculated by integration rather than summation. For background, see e.g., Chung (1974), Ch. 8, Ornstein (1969). Again, one works with a sequence of partial sums S_n := \sum_{k=1}^{n} X_k (S_0 := 0) of independent X_i with distribution μ; the distribution after n steps with starting-point x is δ_x * μ^n, and the additive group structure of ℝ^d plays, via the addition in the partial sums, a dominant role. Often the essential feature is that the distribution evolves through time n = 1, 2, ... via the powers P^n of a matrix P, the transition probability matrix on some countable set S. Here, we are in the context of Markov chains on state-space S. For a classical monograph treatment see Chung (1967); for a more recent account see Norris (1997). If S has a graph structure and the nearest-neighbour condition holds - that is, if a transition is possible from x to y then there is an edge from x to y - one speaks of a random walk on the graph G; see II.2 below. Of course, one can view any Markov chain in this way: one typically draws in a graph structure when classifying the states of a Markov chain, for instance. It is really a question of emphasis: when the properties, algebraic, geometric, topological, of the set S are themselves of interest, it is customary and convenient to use the language of random walks on S, and to take a dynamic viewpoint.
If, however, the states s ∈ S are not of particular interest in themselves, S serves merely as an index set to label the states, and one speaks of a Markov chain on S. In both the random-walk and Markov-chain context, questions arise as to the nature, discrete or continuous, of both the time-set and the state-space. The traditional usage has been to speak of Markov chains when time is discrete and Markov processes when time is continuous. However, one can argue that it is the nature of the state-space S which is the more decisive, and speak of Markov chains and processes according as S is discrete or continuous; this is the point of view of the excellent text of Revuz (1984).
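The evolution of the distribution through the powers P^n of a transition matrix can be sketched on a toy chain (the three-state chain and all its numbers are our own illustration, not from the text):

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def mat_pow(P, n):
    """n-th power of P by repeated multiplication (identity for n = 0)."""
    result = [[float(i == j) for j in range(len(P))] for i in range(len(P))]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

# Toy 3-state transition matrix (our example).
P = [[0.5, 0.25, 0.0],
     [0.5, 0.5, 0.5],
     [0.0, 0.25, 0.5]]
P = list(map(list, zip(*P)))  # rows index the current state

# Starting from the Dirac mass at state 0, the distribution after n steps
# is the first row of P^n; here it converges to the stationary law
# (1/4, 1/2, 1/4).
dist_50 = mat_pow(P, 50)[0]
```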



6. Harmonic analysis
For a random walk S_n = X_1 + \cdots + X_n, with the step-lengths X_i independent with distribution μ, the basic operation is forming the nth convolution powers, as above:

   μ, \quad μ^2 := μ * μ, \quad \ldots, \quad μ^n := μ * μ^{n-1}, \ldots ;

we will write μ^n as μ^{n*} when the convolution in the power needs emphasis. The operation of convolution involves an integration, and it is convenient to replace this by the simpler operation of multiplication. One does this by passing to the characteristic function (c.f.) or Fourier-Stieltjes transform of the X_j, or of μ:

   \phi(t) := E(\exp\{itX_j\}) = \int e^{itx} μ(dx) \quad (t ∈ ℝ) .

By the Uniqueness Theorem for characteristic functions, no information is lost thereby; by the Multiplication Theorem, the c.f. of an independent sum is the product of the c.f.s. Thus the c.f. of S_n is φ^n, the nth power of the c.f. φ of each step-length. The basic transience/recurrence dichotomy for random walks may be expressed in terms of the c.f. φ(t): the walk is transient if and only if ℜ(1/(1 - φ)) is integrable in some neighbourhood of the origin. This is the Chung-Fuchs criterion (Chung and Fuchs (1951), as refined by Ornstein (1969)). This extends to locally compact abelian groups (see Part II below): with μ̂ the Fourier transform of μ, the random walk with step-length distribution μ is transient iff ℜ(1/(1 - μ̂)) is integrable in some neighbourhood of the group identity (Kesten and Spitzer, 1965). Fourier analysis, like convolution, extends to the general setting of random walk on groups (again, see Part II). We note in passing the extension of Pólya's theorem to general random walks on ℝ^d, due to Chung and Fuchs (1951). (i) If d = 1 and the mean E|X| of the step-length exists, the random walk is recurrent if EX = 0, transient otherwise. (ii) If d = 2 and the variance E(|X|²) exists, the random walk is again recurrent if EX = 0, transient otherwise. (iii) If d ≥ 3, all properly d-dimensional random walks are transient.
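The transform method can be checked concretely for simple random walk on ℤ, whose step c.f. is φ(t) = cos t: Fourier inversion of φ^n recovers the n-step return probability. A small numerical sketch (ours; the quadrature parameters are arbitrary):

```python
from math import comb, cos, pi

def u_by_inversion(n: int, panels: int = 4096) -> float:
    """u_n = P(S_n = 0) = (1 / 2pi) * integral of cos(t)^n over (-pi, pi),
    computed by the midpoint rule, which is spectrally accurate for this
    smooth periodic integrand."""
    h = 2 * pi / panels
    return sum(cos(-pi + (k + 0.5) * h) ** n
               for k in range(panels)) * h / (2 * pi)

# Agrees with the exact u_{2n} = C(2n, n) / 2^{2n}, and vanishes for odd n.
```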

7. Potential theory
Potential theory as part of the classical theory of electricity and magnetism (and of gravitational attraction) grew out of the work of Green and Gauss in the 19th century. The theory of Brownian motion, or of the Wiener process, is a 20th century development; potential theory was linked to Brownian motion by Kakutani (1944). Classical concepts from the electromagnetic theory of continuous bodies such as equilibrium charge distribution, electrostatic capacity,



potential and energy may be interpreted in terms of Brownian motion - the equilibrium charge distribution on the boundary ∂D of a conducting body D, for instance, is expressible in terms of the first hitting distribution of Brownian motion on ∂D. A succinct account emphasising the historical aspects is given by Chung (1995); for a textbook account, see e.g., Port and Stone (1978). It was realised in the 1950s, through the work of Doob, Hunt and others, that one can develop a 'potential theory' for Markov processes, Brownian motion being distinguished by having as its potential theory exactly the classical one. Since random walks are particularly simple and important Markov processes, their potential theory has been particularly fully developed accordingly. The theory is seen in its barest essentials in the simplest possible context, random walk in ℤ^d; the potential theory of such random walks is developed in detail in the classic book of Spitzer (1964). The way in which the language of classical potential theory may be fruitfully generalised is illustrated by the concept of a harmonic function. Classically, a function f is harmonic if it satisfies Laplace's equation Δf = 0. This is a linear second-order elliptic partial differential equation; it may be discretised - on a lattice in the plane, say - as

   \frac{1}{4}\big[f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1)\big] - f(x, y) = 0 .

With p(x, y) the transition kernel, one may write this more concisely as

   \sum_{y} p(x, y) f(y) = f(x) , \quad \text{or} \quad Pf = f .

In this form, P may be generalised to the transition function of an ordinary random walk, not just simple random walk on ℤ² as in the example above. One calls functions f harmonic if they satisfy Pf = f. The basic transience-recurrence dichotomy depends on the existence of a Green function,

   G(x, y) := \sum_{n=0}^{\infty} p_n(x, y)

(or \int_0^{\infty} p(t, x, y)\,dt in continuous time). Random walks for which a Green function exists, that is, for which this sum or integral converges, are transient; when the Green function does not exist (is identically +∞) the walk is recurrent. Transient potential theory involves study of the Green function G(x, y) (as in Spitzer (1964), VI); recurrent potential theory involves instead the Green kernels G_n(x, y) := \sum_{r=0}^{n} p_r(x, y) (as in Spitzer (1964), III, VII).

Discrete Laplacian. We saw above that Pf = f serves as a discrete analogue of Laplace's equation Δf = 0. This motivates the definition of

   \Delta := P - I

as the discrete Laplacian. See e.g., Woess (1994), Section 4B, Biggs (1997), Sections 9, 10 and the references cited there for background; we return to this in Part II below.
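The discrete Dirichlet problem for Δ = P - I can be solved by straightforward relaxation; the grid, boundary data and iteration count below are our own illustration, not from the text. A function with Pf = f in the interior (each value the average of its four neighbours) and prescribed boundary values is a harmonic measure: f(x, y) is the probability that simple random walk from (x, y) first hits the boundary on the set where f = 1.

```python
# Solve (P - I) f = 0 in the interior of an (N+1) x (N+1) grid, with
# boundary value 1 on the top edge and 0 on the other three edges.
N = 10
f = [[0.0] * (N + 1) for _ in range(N + 1)]
for i in range(N + 1):
    f[i][N] = 1.0  # top edge

for _ in range(2000):  # Jacobi relaxation towards the harmonic function
    g = [row[:] for row in f]
    for x in range(1, N):
        for y in range(1, N):
            g[x][y] = 0.25 * (f[x + 1][y] + f[x - 1][y]
                              + f[x][y + 1] + f[x][y - 1])
    f = g

# By symmetry of the square, the walk from the centre exits through each
# of the four edges with probability 1/4, so f[N//2][N//2] is 0.25.
```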



8. Coupling
While the Fourier and potential-theoretic aspects of random-walk methodology are quite classical, the coupling method is more recent. Suppose that the probability measures of two stochastic processes (in discrete time, say) are to be compared. The coupling method is to construct both processes on the same probability space, with the given measures as their distributions, and seek to compare the measures by comparing the processes themselves directly, that is, pathwise. The method originates with Doeblin (1938), in the context of Markov chains. It was developed by Ornstein (1969) for random walks (and so appears in the second (1976) edition of Spitzer (1964), but not the first). A monograph treatment of coupling generally is given by Lindvall (1992); see especially II.3, III.2.12 there for random walks. One of the most successful applications of coupling is to proving the convergence theorem for (ergodic) Markov chains: one starts two independent copies of the chain, one in an arbitrary (or the given) starting distribution, the other in the stationary distribution, and runs them till they meet. One can then consider them as having 'coalesced' (because of the Markov property), and the convergence theorem follows rapidly from this. Another success of the coupling method is its use in proving renewal theorems (I.9 below; cf. Lindvall (1992), II.1, III.1, V.5). Many of the results of Part II below on random walks in more general contexts - such as finite groups, for example - are naturally proved by coupling methods; see e.g. Aldous (1983), Diaconis (1988), 4E.
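The coupling argument can be made concrete on a toy ergodic chain (the chain and all its numbers are our own illustration): run one copy from a fixed state and one from the stationary law until they meet; the distribution of the meeting time bounds the distance from stationarity.

```python
import random

# Toy ergodic chain on {0, 1, 2} with stationary law (1/4, 1/2, 1/4).
P = {0: [(0, 0.5), (1, 0.5)],
     1: [(0, 0.25), (1, 0.5), (2, 0.25)],
     2: [(1, 0.5), (2, 0.5)]}

def step(state, rng):
    """One transition of the chain."""
    u, acc = rng.random(), 0.0
    for nxt, p in P[state]:
        acc += p
        if u < acc:
            return nxt
    return P[state][-1][0]  # guard against rounding

def coupling_time(rng):
    """Run independent copies, one from state 0 and one started in the
    stationary law, until they first coincide; return the steps taken."""
    x = 0
    y = rng.choices([0, 1, 2], weights=[1, 2, 1])[0]
    t = 0
    while x != y:
        x, y, t = step(x, rng), step(y, rng), t + 1
    return t

rng = random.Random(1)
times = [coupling_time(rng) for _ in range(2000)]
mean_coupling = sum(times) / len(times)
# After coupling, the two copies may be run as one (Markov property), so
# the distance of the chain from stationarity at time n is at most
# P(coupling time > n), which decays geometrically here.
```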

9. Renewal theory
Consider first the classical setting involving replacement of components, say, lightbulbs. At time 0, a new lightbulb is fitted, and used non-stop until it fails, when it is replaced; this replacement bulb is used non-stop till failure and then replaced, etc. With X_i the lifetimes of successive bulbs, S_n := \sum_{i=1}^{n} X_i,

   N_t := \max\{n : S_n \le t\} ;

N_t is the number of failures up to and including time t. Then N := (N_t : t ≥ 0) is called the renewal process. The lightbulb in use at time t is the (N_t + 1)th (as S_{N_t} \le t < S_{N_t + 1}); the mean of this is the renewal function,

   U(t) := E(N_t + 1) = \sum_{n=0}^{\infty} (n + 1) P(N_t = n) = \sum_{n=0}^{\infty} F^{n*}(t) ,

with F the distribution function of the lightbulb lifetimes. Here, as lifetimes are non-negative, the random walk (S_n) is concentrated on the half-line [0, ∞). More generally, one may consider random walks (S_n) on the line; with P the probability measure of the step-length,

   U := \sum_{n=0}^{\infty} P^{n*}



is the renewal measure. Its study - in particular, of its asymptotic properties - is called renewal theory on the line. Similarly in more general settings such as groups: renewal theory is the study of the asymptotics of the Green kernel G := \sum_{n=0}^{\infty} P^{n*}. The basic result is Blackwell's renewal theorem (Blackwell, 1953): if F has mean μ ∈ (0, ∞] and is non-arithmetic (or non-lattice: the support of F is not an arithmetic progression),

   U(t + h) - U(t) \to h/\mu \quad (t \to +\infty) , \qquad \to 0 \quad (t \to -\infty) .

(The arithmetic case is similar but simpler. This is the Erdős-Feller-Pollard theorem; see Feller (1968), XIII.3.) Many proofs are known: see e.g., Feller (1971), XI.1 (renewal equation and direct Riemann integrability), Lindvall (1992), III.1 (coupling), Bingham (1989) (Wiener Tauberian theory). The renewal theorem extends to ℝ² (Chung (1952)): here the limit is 0 for all approaches to ∞. The same is true for ℝ^d (d ≥ 2): see e.g., Nagaev (1979).
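Blackwell's theorem is easy to see in simulation. In the sketch below (our own illustration), lifetimes are uniform on (0, 2), hence non-arithmetic with mean μ = 1, and U(t + h) - U(t), the expected number of renewal epochs in (t, t + h], is estimated by Monte Carlo; the predicted limit is h/μ = h.

```python
import random

def renewal_increment(t: float, h: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of U(t + h) - U(t): the expected number of
    renewal epochs S_n in (t, t + h], for Uniform(0, 2) lifetimes."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        s = 0.0
        while s <= t + h:
            s += rng.uniform(0.0, 2.0)  # next lifetime
            if t < s <= t + h:
                total += 1
    return total / trials

# With mean lifetime mu = 1, Blackwell's theorem gives the limit h / mu = h.
est = renewal_increment(t=50.0, h=2.0, trials=4000)
```

By t = 50 the transient has washed out, and the estimate sits close to 2.0.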
Renewal theory on groups. Random walks on groups and other algebraic structures are considered in Part II below; we pause here to discuss briefly renewal theory in such settings. The renewal theorem extends to locally compact abelian groups G (Port and Stone, 1969). Call G of type II if it is capable of supporting a random walk, with step-length law μ, whose renewal measure ν := \sum_{n=0}^{\infty} μ^n does not tend to zero at infinity, type I otherwise. Thus by the results above, ℝ^d and ℤ^d are of type II for d = 1 (by the non-arithmetic and arithmetic cases of Blackwell's theorem), type I for d ≥ 2. The general result for locally compact abelian groups G is that G is of type II iff G is isomorphic to ℝ × K or ℤ × K with K compact. The renewal theorem extends also to non-abelian groups. Recall that a group is unimodular if the left-invariant (Haar) measure is also right-invariant; it is amenable if it possesses an invariant mean on the space of bounded uniformly continuous functions. The term amenable is understood in English to carry both its ordinary connotation and that of 'meanable' (French moyennable, German mittelbar); for background, see e.g., Greenleaf (1969), Eymard (1972), Bondar and Milnes (1981). The non-amenable case is always type I (Derriennic and Guivarc'h (1973)). In the unimodular case, type II groups are of the form K × E with K compact and E isomorphic to ℝ or ℤ (Sunyach (1981)). The amenable, non-unimodular case is studied in detail by Elie (1982a) (especially p. 260 and Sections 1.6, 3.14, 3.22, 5.1).

10. Limit theorems and Brownian motion

Much of the core of classical probability theory is concerned with the limit theory of a sequence of partial sums S_n of random variables X_n (independent and identically distributed, in the simplest case). For example, the trilogy of classical limit



theorems - the (strong) law of large numbers, the central limit theorem and the law of the iterated logarithm - concerns just this. Since S_n is a random walk, all of this is random-walk theory in some sense. For our purposes, however, we prefer to regard this material - the classical limit theorems and the central limit problem - as part of general probability theory, and refer to the excellent textbook treatments in the classic texts of Feller (1971) and Chung (1968). We focus here more on the aspects specific to random-walk theory - the recurrence/transience dichotomy, and the specifically stochastic-process aspects. Of course, in limit theory one is concerned with S_n as n → ∞. As the number n of steps increases the influence of each individual step decreases, and in the limit it is lost altogether. One thus expects the setting of a random walk to go over on passage to the limit to the setting of a stochastic process in continuous time and state-space, and this is indeed true. In the simplest case when the step-lengths have finite variance, the limiting process obtained is Brownian motion or the Wiener process. The mathematics necessary to handle the passage to the limit is the theory of weak convergence of probability measures, specifically the Erdős-Kac-Donsker invariance principle; for an excellent account, see Billingsley (1968). The continuous framework of Brownian motion or some other limiting process - a stable process, or a diffusion, for instance - lurks behind much of the discrete framework of random walks. Instead of obtaining a continuous-time or continuous-state process from a random walk by a limiting procedure, one may instead begin in a continuous time and state-space framework. In this setting, the analogue of a random walk is a Lévy process - a stochastic process with stationary independent increments. For a recent monograph account, see Bertoin (1996). This book is particularly noteworthy for its treatment of the potential-theoretic (Ch. II) and fluctuation-theoretic (Ch. VI) aspects; cf. I.7 above and Part III below.

11. Conditioned random walks

It frequently happens that one needs to deal with a random walk, or other stochastic process, in which one wants to condition on some event or other having happened (or, more often, not having happened) by time n. This idea has been current at least since the work of Dwass and Karlin (1963). In limit theorems, one expects to obtain as limit process a process obtained from a familiar one by some conditioning operation. For example, the Brownian bridge is obtained from Brownian motion by conditioning on return to the origin at time t = 1 (Billingsley, 1968), the three-dimensional Bessel process is obtained by conditioning Brownian motion to stay positive, and further processes such as Brownian excursion, Brownian meander and their relatives are obtainable from Brownian motion by conditioning operations of various kinds. Asmussen (1982) gives conditioned limit theorems for random walks, with various applications to queues and risk theory (we defer consideration of such applied topics to Part III). For further results, background, and references, we refer to Bertoin and Doney (1994, 1996).



It may be that the event on which one is conditioning has small probability, vanishing in the limit. For example, one may have a random walk which drifts to -∞, conditioned to stay positive. In such situations, one is visibly focussing on highly atypical behaviour, and the appropriate theory for handling such cases is that of large deviations, for background on which we refer to, e.g., Deuschel and Stroock (1989), Dembo and Zeitouni (1993). In the random-walk or risk-theoretic context, the basic technique is to pass to the associated random walk, a technique originating with Cramér. See e.g., Feller (1971), XII.4 for theory, Asmussen (1982) for applications.

Part II. Random walks in more general contexts


1. Random walks and electrical networks

We have seen (I.7) that electromagnetism, in particular, electrostatics in continuous media, is relevant to random walks, via potential theory. The theory of current electricity in networks of wires is also relevant, an observation due to Nash-Williams (1959). This viewpoint has been given an excellent textbook treatment by Doyle and Snell (1984); their book is largely motivated by an attempt to 'explain' Pólya's theorem. Suppose we have a network of conducting wires, joining nodes x, y, .... Write R_{x,y} for the resistance of the wire xy from node x to node y (R_{x,y} := +∞ if there is no such edge). Note that R_{x,y} = R_{y,x}, which reflects the time-reversibility of the physics of steady electrical currents. Write

   C_{x,y} := 1/R_{x,y}

for the conductance of the wire,

   C_x := \sum_{y} C_{x,y} , \qquad p_{x,y} := C_{x,y}/C_x .
Then one can define a stationary Markov transition matrix P = (p_{x,y}), and this is reversible, since

   C_x p_{x,y} = C_{x,y} = C_{y,x} = C_y p_{y,x}

(see Kelly (1979) for background on reversibility, and II.2 below). If

   C := \sum_{x} C_x = \sum_{x,y} C_{x,y} < +\infty ,

the chain is ergodic with stationary distribution

   \pi_x := C_x / C

(C < +∞ for finite networks; for a full treatment of infinite networks, we refer to Zemanian (1992)). Conversely, reversible chains can arise in this way from



networks: reversibility characterises those ergodic chains that arise from electrical networks. The link between random walks and electrical networks is developed in detail in the book by Doyle and Snell (1984). Key results include the following. (i) Thomson's Principle (or Kelvin's Principle): the flow of electricity through a network minimises the energy dissipation. (ii) Rayleigh's Monotonicity Law: increasing resistances can only increase effective resistance (that is, decrease currents). Used in combination, one can combine short-circuiting (removal of resistance between nodes) and cutting (removal of connection between nodes) to introduce powerful comparison methods between electrical networks, and hence - and this is the point of the method - between the analogous random walks. For example, Doyle and Snell (1984), Section 5.9, exploit Rayleigh's method to give an electrical proof of Pólya's Theorem (short-circuiting shows recurrence in the plane, cutting shows transience in space). They also consider (Ch. 6) random walks on trees, to which we return in II.5 below. The energy ideas of the electrical analogy have been used by T. Lyons (1983) to give a general recurrence/transience criterion for reversible Markov chains. His method has interesting connections with Riemann surfaces and Riemannian manifolds, and with Brownian motion on them; see II.4 below. The electrical network method has recently been used by a number of authors to simplify and extend various results on random walks on graphs (see II.2 below) and networks. See for example Telcs (1989), Tetali (1991, 1994), Palacios (1993, 1994) and the references cited there.
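The correspondence between a conducting network and a reversible chain is mechanical; a small sketch (the three-node network and its conductances are our own example):

```python
# Conductances C_{x,y} of a small network (our example); the symmetry
# C_{x,y} = C_{y,x} reflects R_{x,y} = R_{y,x}.
nodes = ['a', 'b', 'c']
C = {('a', 'b'): 1.0, ('b', 'a'): 1.0,
     ('b', 'c'): 2.0, ('c', 'b'): 2.0,
     ('a', 'c'): 3.0, ('c', 'a'): 3.0}

C_node = {x: sum(C.get((x, y), 0.0) for y in nodes) for x in nodes}  # C_x
p = {(x, y): C.get((x, y), 0.0) / C_node[x] for x in nodes for y in nodes}
C_total = sum(C_node.values())
pi = {x: C_node[x] / C_total for x in nodes}  # stationary distribution

# Detailed balance: C_x p_{x,y} = C_{x,y} = C_{y,x} = C_y p_{y,x}, so
# pi_x p_{x,y} = C_{x,y} / C is symmetric in x and y.
```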

2. Random walk on graphs

A graph G = (V, E) is a pair consisting of a set V of vertices, and a set E of edges joining some pairs of vertices. For each x ∈ V, we may consider the set N_x of neighbours of x - vertices y with an edge joining x to y. One may define simple random walk on the graph G by specifying that starting at x, the particle moves to each neighbour y ∈ N_x with equal probability 1/|N_x|, successive steps of the walk being independent. More general probability assignments are possible: here the step from x to y ∈ N_x has probability p_{x,y} (here y = x is allowed, representing either a loop from x to itself or the possibility of staying put, and we can extend to all y by putting p_{x,y} = 0 if y is not a neighbour of x). Under mild conditions (such as irreducibility - all states being accessible from each other - aperiodicity - absence of cycling - and recurrence - some, hence all, states recurrent rather than transient), there exists a limiting, or stationary, distribution π = (π_x)_{x∈V}. This has the property that if the system is started in π, so
P ( s 0 = x) = (x v) ,

184

N. H. Bingham

it remains in π after one step:

Σ_{x∈V} π_x p_{x,y} = π_y   (y ∈ V),

hence by induction for any number of steps. One may interpret π_x p_{x,y} as the 'probability flux' from x to y. If the flux along each edge is the same in the reverse direction,

π_x p_{x,y} = π_y p_{y,x}   (x, y ∈ V),


then the walk is called reversible: its stochastic behaviour is the same if the direction of time is reversed (this reversibility condition is also called the detailed balance condition). A monograph treatment of reversibility is given by Kelly (1979), in the more general context of Markov chains. As noted in II.1 above, reversible random walks on graphs are exactly those for which an interpretation in terms of electrical networks is available.
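For a concrete check (an illustrative sketch, not from the text): simple random walk on a finite connected graph is always reversible, with stationary distribution π_x proportional to the degree |N_x|. The code below verifies both stationarity and detailed balance for a small, arbitrarily chosen graph.

```python
import numpy as np

# Simple random walk on a small graph (a triangle with a pendant vertex;
# an illustrative choice).  pi_x proportional to deg(x) is stationary, and
# the chain satisfies detailed balance: pi_x p_{x,y} = pi_y p_{y,x}.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
P = A / deg[:, None]                 # p_{x,y} = 1/|N_x| for neighbours y
pi = deg / deg.sum()                 # candidate stationary distribution

assert np.allclose(pi @ P, pi)       # stationarity: sum_x pi_x p_{x,y} = pi_y
F = pi[:, None] * P                  # probability flux pi_x p_{x,y}
assert np.allclose(F, F.T)           # detailed balance (reversibility)
print(pi)
```

The flux matrix F is symmetric precisely because every edge carries flux 1/(2|E|) in each direction under the degree-proportional stationary law.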

Randomised algorithms. The general area of random walks on graphs and reversible Markov chains has recently become highly topical. There is an extensive recent monograph treatment by Aldous and Fill (1998+). Part of the motivation comes from theoretical computer science, in particular the theory of randomised algorithms. In many situations, algorithms to perform certain complicated tasks numerically involve conditional statements ('if ... then ..., otherwise ...'), so that it is not clear in advance how many steps, or iterations, the program will take. An analysis of the computational complexity may deal with the worst case, but this is quite unrepresentative: it is usually better to focus on an average case, in some sense. The ability to analyse, and simulate efficiently, random walks on graphs representing the flow diagrams of possibly very complicated algorithms, and in particular their stationary distributions, is thus a valuable aid to assessing the computational complexity of many problems. Also, in many deterministic problems like approximate counting, volume estimation etc., it is much more efficient to use a randomised algorithm than a deterministic one. Now, analysis of the computational complexity involves analysis of the convergence rates of the relevant random walks and Markov chains, for which an extensive theory is available. For the algorithmics background, see e.g., Sinclair (1993), Motwani and Raghavan (1995); for surveys of random walks on finite graphs motivated by this, see Lovász (1986), Lovász and Winkler (1995).

Algebraic aspects. For the purposes of probability theory on graphs, much useful information is provided by the subject of algebraic graph theory, for which see Biggs (1974), and the associated algebraic potential theory (Biggs, 1997). The algebraic potential theory hinges on the discrete Laplacian (cf. I.7), and the associated Dirichlet problem.
From this point of view, the link between random walks and electrical networks is that both can be expressed as Dirichlet problems on a graph.

Boundary theory. For random walks on infinite graphs, much of the interest is on behaviour at infinity and boundary theory. For a detailed account of this

Random walk and fluctuation theory

185

important subject, see Woess (1991), (1994). Among the other methods relevant here, we mention isoperimetric inequalities. It would take us too far afield to go into details; we refer for background to Dodziuk (1984), Dodziuk and Kendall (1986), Gerl (1988), Woess (1994), Section 3A.

Spectral methods. The classical method for studying the rate of convergence of a finite Markov chain (or, here, of a random walk on a finite graph) is to use spectral theory. If the transition matrix of the chain is P, the n-step transition matrix is P^n. If P is diagonalised by finding its eigenvalues λ_i (arranged in decreasing order: 1 = λ_1 ≥ λ_2 ≥ ... ≥ -1, where λ_1 = 1 by the Perron-Frobenius theorem) and eigenvectors, and we form the diagonal matrix Λ of the λ_i, then P^n can be read off in terms of Λ^n, and the convergence behaviour of P^n as n → ∞ is determined by λ_2, the second largest eigenvalue. For recent accounts of this spectral approach to the rate of convergence of a Markov chain, see Diaconis and Stroock (1991) in the reversible case, Fill (1991) in the non-reversible case. Alternatives to the spectral approach are available when the Markov chain, or graph, has special structure, and we turn to such cases in the sections below.

Cover times. One problem of particular interest for random walks on graphs is that of cover times - the time it takes for the walk to visit every state. For background, see e.g., Aldous (1989), Broder and Karlin (1989), Kahn et al. (1989), Zuckerman (1989), Ball et al. (1997).
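To make the role of λ_2 in the spectral method above concrete, here is a small numerical sketch (illustrative, not from the text): for the lazy simple random walk on a 6-cycle, λ_2 = 3/4, and the variation distance to the uniform stationary distribution is observed to decay geometrically at exactly this rate.

```python
import numpy as np

# Lazy simple random walk on a cycle of n = 6 vertices (laziness removes
# periodicity); an illustrative reversible chain.  The second largest
# eigenvalue of P is lambda_2 = (1 + cos(2*pi/6))/2 = 0.75, which governs
# the geometric rate of convergence to the uniform stationary distribution.
n = 6
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    P[x, (x - 1) % n] = 0.25
    P[x, (x + 1) % n] = 0.25

lam2 = np.sort(np.linalg.eigvalsh(P))[-2]   # P is symmetric, so eigvalsh applies

pi = np.full(n, 1.0 / n)
Pk = np.eye(n)
dist = []
for _ in range(60):
    Pk = Pk @ P
    # worst-case total variation distance of the rows of P^k from uniform
    dist.append(max(0.5 * np.abs(row - pi).sum() for row in Pk))

ratio = dist[-1] / dist[-2]                 # empirical geometric decay rate
print(lam2, ratio)
```

After the subdominant modes die away, successive distances shrink by the factor λ_2 per step, so the printed ratio matches λ_2 = 0.75.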

3. Random walks on groups

If X_1, X_2, ... are independently chosen from a distribution μ on a group G, then the sequence of products

S_n := X_n X_{n-1} ... X_2 X_1   (S_0 := e)

is called the random walk on G generated by μ. If G is abelian, it is customary and convenient to write the group operation additively:

S_n := X_1 + ... + X_n   (S_0 := 0).

Probability theory on groups - in particular, the theory of random walks - and the closely related study of harmonic analysis on groups have been developed principally for the case of locally compact groups G, to which we confine ourselves here. The (locally compact) abelian (LCA) case is treated in detail in Rudin (1962); the compact case, using the Peter-Weyl theory of group representations, in Vilenkin (1968), Ch. 1. For the general locally compact case, see Heyer (1977). The link with random walk on graphs is given by the Cayley graph of a group (see e.g., Biggs (1974), Ch. 16).

Ehrenfest model. For purposes of illustration, we mention here a classical instance of a random walk on a group, of some historical and physical interest.


Let G = Z_2^d, the additive group of d-tuples of 0s and 1s modulo 2 (thus 1 + 1 = 0). This is motivated by the famous Ehrenfest urn model of statistical mechanics, where d balls are distributed between two urns, labelled 0, 1 (so the space Z_2^d of d-tuples describes the occupancy states of the balls). A move, or step, consists of choosing a ball at random and transferring it to the other urn. This generates a random walk on G = Z_2^d (and also a nearest-neighbour random walk on the graph corresponding to Z_2^d, the unit cube C_d in d dimensions). As the structure of G is so simple, the behaviour of the walk is straightforward to analyse (Kac (1959), III.11). Now the original way to analyse this model involves counting the number of balls present in one (or the other) urn - superficially simpler, as the number of states is thereby reduced from 2^d to d + 1. In fact, the analysis is now more complicated (Kac (1959), III.7-10). It involves certain special functions - discrete orthogonal polynomials, the Krawtchouk polynomials - which arise in the harmonic analysis of the relevant Gelfand pair (II.5 below).
Note. In the statistical mechanics context, d is of the order of magnitude of Avogadro's number (c. 6 x 10^23), so 2^d, the number of states, and with it the recurrence time of the extreme states (0, ..., 0) and (1, ..., 1), is so vast as to make the theoretical recurrence of states with such astronomically large recurrence times unobservable in practice. The importance of the Ehrenfest model is to reconcile the observed irreversibility of systems at macroscopic level with the reversibility of the dynamics describing them at microscopic level. This theme, the question of the 'arrow of time', is of fundamental importance in physics. For background and references, see e.g., Bingham (1991), Bingham (1998), Section 1.11.
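The ball-count form of the Ehrenfest chain is easy to check numerically (an illustrative sketch; d = 10 is an arbitrary small choice): the count k of balls in urn 1 moves to k - 1 with probability k/d and to k + 1 with probability (d - k)/d, the Binomial(d, 1/2) law is stationary and reversible for this chain, and the mean recurrence time of the extreme state k = 0 is 1/pi_0 = 2^d.

```python
import math
import numpy as np

# Ehrenfest urn with d balls, watching only the number k of balls in urn 1:
# a move picks a uniform ball and switches its urn, so k -> k-1 with
# probability k/d and k -> k+1 with probability (d-k)/d.  Binomial(d, 1/2)
# is stationary (indeed reversible).  d = 10 is an illustrative size.
d = 10
P = np.zeros((d + 1, d + 1))
for k in range(d + 1):
    if k > 0:
        P[k, k - 1] = k / d
    if k < d:
        P[k, k + 1] = (d - k) / d

pi = np.array([math.comb(d, k) for k in range(d + 1)], dtype=float) / 2.0**d
assert np.allclose(pi @ P, pi)            # stationarity
flux = pi[:, None] * P
assert np.allclose(flux, flux.T)          # detailed balance (reversibility)

# Mean recurrence time of the extreme state k = 0 is 1/pi_0 = 2^d steps;
# with d of the order of Avogadro's number this is astronomically large.
print(1.0 / pi[0])   # 1024.0 for d = 10
```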
Card-shuffling. If a pack of n cards (n = 52 in the case of ordinary playing cards) is shuffled, the objective is to start from the initial distribution, which is highly patterned, reflecting following suit in the play of the previous hand, and end with a patternless, or uniform, distribution. There are n! permutations of the cards (note that 52! is enormous: c. 8.05 x 10^67). The usual method of shuffling, riffle shuffling, is analysed in detail by Bayer and Diaconis (1992), Diaconis, McGrath and Pitman (1995). Suppose distance between distributions is measured, as usual, by variation distance (or norm):

d(μ, ν) = ||μ - ν|| := sup_A |μ(A) - ν(A)| .

This exhibits the Aldous-Diaconis 'cut-off phenomenon': for μ a typical initial distribution, μ^k the distribution after k shuffles, π the uniform distribution (the limit distribution of μ^k as k → ∞), d(μ^k, π) stays close to its maximum value 1 for k small, starts to decrease sharply around the 'cut-off' value k ≈ (3/2) log_2 n, and approaches zero geometrically fast for large k. For n = 52, as for actual playing cards, one may summarise this by saying that seven shuffles suffice to get close to uniform. For background, see e.g., Aldous and Diaconis (1986), Diaconis (1988), 3A.2, 4D. As well as a variety of probabilistic techniques, the mathematics


involved is that of group representations and non-commutative Fourier analysis on finite groups.
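The riffle-shuffle analysis rests on the explicit Bayer-Diaconis formula; as a self-contained substitute (an illustrative sketch, not their method), the code below follows the simpler top-to-random shuffle on a toy deck of 5 cards, evolving the exact distribution over all 5! = 120 orderings and computing the variation distance to uniform, which decreases monotonically towards zero.

```python
import math

# Top-to-random shuffle on a toy deck of n = 5 cards: remove the top card and
# reinsert it at a uniformly random position.  The full distribution over all
# 5! orderings is evolved exactly, and the variation distance to the uniform
# distribution is computed at each step.  An illustrative substitute for
# riffle shuffling, small enough for exact computation.
n = 5
uniform = 1.0 / math.factorial(n)

def shuffle_step(dist):
    new = {}
    for deck, p in dist.items():
        top, rest = deck[0], deck[1:]
        for i in range(n):                      # insert the top card at slot i
            d2 = rest[:i] + (top,) + rest[i:]
            new[d2] = new.get(d2, 0.0) + p / n
    return new

dist = {tuple(range(n)): 1.0}                   # start from a fixed ordering
tv = []
for k in range(20):
    dist = shuffle_step(dist)
    # variation distance: sup_A |mu(A) - pi(A)| = (1/2) * sum over decks
    total = sum(abs(p - uniform) for p in dist.values())
    total += uniform * (math.factorial(n) - len(dist))  # orderings of probability 0
    tv.append(0.5 * total)

print(tv[0], tv[-1])   # variation distance decreases monotonically towards 0
```

Monotonicity of the distance to stationarity holds for any Markov chain, so the sequence tv is non-increasing; on this toy scale the cut-off is of course much less sharp than for n = 52.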

Note. It is interesting to compare this line of work with a contrasting and more recent one due to Lovász and Winkler (1995). Here, one uses a randomised algorithm to achieve exact (rather than approximate) uniformity, but after a random number of steps. For the standard pack of n = 52 playing cards, the expected number of steps to achieve uniformity is c. 11.724.

Compact groups. Consider first the case of a compact group G, in particular, a finite group G. Here no question of behaviour 'at infinity' arises. Instead, the basic result here is the Itô-Kawata theorem: for a random walk with distribution μ on a compact group G, the convolution powers μ^n converge to (normalised) Haar measure on the closed subgroup G_μ generated by the support of μ (Heyer (1977), Section 2.1). We lose nothing for most purposes by restricting from G to G_μ (or, we may restrict to μ whose support generates G), when we obtain: on a compact (in particular, finite) group, random walk converges to the uniform distribution. Of course, as we have seen above with the finite case of card-shuffling, interest here focusses in great detail on how fast this convergence takes place (Diaconis (1988); Urban (1997)).

Boundary theory. For infinite, discrete groups, one of the main questions is to study how the random walk 'escapes to infinity'. Thus the behaviour of the group itself 'at infinity', growth properties, etc., is crucial. Furthermore, it is usually better to seek an appropriate compactification of the state-space, by adjoining a suitable boundary, so that the behaviour of the walk on this enlarged state-space is more informative, or better behaved, or both. Boundary theory is too vast a subject for us to do more here than point the reader to suitable references for a full account; see e.g., Furstenberg (1971), Kaimanovich and Vershik (1983), Varopoulos et al. (1992), Woess (1994), Sawyer (1997), Kaimanovich (1991).
The general theme is that the structure of a group G and the behaviour of random walk on G are each highly informative about the other. Similar remarks apply to the behaviour (especially at infinity) of random walks on graphs; see II.2 above and II.4 below. Particular kinds of group have been studied in greater depth; for the boundary theory of random walks on Fuchsian groups, for example (groups G of Möbius transformations which act discontinuously on some G-invariant disc: see e.g., Beardon (1983), Section 6.2), see Series (1983).

Kesten's problem. The most basic question about random walk on groups is the recurrence/transience dichotomy: when is the random walk generated by μ on a group G recurrent or transient? (As above, it may be appropriate to restrict to G_μ = G.) If we recall the Chung-Fuchs criterion of I.6, we see that the special nature of Z^1 - that it can support a recurrent random walk - is revealed by the symmetric probability laws μ (whose mean is zero), and that of Z^2 by the symmetric laws with finite variance, in particular, with compact support. So if we restrict to probability laws μ which are symmetric (x and x^{-1} have the same


distribution) and of compact support, Z^d can support a recurrent random walk if d = 1 or 2, but not otherwise. Groups G that can support a recurrent random walk generated by a symmetric μ of compact support are called recurrent groups; other groups are called transient groups. The question of which groups are recurrent and which are transient has become known as Kesten's problem, in honour of early work by Kesten (1959), (1967), Kesten and Spitzer (1965). Note that we already have the following examples of recurrent groups: finite groups; Z, Z^2. It turns out that these examples are, in a sense, the prototypes for finitely generated recurrent groups: the only recurrent groups which are finitely generated are {e}, Z, Z^2 and finite extensions of them (Varopoulos (1983a, b, 1984): see Varopoulos et al. (1992), Ch. VI). The solution depends on the volume growth of a discrete, finitely generated group (defined in terms of the word metric, the graph metric of the Cayley graph: ibid., VI.2, Woess (1994), Section 2C): finitely generated groups are recurrent iff their volume function V(k) has growth of at most O(k^2) (Varopoulos et al. (1992), VI.6). The theory is extended to (locally compact) unimodular, compactly generated groups (prototype: R^d) in Varopoulos et al. (1992), Ch. VII, and to unimodular Lie groups in Varopoulos et al. (1992), Ch. VIII (Kesten's problem for connected Lie groups is solved in Baldi (1981); for background on random walks on Lie groups, see Guivarc'h, Keane and Roynette (1977)).
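The volume-growth dichotomy can be computed directly (an illustrative sketch, not from the text): breadth-first search in the Cayley graph of Z^2 (generators +-e_1, +-e_2) gives ball sizes 2k^2 + 2k + 1, quadratic as the criterion requires for recurrence, while the free group on two generators, whose Cayley graph is the 4-regular tree, has ball sizes 2*3^k - 1, exponential growth, hence transience.

```python
# Volume growth of two Cayley graphs, computed by breadth-first search.
# Free-group elements are represented as reduced words; an illustrative setup.

def ball_sizes_z2(radius):
    sizes, seen, frontier = [], {(0, 0)}, {(0, 0)}
    for _ in range(radius):
        nxt = set()
        for (x, y) in frontier:
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                v = (x + dx, y + dy)
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
        sizes.append(len(seen))            # |B(k)| = 2k^2 + 2k + 1
    return sizes

def ball_sizes_free_group(radius):
    inverse = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}
    sizes, seen, frontier = [], {''}, {''}
    for _ in range(radius):
        nxt = set()
        for w in frontier:
            for g in 'aAbB':
                # multiply by a generator, reducing the word if it cancels
                v = w[:-1] if w and w[-1] == inverse[g] else w + g
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
        sizes.append(len(seen))            # |B(k)| = 2 * 3^k - 1
    return sizes

print(ball_sizes_z2(5))          # quadratic growth: within the recurrent range
print(ball_sizes_free_group(5))  # exponential growth: transient
```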

4. Brownian motion on Riemannian manifolds

The motivating problem for the Varopoulos theory on Kesten's problem for groups was the analogous question for certain Riemannian manifolds, in particular for covering manifolds M_1 of a compact manifold M (for background on covering manifolds, see e.g., Chavel (1993), Ch. 4). The deck transformation group Γ of the normal covering p_1 : M_1 → M is finitely generated, since this holds for the fundamental group π_1(M) of the compact manifold M. Varopoulos' theorem states that Brownian motion on M_1 is recurrent if and only if the deck transformation group Γ is recurrent (Varopoulos et al. (1992), X.3), and we know from the above that this holds if and only if Γ is a finite extension of {e}, Z or Z^2. For background to potential theory on manifolds and graphs (such as the Cayley graph of a finitely generated group, as here), see e.g., Ancona (1990), III, Lyons and Sullivan (1984) and the references cited there, and Biggs (1997). This method of studying manifolds via graphs is called discretization, since the graph serves as a discrete analogue of, or approximation to, the manifold.

Heat kernels. The heat kernel p(t, x, y), the fundamental solution of the heat equation

(Δ + ∂/∂t) u = 0,

with Δ the Laplacian of the manifold, plays a decisive role in the analysis and potential theory of a manifold, as well as in the behaviour of Brownian motion there;


for background see Davies (1989) and the references cited there. The discretization procedure above replaces this by its analogue on the graph, also called a heat kernel. This may be studied in continuous time (as on the manifold), or in discrete time, when one obtains p_n(x, y), the n-step transition probabilities of the random walk on the graph. Recent results on heat kernels on graphs are due to Pang (1993), Davies (1993); we shall return to this question later for graphs with special structure.

5. Random walks on homogeneous spaces, Gelfand pairs, hypergroups and semigroups

1. Homogeneous spaces. Random walks often occur in settings which are not themselves groups, but in which a group structure is nevertheless present. If G is a group and K a compact subgroup, G acts on the coset space M := G/K, which is a homogeneous space; random walks in such contexts have been surveyed by Elie (1982b), Schott (1984). With G, K suitable Lie groups, M may be given a Riemannian manifold structure; certain M arising in this way (those for which the curvature tensor is invariant under parallel translations) are symmetric spaces in the sense of Élie Cartan. For a monograph treatment of these, including Cartan's classification, see Helgason (1962). A particularly important case is that of the spheres Σ_k (a k-dimensional manifold of constant positive curvature in (k + 1)-dimensional Euclidean space): these are compact symmetric spaces of rank one, given by

Σ_k = SO(k + 1)/SO(k)

(Helgason (1962), X.3, Example III). For random walks on Σ_k, see Bingham (1972).

2. Gelfand pairs. Closely related to this is the concept of a Gelfand pair: pairs (G, K) as above in which the convolution algebra of functions in L^1(G) bi-invariant under K is commutative. For random walks in this context, see the survey of Heyer (1983). Prototypical examples of Gelfand pairs include (G, K) = (SO(k+1), SO(k)) above, relevant to random walk on spheres, in the infinite (or continuous) case, and (G, K) = (Z_2^d ⋊ S_d, S_d) (with S_d the symmetric group on d objects), giving the unit d-cube relevant to the Ehrenfest urn model in the finite case (Diaconis (1988), 3F, Remark 3). Gelfand pairs provide the machinery needed to lift a Markov chain to a random walk on a group. For background, see Letac (1981), (1982), Diaconis (1988), 3F, 3G and the many references cited there, Heyer (1983).

3. Hypergroups. If δ_x denotes the Dirac measure at x for x in a group G, the convolution is given by δ_x * δ_y = δ_{xy}. The convolution δ_x * δ_y can usefully be defined on some structures other than groups, called hypergroups, and here also one can study random walks. For full detail on this important subject, we refer to the survey of Heyer (1984) and the monograph of Bloom and Heyer (1994).


4. Semigroups. Here one has no inverse operation as in a group, only the product operation; typical examples are the reals under multiplication, and matrices under matrix multiplication. The resulting structures are of course less rich than their group-theoretic counterparts, but nevertheless a theory of random walks on semigroups - including, in particular, questions of recurrence and transience - has been developed in some detail. For background, see e.g., Högnäs (1974), Mukherjea and Tserpes (1976).

6. Random walk on graphs with special structure

1. Graphs with symmetry properties. The d-cube Z_2^d considered earlier in connection with the Ehrenfest model is a good example of a graph with a high degree of symmetry. Other examples include:
(i) the Platonic solids (classical regular polyhedra: tetrahedron, cube, octahedron, dodecahedron, icosahedron), (ii) the Archimedean solids (semi-regular polyhedra) - prisms, antiprisms, and thirteen others, including the truncated icosahedron/soccer ball, which has recently achieved fame as the model for the C60 or buckminsterfullerene molecule (a new form of carbon), (iii) higher-dimensional polytopes (see e.g., Coxeter (1973), VII). Properties of random walks on regular graphs, polyhedra and polytopes have been studied in depth in a series of works by Letac and Takács (1980a, b), Takács (1981-84, 1986). For background on the implications of symmetry and regularity properties of graphs, see Biggs (1974), Part Three. Random walks on highly symmetrical graphs have been studied by Devroye and Sbihi (1990), and on edge-transitive graphs by Palacios and Renom (1998). See also Belsley (1998) for rates of convergence.

2. Pre-fractals. Many of the classical examples of fractal sets are nested fractals, obtained by some recursive construction and exhibiting some self-similarity property. The fractal is obtained by a limiting procedure; the graph obtained by terminating the recursive construction after finitely many steps is a pre-fractal. An important example is the Sierpinski gasket, a fractal obtained by starting from an equilateral triangle and recursively removing the opposite-pointing triangle forming its 'middle quarter'. The corresponding pre-fractal, the Sierpinski pre-gasket or Sierpinski graph, is illustrated in Falconer (1985), Fig. 8.4 and Section 8.4. Random walk on the Sierpinski graph is considered in detail by Grabner (1997), Grabner and Woess (1997). There are interesting near-constancy phenomena, and connections with branching processes; for background, see e.g., Biggins and Bingham (1991, 1993). Jones (1996) obtains bounds for heat kernels on the Sierpinski graph, which, because of the special structure, are better than those available for more general graphs (Pang (1993), Davies (1993)). See also Hattori et al. (1990), Hattori and Hattori (1991).


3. Trees. Trees - graphs without circuits - are simpler to handle than general graphs; see Gerl and Woess (1986). Doyle and Snell (1984), Ch. 6 use random walk on trees and Rayleigh's comparison method to give a new proof, intended as an 'explanation', of Pólya's theorem. The most interesting case is that of an infinite tree. If we have a transient random walk on an infinite tree, attention focusses on how the walk 'escapes to infinity', hence on compactifications of the state-space. For background and references, see the survey of Woess (1991) and the papers on trees cited there. The key parameter for random walk on an infinite but locally finite tree is the mean number of branches per vertex. This can be identified with the exponent of the Hausdorff dimension of the boundary (R. Lyons, 1990). One can introduce a one-parameter family of random walks on such trees, where the tendency to transience (escape to infinity) may be balanced by a greater probability of choosing the branch back towards the root. By varying the parameter, a phase transition is obtained. For a full account, see R. Lyons (1990, 1992), Lyons and Pemantle (1992), the monograph Lyons and Peres (1998+), and Takács (1998).
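The phase transition is visible in simulation (an illustrative sketch; the parametrisation follows R. Lyons' RW_lambda, for which the critical value on the binary tree is its branching number 2). From a vertex of positive depth the walk steps to the parent with probability lambda/(lambda + 2) and to each of the two children with probability 1/(lambda + 2); by homogeneity the depth alone is a birth-death chain, which suffices for the simulation.

```python
import random

# RW_lambda on the infinite binary tree: from a vertex of depth > 0, step to
# the parent with probability lambda/(lambda + 2), to each of the two children
# with probability 1/(lambda + 2); from the root, move to a child.  The depth
# process alone is simulated.  lambda < 2 gives transience (depth grows
# linearly), lambda > 2 recurrence (frequent returns to the root).
# Parameter values and run length are illustrative.

def simulate_depth(lam, steps, seed):
    rng = random.Random(seed)          # fixed seed for reproducibility
    depth, returns = 0, 0
    for _ in range(steps):
        if depth == 0:
            depth = 1                  # from the root, move to a child
        elif rng.random() < lam / (lam + 2.0):
            depth -= 1                 # back towards the root
        else:
            depth += 1
        if depth == 0:
            returns += 1
    return depth, returns

print(simulate_depth(1.0, 20000, seed=1))  # transient: large final depth, few returns
print(simulate_depth(4.0, 20000, seed=1))  # recurrent: small depth, many returns
```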
4. Crystallographic lattices. The lattice Z^2 gives a (recurrent) tiling of the plane by squares; one can also tile the plane by equilateral triangles or by hexagons. That random walk on the triangular and hexagonal lattices is also recurrent was shown by Rogers (1985). Suppose one takes the square lattice Z^2, and replaces each lattice point by a small square whose diagonals form part of the grid-lines of the lattice. The resulting tiling of the plane by small squares and larger octagons is familiar from patterns on wallpaper and linoleum (usually rotated through π/4 for aesthetic reasons). This lattice too is recurrent; Rogers (1985) shows this, and gives a general method in terms of additive functionals of Markov chains. Results of this type are also given by Soardi (1990). One may apply the method to other familiar lattices, such as the crystallographic lattices in 2 and 3 dimensions. These may be classified: in 2 dimensions, there are 17 such 'wallpaper groups', most of them represented on the famous wall decorations of the Alhambra in Granada (Schwarzenberger, 1974). In three dimensions, there are over two hundred (there are two ways to count them, depending on whether or not laevo and dextro forms are distinguished). For background on these and crystallographic classification in higher dimensions, see Schwarzenberger (1980).
7. Variants on random walk

A number of variants and generalizations of random walk have been studied, for mathematical interest or to model aspects of natural phenomena.
1. Random walk in random environments. Considering random walk on Z (for simplicity), suppose that each integer n is chosen independently to be one of two types; with these choices made, Z is now a 'random environment'. Now suppose


that a particle performs a random walk on Z, but with different transition probabilities from sites of the two types. This is a random walk in a random environment (RWRE). The motivation comes partly from the physics of random (or disordered) media. For details and references, we refer to Révész (1990), Part III, Lyons and Pemantle (1992), Pemantle and Peres (1995).
2. Reinforced random walk. Suppose that a particle performs a (nearest-neighbour) random walk, not choosing all neighbours with the same probability but showing a preference for sites already visited. This model, called reinforced random walk, is motivated by the habits, and learning behaviour, of humans and animals: one deepens one's knowledge of the known environment by re-visiting it, occasionally extending it by forays into the unknown. For background, see e.g., Pemantle (1988), Davis (1989).

3. Self-avoiding random walk. Suppose that a random walk evolves, but that the walk is not allowed to revisit states previously visited. The resulting process, called a self-avoiding random walk, models the behaviour of polymers and the like. The model is difficult to analyse, as the excluded volume restriction makes the current evolution of the path strongly dependent on the entire history of the path to date. For detailed accounts, see e.g., Barber and Ninham (1970), Madras and Slade (1993), Hughes (1995), Ch. 7. Self-avoiding walk on the Sierpinski graph and gasket has been considered by Hattori et al. (1990), Hattori and Hattori (1991), motivated by physical applications.

4. Branching random walk. Branching processes model the reproductive behaviour of biological organisms; random walks may be used to model the spatial diffusion of such organisms. The two may be combined: in the branching random walk, particles perform random walk for some lifetime distribution (exponential, say); on death, they are replaced by the next generation, as in the usual branching-process model, each of whom performs a new random walk independently, starting from its birthplace. The resulting model, an idealization of a biological population evolving in time and space, has been analysed in considerable detail; see e.g., Biggins (1995) and the references cited there.
We shall deal in Part III with the Lindley equation in connection with queueing; for related results for branching random walks, see Biggins (1998).

5. Tree models in mathematical finance. In probability theory one generally writes random walks additively, when the relevant group is abelian. In mathematical finance, however, one naturally thinks in terms of financial returns - gains per unit of capital invested - and it is now more natural to work with multiplicative random walks. The analogue of a random walk on Z taking steps ±1 with probabilities p, q is now a binomial tree. The model is due to Cox, Ross and Rubinstein (1979), who used it to derive the discrete Black-Scholes formula (of which the Black-Scholes formula itself is a limiting case). For details, see e.g., Bingham and Kiesel (1998), Section 4.5.
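A minimal implementation of the Cox-Ross-Rubinstein tree (an illustrative sketch; the parameter values are arbitrary): a European call is priced by backward induction under the risk-neutral up-probability, and as the number of steps grows the price converges to the Black-Scholes value, about 10.45 for the parameters below.

```python
import math

# Cox-Ross-Rubinstein binomial tree for a European call option.
# The Black-Scholes formula arises as the limiting case n -> infinity.
def crr_call(S0, K, r, sigma, T, n):
    dt = T / n
    u = math.exp(sigma * math.sqrt(dt))    # up factor; down factor d = 1/u
    d = 1.0 / u
    q = (math.exp(r * dt) - d) / (u - d)   # risk-neutral up-probability
    disc = math.exp(-r * dt)
    # terminal payoffs, then backward induction through the tree
    values = [max(S0 * u**j * d**(n - j) - K, 0.0) for j in range(n + 1)]
    for step in range(n, 0, -1):
        values = [disc * (q * values[j + 1] + (1 - q) * values[j])
                  for j in range(step)]
    return values[0]

# Illustrative parameters: at-the-money call, 5% rate, 20% volatility, 1 year.
price = crr_call(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, n=500)
print(price)   # close to the Black-Scholes value of about 10.45
```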


Part III. Fluctuation theory


1. Spitzer's identity
If we think of a random walk (S_n) as modelling the capital of a player in a gambling game, for instance, the monetary interpretation means that the large (or small) values of S_n are of particular interest. For many purposes, both theoretical and practical, it is useful to focus attention on these explicitly, by considering the sequence M_n := max{0, S_1, ..., S_n} of maximal partial sums to date (or, dually, m_n := min{0, S_1, ..., S_n}). The study of (M_n) and related quantities is referred to as the fluctuation theory of the random walk (the terminology is that of Feller (1949, 1968), III). Passing from (0, S_1, ..., S_n) - equivalently, from (X_1, ..., X_n) - to (M_1, ..., M_n) effects a useful data reduction, as the sequence of maxima will typically contain fewer (perhaps many fewer) distinct elements. On the other hand, the random walk (S_n) is Markovian, while the maximum sequence (M_n) is non-Markovian, which makes it much harder to analyse. The key result is the following, which links the distributions of M_n with those of S_n^+ := max(0, S_n). For 0 < r < 1, Re λ ≤ 0, one has

Σ_{n=0}^∞ r^n E exp{λ M_n} = exp{ Σ_{n=1}^∞ (r^n/n) E exp{λ S_n^+} }

(Spitzer (1956): the result is called Spitzer's identity). Write f(λ) := E exp{λ X_1} for the characteristic function of the step-length distribution of the random walk (of course, this is usually defined as E exp{it X_1} for t real, but if we intend to continue t to a complex variable, as we do, the i serves no purpose). Thus f is defined for Re λ = 0, a line in the complex λ-plane, and may be continued into a strip (which may degenerate to the line Re λ = 0 above), a half-plane, or the whole plane. For 0 < r < 1, write

ω_r^+(λ) := exp{ Σ_{n=1}^∞ (r^n/n) E[exp{λ S_n} I(S_n > 0)] },

ω_r^-(λ) := exp{ Σ_{n=1}^∞ (r^n/n) E[exp{λ S_n} I(S_n ≤ 0)] }.


Then ω_r^+(λ) is defined at least for Re λ ≤ 0, and ω_r^-(λ) at least for Re λ ≥ 0, and both are analytic in the respective open half-planes. On the intersection Re λ = 0 of the two closed half-planes, where both are defined, one has

ω_r^+(λ) ω_r^-(λ) = exp{ Σ_{n=1}^∞ (r^n/n) E exp{λ S_n} } = 1/(1 - r f(λ)) .

This idea of taking a function defined only on a line (or strip) in the complex plane, and expressing it as a product of two functions, each defined in complementary half-planes intersecting in this line and analytic in their interiors, is often useful, as it allows the powerful machinery of complex analysis to be brought to bear. It may be traced to the work of Wiener and Hopf (1931) (cf. Paley and Wiener (1934), IV), and is accordingly known as the Wiener-Hopf method; for a survey of Wiener-Hopf methods in probability, see Bingham (1980). The factors ω_r^+, ω_r^- are called the right and left Wiener-Hopf factors of the random walk. We may re-write Spitzer's identity as

Σ_{n=0}^∞ r^n E exp{λ M_n} = ω_r^+(λ) ω_r^-(0) = exp{ Σ_{n=1}^∞ (r^n/n) E exp{λ S_n^+} },

and there is a bivariate extension

Σ_{n=0}^∞ r^n E exp{λ M_n + μ(S_n - M_n)} = ω_r^+(λ) ω_r^-(μ)   (0 < r < 1, Re λ ≤ 0, Re μ ≥ 0),

also called Spitzer's identity or the (first) factorization identity. Both are due to Spitzer (1956); for an excellent textbook treatment, see Chung (1974), Ch. 8.
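Spitzer's identity can be checked numerically (an illustrative sketch, not from the text): for the simple symmetric walk the joint law of (S_n, M_n) is computed by dynamic programming, the law of S_n is binomial, and the two sides of the identity, truncated at N terms of the r-series, agree up to the truncation error O(r^N). The values of r and lambda are arbitrary, subject to 0 < r < 1 and lambda <= 0.

```python
import math

# Numerical check of Spitzer's identity for the simple symmetric walk
# (steps +1/-1 with probability 1/2 each), with r = 0.3 and lambda = -0.7
# (real, so Re(lambda) <= 0).  Both sides truncated at N terms of the
# power series in r; the truncation error is O(r^N).
r, lam, N = 0.3, -0.7, 25

# Left side: sum_n r^n E exp(lam * M_n), via DP on (S_n, M_n).
states = {(0, 0): 1.0}               # (current position, running maximum) -> prob
lhs = 1.0                            # n = 0 term: M_0 = 0
for n in range(1, N + 1):
    new = {}
    for (s, m), p in states.items():
        for ds in (1, -1):
            s2 = s + ds
            key = (s2, max(m, s2))
            new[key] = new.get(key, 0.0) + 0.5 * p
    states = new
    lhs += r**n * sum(p * math.exp(lam * m) for (_, m), p in states.items())

# Right side: exp( sum_n (r^n/n) E exp(lam * S_n^+) ), with S_n^+ = max(0, S_n)
# and S_n = 2k - n when k of the n steps are upward (binomial law).
expo = 0.0
for n in range(1, N + 1):
    e = sum(math.comb(n, k) * 0.5**n * math.exp(lam * max(2 * k - n, 0))
            for k in range(n + 1))
    expo += r**n / n * e
rhs = math.exp(expo)

print(lhs, rhs)   # the two sides agree up to the truncation error
```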

Order statistics. Spitzer's identity deals with the maximum - the largest order statistic - of the partial sums. It extends to other order statistics: see Wendel (1960), de Smit (1973a).

Generalisations. Factorisation identities of this type can be proved in much more general contexts, such as Markov chains, Markov additive processes, etc., and there is now a considerable theory. For background and details, see e.g., Arjas and Speed (1973), Asmussen (1989), Kennedy (1998).


2. Ladder epochs and heights


Particular interest attaches to those partial sums which are maximal - those members of the sequence (S_n) which belong to the sequence (M_n) also. The zeroth partial sum is S_0 := 0; the first positive partial sum, Z say, is called the first strict ascending ladder height; the first time T that this level is attained is called the first strict ascending ladder epoch. Thus T, Z are the time and place of first passage of the random walk to (0, ∞). Considering the first non-negative partial sum S_n with n ≥ 1 gives the weak ascending ladder height and epoch (first passage to [0, ∞)), and similarly for strict and weak descending ladder heights and epochs. Subsequent ladder epochs and heights may be defined by starting the process afresh at the first ladder epoch. The ladder steps are the gaps between successive ladder heights. Of course, ladder variables may be defective: if the walk never enters (0, ∞), one defines T, Z to be +∞, and similarly for the other types of ladder variable. To proceed, one needs to classify by defectiveness or otherwise of the ladder variables. Discarding the trivial degenerate case of the step-length distribution F being concentrated at zero, one has exactly one of the following three alternatives (Feller (1971), XII, XVIII):
(i) drift to +∞: S_n → +∞ a.s. (so m := min{S_n : n ≥ 0} > -∞ a.s.),
(ii) drift to -∞: S_n → -∞ a.s. (so M := max{S_n : n ≥ 0} < +∞ a.s.),
(iii) oscillation: lim sup S_n = +∞, lim inf S_n = -∞ a.s. (so m = -∞, M = +∞ a.s.).
This drift/oscillation trichotomy is decided by

A := Σ_{n=1}^∞ (1/n) P(S_n > 0),   B := Σ_{n=1}^∞ (1/n) P(S_n < 0) :

one has (i) drift to +∞ iff A = ∞, B < ∞, (ii) drift to -∞ iff A < ∞, B = ∞, (iii) oscillation iff A = B = ∞ (in fact Σ_{n=1}^∞ (1/n) P(S_n = 0) < ∞ always, so one could use P(S_n ≤ 0), P(S_n ≥ 0) instead here). [Of course, if the mean step-length μ exists, the strong law of large numbers shows that we have drift to +∞ if μ > 0, drift to -∞ if μ < 0 and oscillation if μ = 0, but matters are not so simple if μ is not defined.] Write

L_n := min{k : k = 0, 1, ..., n : S_k = M_n}

for the first time up to time n that the maximum is attained,

L'_n := max{k : k = 0, 1, ..., n : S_k = m_n}

for the last time that the minimum is attained. Then L_n is the last occurrence of a strict ascending ladder-point, and dually L'_n is the last occurrence of a weak descending one. Write T', Z' for the first weak descending ladder epoch and height. To solve the first-passage problem into the positive half-line (0, ∞), one requires the joint law of (T, Z). It turns out that this is expressible in terms of the Wiener-Hopf factor ω^+, and that of (T', Z') in terms of ω^−. One has the Spitzer-Baxter identity (or second factorization identity)

ω^+(λ) = 1/(1 − E(r^T exp{λZ})) = exp{ Σ_{n=1}^∞ (r^n/n) ∫_{{S_n > 0}} exp{λS_n} dP }   (0 < r < 1, ℜλ ≤ 0)

and its dual form

ω^−(μ) = 1/(1 − E(r^{T'} exp{μZ'})) = exp{ Σ_{n=1}^∞ (r^n/n) ∫_{{S_n ≤ 0}} exp{μS_n} dP }   (0 < r < 1, ℜμ ≥ 0)

(Spitzer (1960), Baxter (1958); cf. Port (1963), Chung (1974), Ch. 8). The number of positive partial sums

N_n := Σ_{k=1}^n I(S_k > 0)

(the occupation-time of the half-line (0, ∞)) is often important. For each n, the distributions of N_n and L_n coincide. Indeed, the laws of (N_n, S_n), (L_n, S_n) and (n − L'_n, S_n) coincide, and similarly with S_n replaced by any function of (X_1, …, X_n) invariant under permutations of the X_i. This important result is called (E. Sparre) Andersen's Equivalence Principle (Andersen (1953/54); Chung (1974), Ch. 8).
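The equivalence of the laws of N_n and L_n is easy to check by simulation; a minimal sketch (the Gaussian step law, sample sizes and seed are illustrative choices, not from the text):

```python
import random
from collections import Counter

def walk_stats(n, rng):
    """Return (N_n, L_n) for one walk of n Gaussian steps:
    N_n = number of strictly positive partial sums S_1, ..., S_n,
    L_n = first k in 0..n with S_k = max(S_0, ..., S_n)."""
    sums, s = [0.0], 0.0
    for _ in range(n):
        s += rng.gauss(0.0, 1.0)
        sums.append(s)
    n_pos = sum(1 for v in sums[1:] if v > 0)
    m = max(sums)
    l_first = next(k for k, v in enumerate(sums) if v == m)
    return n_pos, l_first

rng = random.Random(42)
n, trials = 10, 20000
cN, cL = Counter(), Counter()
for _ in range(trials):
    a, b = walk_stats(n, rng)
    cN[a] += 1
    cL[b] += 1
# total-variation distance between the two empirical laws on {0, ..., n};
# Andersen's equivalence principle says it should be small (sampling noise only)
tv = 0.5 * sum(abs(cN[k] - cL[k]) for k in range(n + 1)) / trials
print(round(tv, 4))
```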
Extremal factorization. The equivalence principle has many useful consequences. For example,

P(L_n = k) = P(L_k = k) P(L_{n−k} = 0)

follows easily from the Markov property of the random walk. This translates into

P(N_n = k) = P(N_k = k) P(N_{n−k} = 0) ,

an important but non-obvious property called extremal factorization (Port (1963), Heyde (1969)).
Note. The (ascending) ladder heights and epochs are also the (upper) records and record times of the partial-sum process. The term 'record' in the statistical literature usually denotes a record of the readings X_i rather than their partial sums. We shall use such records in III.9 below; for background, see e.g., Foster and Stuart (1954), Bingham et al. (1987), Section 8.14.

3. Spitzer's arc-sine law


Recall (I.1) the Chung-Feller theorem, giving the exact distribution of the time spent positive in simple random walk on ℤ (the discrete arc-sine law), and its limit distribution, the (continuous) arc-sine law. It turns out that the results above allow a definitive generalization of this result. First, we note that the discrete arc-sine law for the occupation-time of a random walk holds, not only for the case of simple random walk on ℤ (where, recall, we had to take care over what we meant by the walk being 'positive'), but also whenever the step-length distribution F is symmetric and continuous. For this, see Feller (1971), XII.8. Note that this result has the remarkable property of being distribution-free: it does not depend on F, provided only that F is continuous and symmetric (see III.8 below). For 0 < ρ < 1, consider the probability distribution G_ρ on [0, 1] with density

g_ρ(x) := (sin πρ / π) x^{ρ−1} (1 − x)^{−ρ}

(to see that this is a probability density, use Γ(z)Γ(1 − z) = π/sin πz and Euler's beta-integral). As ρ ↓ 0, the mass concentrates at x = 0; consideration of Laplace transforms shows that

G_ρ → δ_0   (ρ ↓ 0)

(weak convergence to the Dirac law at zero); similarly

G_ρ → δ_1   (ρ ↑ 1) .

Defining G_0 := δ_0 and G_1 := δ_1, one thus has a family of laws {G_ρ : 0 ≤ ρ ≤ 1} on [0, 1], the generalized arc-sine laws with parameter ρ (some authors use the alternative parametrization 1 − ρ), or with mean ρ: if X has law G_ρ,

EX = ρ ,

and its kth moment is given by

E(X^k) = C(ρ + k − 1, k) = (−1)^k C(−ρ, k)   (k = 0, 1, …) ,

where C(a, k) := a(a − 1)⋯(a − k + 1)/k! denotes the binomial coefficient.

For proof and background, see Dynkin (1961), Lamperti (1962), or e.g., Bingham et al. (1987), Section 8.6.2. We may now formulate Spitzer's arc-sine law (Spitzer, 1956): the fraction of time N_n/n that the random walk spends positive up to time n has a limit distribution as n → ∞ iff

(1/n) Σ_{k=1}^n P(S_k > 0) → ρ   (n → ∞)

for some ρ ∈ [0, 1]; then the limit law is the generalized arc-sine law G_ρ, and these are the only possible limit laws. The condition above is called Spitzer's condition; it was proved recently by Doney (1995) that it is equivalent to the apparently stronger condition

P(S_n > 0) → ρ   (n → ∞) .

Thus the following conditions are equivalent.
(i) Convergence of probabilities: P(S_n > 0) → ρ.
(ii) Convergence of means: (1/n) Σ_{k=1}^n P(S_k > 0) → ρ = ∫_0^1 x dG_ρ(x).
(iii) Convergence of moments: E[(N_n/n)^k] → ∫_0^1 x^k dG_ρ(x) (k = 0, 1, …).
(iv) Convergence in distribution: N_n/n → G_ρ in distribution.
Furthermore, these are the only possible limit distributions. One can extend the list above to include an even stronger statement, weak convergence of the Markov processes measuring the time-lapse since the last ladder epoch (Bingham, 1973). Spitzer's condition holds when the random walk belongs to the domain of attraction (without centring) of some stable law, H say: if
S_n / a_n → H   (n → ∞) ,

then

P(S_n > 0) = P(S_n/a_n > 0) → 1 − H(0) = P(Y > 0) ,

where Y is a random variable with the stable law H. If Y (or H) has index α ∈ (0, 2] and skewness parameter β ∈ [−1, 1], one has

ρ = 1/2 + (πα)^{−1} arctan(β tan(πα/2))

(Zolotarev (1957); cf. Bingham et al. (1987), Section 8.9.2). There are partial results in the converse direction: Spitzer's condition implies a domain-of-attraction condition 'far from symmetry', but not 'close to symmetry'. For details and references, see Bingham et al. (1987), Section 8.9.2. Spitzer's condition is equivalent to regular variation of the tail of the ladder epoch T (that is, to T being in the domain of attraction of a one-sided stable law). The condition for regular variation of the tail of the ladder height Z is Sinai's condition:

Σ_{n=1}^∞ (1/n) P(λ < S_n ≤ λx) → ρ log x   (λ → ∞)   for all x > 1

(Sinai (1957); Bingham et al. (1987), Section 8.9.4).
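Spitzer's arc-sine law in the symmetric case ρ = 1/2 can be illustrated numerically: for symmetric continuous step-lengths, the fraction of time spent positive should be approximately arc-sine distributed. A minimal sketch (Gaussian steps; the walk length, trial count, evaluation point x = 1/4 and seed are illustrative choices):

```python
import math, random

rng = random.Random(7)

def frac_positive(n):
    """Fraction of the first n partial sums that are strictly positive."""
    s, pos = 0.0, 0
    for _ in range(n):
        s += rng.gauss(0.0, 1.0)
        if s > 0:
            pos += 1
    return pos / n

n, trials = 200, 4000
fracs = [frac_positive(n) for _ in range(trials)]
# arc-sine CDF at x is (2/pi) arcsin(sqrt(x)); at x = 1/4 this is 1/3
emp = sum(1 for f in fracs if f <= 0.25) / trials
theo = (2 / math.pi) * math.asin(math.sqrt(0.25))
print(round(emp, 3), round(theo, 3))
```

The empirical mean of N_n/n should also be near 1/2, the mean of the arc-sine law.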

4. Ballot theorems

Suppose that two candidates, A and B, compete in a ballot, their final scores being a and b votes respectively (a > b). The probability that the winning candidate A is ahead throughout the count is (a − b)/(a + b). This classic result, the ballot theorem, stems from the work of Désiré André in 1887 (the reflection principle of I.2) and Whitworth's classic book Choice and chance (Whitworth, 1886). For a monograph treatment of the many extensions and applications of the ballot theorem, see Takács (1967). [Of course, there are implicit exchangeability conditions here: in an actual election, say for parliament, the lead may fluctuate during the count because of the psephological characteristics of the particular constituency.] A form of the ballot theorem arises in the context of skip-free random walks. Call a walk on the integer lattice ℤ left-continuous, or skip-free to the left, if the step-length law F is supported on {−1, 0, 1, 2, …}: thus the walk can jump to the right, but moves to the left continuously (which on the lattice means one step at a time). Right-continuous random walks are defined analogously. If T_k is the first-passage time from 0 to −k in a left-continuous walk (k = 1, 2, …), one has

Kemperman's identity

P(T_k = n) = (k/n) P(S_n = −k)   (1 ≤ k ≤ n)

(Kemperman (1961); for a simple proof, see Wendel (1975)). Kemperman's identity allows one to prove quite simply that for skip-free random walks, Spitzer's condition is equivalent to a domain-of-attraction condition. Of course, this result is to be expected: we noted above that this equivalence holds 'far from symmetry', and skip-free random walks are 'completely asymmetrical'.
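Kemperman's identity can be verified exactly for a small example by dynamic programming; a sketch (the step law {−1: 0.5, 0: 0.2, 1: 0.3} and the values k = 2, n = 7 are illustrative choices, not from the text):

```python
from collections import defaultdict

# step law of a left-continuous (skip-free to the left) walk on Z
step = {-1: 0.5, 0: 0.2, 1: 0.3}

def convolve(dist, step):
    """Distribution of S + X for S ~ dist, X ~ step, independent."""
    out = defaultdict(float)
    for s, p in dist.items():
        for x, q in step.items():
            out[s + x] += p * q
    return out

def unrestricted(n):
    """Exact law of S_n."""
    dist = defaultdict(float, {0: 1.0})
    for _ in range(n):
        dist = convolve(dist, step)
    return dist

def first_passage(k, n):
    """P(T_k = n): first passage to level -k exactly at time n.
    Skip-free to the left, so the walk can only enter -k exactly."""
    alive = {0: 1.0}   # mass on states > -k, not yet absorbed
    hit = 0.0
    for _ in range(n):
        nxt = convolve(alive, step)
        hit = nxt.pop(-k, 0.0)   # absorb mass arriving at -k this step
        alive = nxt
    return hit

k, n = 2, 7
lhs = first_passage(k, n)
rhs = (k / n) * unrestricted(n)[-k]
print(lhs, rhs)
```

Both sides are computed exactly (up to floating-point rounding), so the identity holds to machine precision.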

5. Queues
The fluctuation theory of random walks developed above is immediately applicable to queues. We consider first the GI/G/1 queue (Kendall's notation: GI for general input, G for general service-time, 1 for the single server; Kendall (1951)); we follow Feller (1971), VI.9, Grimmett and Stirzaker (1992), Ch. 11, or Asmussen (1987), III.7, VII, IX. Suppose customers (labelled 0, 1, 2, …) arrive for service at a single server (free at time 0) at times 0, A_1, A_1 + A_2, … (so the A_n are the inter-arrival times). Let B_{n+1} be the service-time of the nth customer. We assume the A_n are i.i.d. with law A(·), the B_n i.i.d. with law B(·), and the A_n and B_n mutually independent. Write

X_n := B_n − A_n   (n = 1, 2, …) ,

and consider the random walk S_n := Σ_{k=1}^n X_k generated by the X_n. Let W_n be the waiting-time of the nth customer, for the queue to clear and his service to begin (so W_0 = 0, and W_n = 0 iff the nth customer is lucky, arriving to find the server free).
Then considering the situations facing the nth and (n + 1)th customers on arrival, we see that

W_{n+1} = (W_n + B_{n+1} − A_{n+1})^+ = (W_n + X_{n+1})^+ ,

the Lindley relation (Lindley, 1952). Write a, b for the means of A, B, both assumed finite, and write ρ := b/a for the traffic intensity. If ρ < 1, the mean service demand of a new customer is less than the mean time to the next arrival: it is then plausible, and true, that the queue is stable, settling down to an equilibrium state as time t → ∞ (or as n → ∞), irrespective of the initial conditions. If W has the limiting waiting-time distribution, the Lindley relation above suggests the equality in distribution

W = (W + X)^+ ,

where X has the law of the X_n. Writing W(·), F(·) for the distribution functions of W, X, this says

W(x) = ∫_{0−}^{∞} F(x − y) dW(y)   (x ≥ 0) ,

an integral equation of Wiener-Hopf type. Its solution was analysed in detail by Spitzer (1957), who showed that there is a unique solution W(·) which is a proper probability distribution (W(∞) = 1) iff the traffic intensity ρ < 1, that is, when the queue is stable. When ρ ≥ 1, there is no such solution: waiting times tend to +∞ in probability (ρ > 1), or are unbounded in probability (ρ = 1), and the queue is unstable. The link between random walks and waiting-times is even stronger: one has equality of distribution between W_n, the waiting-time of the nth customer, and M_n := max{0, S_1, …, S_n} (this holds for each n separately, not jointly: observe that the sequence (M_n) is increasing, while (W_n) is not). To prove this (as in Feller (1971), VI.9), we note the following. (i) The (strict) descending ladder indices of the random walk (S_n) correspond to the lucky customers who arrive to find the server free. (ii) If [n] denotes the index of the last ladder epoch up to time n,
W_n = S_n − S_{[n]} .

(iii) If X_1, …, X_n are written in reverse order as X'_1, …, X'_n, with partial sums S'_k = S_n − S_{n−k}, and M'_n = max{0, S'_1, …, S'_n}, then

S_n − S_{[n]} = M'_n .

The result then follows, as by symmetry M_n and M'_n have the same distribution. As n → ∞, M_n ↑ M < ∞ iff the random walk (S_n) drifts to −∞, that is, EX_n = b − a < 0, i.e., b < a, ρ := b/a < 1. We can read off the limiting distribution M of the M_n from Spitzer's identity, which contains the distributions of the M_n, to get

E exp{itM} = exp{ Σ_{n=1}^∞ n^{−1} (E exp{itS_n^+} − 1) }   (ρ < 1) .

When means exist, the sign of b − a discriminates between drift to −∞, drift to +∞ and oscillation; this tells us again that the limiting waiting-time law W exists if ρ < 1 but not otherwise. For a stable queue, we can consider the number N of customers served in the first busy period (initiated by the arrival of the 0th customer at time 0) or, by the same token, any busy period. Note that the busy period is a.s. finite (and so N < ∞ a.s.) iff the queue is stable. We can also consider the length T of the first (or any) busy period, and the length I of the first (or any) idle period. It turns out that the ladder variables of III.2 provide the key to this analysis, in view of the following. (i) The number N of customers served in the first busy period is the first weak descending ladder epoch of (S_n). (ii) The length I of the first idle period is given by I = −S_N, where S_N is the first weak descending ladder height of (S_n). Thus (N, I) may be handled by the ladder methods discussed earlier. In fact, (N, T) may also be handled similarly: see Kingman (1962b, 1966).
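The Lindley recursion and its link with the running maximum can be checked by simulation; a sketch for an M/M/1 queue (the rates λ = 1, μ = 1.25, giving ρ = 0.8, and the seed are illustrative choices). Each path checks the pathwise identity W_n = S_n − min_{0≤k≤n} S_k, and the sample means of W_n and M_n are compared:

```python
import random
from itertools import accumulate

rng = random.Random(123)

def simulate(n, lam=1.0, mu=1.25):
    """One GI/G/1 path: X_k = B_k - A_k with exp(lam) inter-arrivals
    and exp(mu) services, i.e. M/M/1 with traffic intensity 0.8."""
    xs = [rng.expovariate(mu) - rng.expovariate(lam) for _ in range(n)]
    w = 0.0
    for x in xs:
        w = max(w + x, 0.0)            # Lindley recursion
    s = list(accumulate(xs))
    m = max([0.0] + s)                 # M_n = max(0, S_1, ..., S_n)
    w_alt = s[-1] - min([0.0] + s)     # pathwise: W_n = S_n - min S_k
    return w, m, w_alt

n, trials = 50, 5000
ws, ms = [], []
for _ in range(trials):
    w, m, w_alt = simulate(n)
    assert abs(w - w_alt) < 1e-9       # recursion agrees with reflection form
    ws.append(w)
    ms.append(m)
print(round(sum(ws) / trials, 2), round(sum(ms) / trials, 2))
```

The two sample means agree up to Monte Carlo error, consistent with W_n and M_n having the same distribution for each n.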

Many-server queues. For the queue GI/G/s with s servers, the theory is much more fragmentary when s > 1. Kiefer and Wolfowitz (1955, 1956) show that the appropriate definition of the traffic intensity is now ρ := b/(as): when ρ < 1, the queue is stable, and the s-vector of the servers' virtual waiting times converges to equilibrium. They also obtain an analogue of Lindley's equation, and several other results. For background on many-server queues, see e.g., de Smit (1973b, 1973c) and the references cited there. A study of higher-dimensional random walks motivated by queueing theory has been given by Cohen (1992).

Lindley equations in more general contexts. Lindley equations have been studied in higher dimensions, and in contexts such as branching random walk (II, Section 7.4). For details, see Karpelevich, Kelbert and Suhov (1994), Biggins (1998).

6. Continuous time

We saw above that the fluctuation theory of random walks (S_n) involves the maximum process (M_n), and also (M_n − S_n), whose distributions are given in terms of the Wiener-Hopf factors of the random walk. In continuous time, the natural analogue of a random walk is a Lévy process X = (X_t), whose distribution is specified by its Lévy exponent, ψ(·) say:

E exp{sX_t} = exp{tψ(s)}   (ℜs = 0)

(here ψ is given by the Lévy-Khintchine formula in terms of its Lévy measure ν, which governs the jumps of the process X: see e.g., Bertoin (1996)). The analogues of (M_n), (M_n − S_n) are X̄, X̄ − X, where X̄ is the supremum process:

X̄_t := sup{X_s : 0 ≤ s ≤ t} .

It turns out that both Spitzer's identity and the Spitzer-Baxter identity, or first and second factorization identities, have analogues for Lévy processes. For a Lévy process X with Lévy exponent ψ, and α > 0, there is a Wiener-Hopf factorization

α/(α − ψ(s)) = ψ_α^+(s) ψ_α^−(s) ,

where
(i) ψ_α^+(s) is analytic in ℜs < 0, continuous and non-vanishing in ℜs ≤ 0, is the Laplace transform of an infinitely divisible probability law on the right half-line, and gives the distributions of X̄:

ψ_α^+(s) = α ∫_0^∞ e^{−αt} E exp{sX̄_t} dt = exp{ ∫_0^∞ ∫_0^∞ t^{−1} e^{−αt} (e^{sx} − 1) P(X_t ∈ dx) dt } ;

(ii) similarly, ψ_α^−(s) is analytic in ℜs > 0, continuous and non-vanishing in ℜs ≥ 0, corresponds to the left half-line, and gives the distributions of X̄ − X:

ψ_α^−(s) = α ∫_0^∞ e^{−αt} E exp{s(X_t − X̄_t)} dt = exp{ ∫_0^∞ ∫_{−∞}^0 t^{−1} e^{−αt} (e^{sx} − 1) P(X_t ∈ dx) dt } .
For proofs, see Greenwood (1975, 1976), Greenwood and Pitman (1980a, 1980b), and for the applied background, Bingham (1975). The Wiener-Hopf factors ψ_α^± are thus the key to the fluctuation theory of Lévy processes X; however, given the Lévy exponent ψ of X, it is not in general possible to evaluate the integrals above explicitly. But it is possible to do this in the one-sided case: when the Lévy measure, or spectral measure, ν is concentrated on one half-line. Such an X is called spectrally positive if ν vanishes on (−∞, 0) (X has no negative jumps), and spectrally negative if ν vanishes on (0, ∞) (X has no positive jumps); see e.g., Bertoin (1996), VII for background. This is indeed fortunate: not only are the Wiener-Hopf factors available in this case (they may be evaluated in terms of the inverse function η of ψ: see e.g., Bingham (1975), Th. 4a), but it is just this case which is important in practice, as it occurs in the applied probability models of queues and dams, to which we turn below.

Splitting times. The last 'ladder epoch',

ρ_t := sup{s ≤ t : X_s = X̄_s} ,

the last time the supremum to date was attained, is an example of what is called a splitting time. It is far from a stopping time (ρ_t depends on all of σ{X_s : 0 ≤ s ≤ t}), and one is accustomed to use conditional independence at a stopping time, by the strong Markov property. Nevertheless, one can use ρ_t to split the path {X_s : 0 ≤ s ≤ t} to date into the pre-ρ_t and post-ρ_t fragments, and these are independent. The use of splitting times in this context, the fluctuation theory of Lévy processes, is due to Greenwood and Pitman (1980a). Splitting times were introduced by Williams (1974) (Brownian motion), Jacobsen (1974) and Millar (1977a, b) (Markov processes); for a textbook treatment, see Rogers and Williams (1994), III.49.

Queues and dams. The discrete-time framework of III.5 focusses on the individual customers. Suppose, however, that we focus on the server, and study his workload (the amount of service-demand in the system) as a function of time, which is continuous. It is now more natural to use a stochastic-process formulation throughout. For the GI/G/1 queue above, the input process (the point process of customer arrivals) is a renewal process; in the most important special case, when the inter-arrival time distribution A is exponential, this renewal process is a Poisson process, which is Markovian (and is the only renewal process with the Markov property). The queue is now called M/G/1, to emphasise the Markovian nature of the input stream. The cumulative service demand to date, U_t, is a compound Poisson process; if X_t := t − U_t, then X = (X_t) is a spectrally negative Lévy process. If V_t denotes the virtual waiting-time at time t (the time that a hypothetical customer arriving at time t would have to wait, or the service-load facing the server), then

V_t = X̄_t − X_t ;

see Takács (1962) for this result, and for background. Thus, for instance, the server is idle when X has a ladder epoch. One can think of the pent-up service demand as being 'stored' in the queue, and this suggests that the queueing model above extends to other storage models, such as those of dams. This is indeed the case; see Bingham (1975) for details.
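The reflection identity V_t = X̄_t − X_t can be illustrated on a discretized path; a rough sketch (Euler grid, rates λ = 0.5 and μ = 1 so that ρ = 0.5, and the seed are illustrative choices; the long-run fraction of idle time should be near 1 − ρ):

```python
import random

rng = random.Random(1)

# M/G/1-type net input: X_t = t - U_t with U a compound Poisson process
# (arrival rate lam, exp(mu) service demands); X is spectrally negative
lam, mu, T, dt = 0.5, 1.0, 200.0, 0.01
x, xbar, idle_time = 0.0, 0.0, 0.0
for _ in range(int(T / dt)):
    x += dt                                 # unit drift upwards
    if rng.random() < lam * dt:             # Poisson arrival on the grid
        x -= rng.expovariate(mu)            # downward jump: new work
    xbar = max(xbar, x)                     # running supremum of X
    v = xbar - x                            # virtual waiting time V_t
    assert v >= 0.0
    if v < 1e-9:                            # server idle: X at its supremum
        idle_time += dt
print(round(idle_time / T, 2))
```

The workload V is non-negative by construction, and vanishes exactly when X sits at a ladder epoch, matching the remark in the text that the server is idle at ladder epochs.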

7. Barrier problems
Suppose we are interested in the time and place of first passage to or over a positive barrier x, starting at 0. Then the first-passage time process τ = (τ_x) may be analysed together with the maximum process M = (M_n), as these are pathwise inverse: M_n ≥ x iff τ_x ≤ n. The Wiener-Hopf factors needed above to handle M_n suffice also to handle τ_x. One is dealing here with random walks on a half-line (Spitzer (1964), IV). Suppose, instead, one starts a random walk at the origin, and runs it until it first exits from an interval [−y, x] (0 < x, y) containing the origin. Such two-barrier problems are harder: one has here a random walk on an interval (Spitzer (1964), V).

A detailed treatment of such first-passage problems for Markov processes is given by Kemperman (1961). Results of this type are relevant to sequential analysis in statistics, where one samples until the test statistic exits from an interval, accepting one of two hypotheses according to the barrier across which exit occurs. For background, see e.g., Shiryaev (1973), Ch. IV. Similar one- and two-barrier problems arise in continuous time for Lévy processes (for a Wiener-Hopf formulation of the one-barrier case, see e.g., Bingham (1975), Th. 1e). They have applications to queues and dams: queues with finite waiting-room, and finite dams, which may overflow as well as be empty. For such applied background, see e.g., Bingham (1975), Section 7 and the references there, particularly to the works of Takács. For a very simple proof of the principal explicit result in this area, see Rogers (1990).
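For simple symmetric random walk the two-barrier exit problem has the classical gambler's-ruin answer: the walk started at 0 exits [−y, x] at the upper barrier with probability y/(x + y). A Monte Carlo sketch (the barriers, trial count and seed are illustrative choices):

```python
import random

rng = random.Random(5)

def exits_above(x, y):
    """Simple symmetric random walk from 0; True if it leaves the
    interval [-y, x] at the upper barrier x."""
    s = 0
    while -y < s < x:
        s += rng.choice((-1, 1))
    return s >= x

x, y, trials = 3, 5, 20000
p_up = sum(exits_above(x, y) for _ in range(trials)) / trials
print(round(p_up, 3))   # classical value: y/(x + y) = 0.625
```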

8. Higher dimensions and algebraic extensions


The classical results of Part I on random walk in one dimension generalise, for the most part, to higher dimensions. It is natural to ask whether this is true of the fluctuation theory of Part III. The situation here is clearly different and less positive: we deal here with the maximum partial sum, and the maximum is taken with respect to the total ordering on the real line ℝ. For ℝ^d, d ≥ 2, no such total ordering exists, though partial orderings do. The half-line [0, ∞) plays a key role in Spitzer's identity on ℝ, via the positive parts S_n^+ of the partial sums S_n, and the key property of the half-line relevant here is that of being a convex cone (closed under vector addition and multiplication by non-negative scalars). A close analogue of the Spitzer-Baxter identity, giving the joint distribution of the time and place of first exit of a random walk in ℝ^d from a convex cone, was given by Greenwood and Shaked (1977), who gave applications to queueing and storage systems in d dimensions (see also Mogulskii and Pecherskii (1977)). This joint law is given in terms of what one calls a Wiener-Hopf factor for the cone, by analogy with the one-dimensional case. Now in d dimensions for d ≥ 2, the number of convex cones needed to fill out the whole of ℝ^d is greater than two, except for the case of two cones which are complementary half-spaces. This situation is really one-dimensional, on projecting onto the normal through the origin to the hyperplane bounding the complementary half-spaces. Thus a genuinely higher-dimensional fluctuation theory requires at least three Wiener-Hopf factors (Kingman (1962a) observed that a two-factor theory must be essentially one-dimensional, as above; the multi-factor theory was later developed by Greenwood and Shaked (1977)). The Sparre Andersen equivalence principle, however, does not extend from one to higher dimensions. For details, see e.g., Hobby and Pyke (1963a), (1963b), (1963c), Pyke (1973), Section 4.3.2.

The algebra of queues. One of the shortest proofs of Spitzer's identity is that of Wendel (1958). Kingman (1966), Sections 2-3, gave a systematic treatment of the algebraic structure of this and related results, isolating the concept of a Wendel

projection. Kingman (1966) also discusses the Wiener-Hopf technique in this connection (Section 6), combinatorial aspects (Section 7: these go back to Spitzer (1956)), and the concept of a Baxter algebra (Section 13). Kingman's algebraic approach is primarily motivated by the theory of the single-server queue above; the case of a many-server queue, which as we saw in III.5 is much harder, is discussed briefly in Kingman (1966), Section 12.

Combinatorics on words. The algebraic and combinatorial aspects of Spitzer's identity and related results have also been studied in connection with the subject of combinatorics on words. The connection is due to Foata and Schützenberger (1971), and has been developed in the book Lothaire (1983) (Lothaire is the pen-name of a group of mathematicians including Schützenberger and co-workers). See in particular Chapter 5 there, by Perrin (the combinatorial content of Spitzer's identity is Th. 5.4.3, and Sparre Andersen's equivalence principle is Prop. 5.2.9), and Chapter 10, by Foata.
9. Distribution-free results and non-parametric statistics

Empiricals. Recall the classical setting of the Kolmogorov-Smirnov test of non-parametric statistics. We draw a random sample of size n from a population distribution F; we use the n readings X_k to form the empirical distribution function

F_n(x) := n^{−1} Σ_{k=1}^n I(X_k ≤ x) ;

thus F_n has a jump of size 1/n at each order statistic (the readings arranged in increasing order). By the Glivenko-Cantelli theorem (or Fundamental Theorem of Statistics), F_n converges to F uniformly on the whole line, with probability one. Thus if D_n := sup{|F_n(x) − F(x)| : x ∈ ℝ} is the discrepancy between F_n and F,

D_n → 0   (n → ∞)   a.s.

(the one-sided version D_n^+ := sup{F_n(x) − F(x) : x ∈ ℝ} is also useful). To test the hypothesis that the (unknown) population distribution is some specified F, we need to know the distribution of D_n under this hypothesis. Provided only that F is continuous (so the readings are all distinct, and the order statistics defined unambiguously), the distribution of D_n is the same for all F, and so has a distribution-free character (like that of the discrete arc-sine law of III.3). This enables one to use the above to construct a non-parametric test for the null hypothesis that the population distribution is F. For the distribution-free nature of D_n, see e.g., Feller (1968), III.1 Example (c), Feller (1971), I.12. For the limit distribution of √n D_n, see Feller (1971), X.6, or Billingsley (1968) (Billingsley uses weak convergence theory to derive this limit distribution in terms of the Brownian bridge).
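The distribution-free property rests on the probability integral transform: if X = F^{−1}(U) with U uniform on (0, 1) and F continuous, then D_n computed from the X-sample against F coincides with D_n computed from the U-sample against the uniform law. A sketch (exponential F chosen purely for illustration):

```python
import math, random

rng = random.Random(11)

def ks_stat(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, attained at the order statistics."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

n = 100
u = [rng.random() for _ in range(n)]       # uniform sample
x = [-math.log(1 - v) for v in u]          # same sample, pushed through the
                                           # exponential quantile function
d_uniform = ks_stat(u, lambda t: min(max(t, 0.0), 1.0))
d_expon = ks_stat(x, lambda t: 1 - math.exp(-t))
print(round(d_uniform, 6), round(d_expon, 6))
```

The two statistics agree to floating-point precision, since the monotone transform preserves both the order statistics and the fitted probabilities.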

This test, the Kolmogorov-Smirnov test, is one of the corner-stones of non-parametric statistics; for background, see e.g., Shorack and Wellner (1986). One can define statistics D_n, D_n^+ of Kolmogorov-Smirnov type in ℝ^d, but for d ≥ 2 the distribution-free character is lost (Simpson, 1951). Nevertheless, the limit distributions of D_n, D_n^+ are known (Kiefer and Wolfowitz, 1958).
Greatest convex minorants. If X_1, …, X_n are i.i.d., Sparre Andersen (1953/4), II used his results on fluctuation theory to show that the number of sides of the greatest convex minorant (GCM) of the graph {(k, Σ_{j=1}^k X_j)}_{k=0}^n has the same distribution as the number of cycles in a randomly chosen permutation of n objects; thus the GCM statistic is distribution-free. This distribution is also that of the number of (upper) records in (X_1, …, X_n) (Foster and Stuart, 1954), and is given by

p_r = |S_n^r| / n!   (r = 1, …, n) ,

where |S_n^r| is the modulus of the Stirling number of the first kind (the coefficient of z^r in z(z + 1)⋯(z + n − 1)). A survey of combinatorial results of such kinds and their statistical applications is given by Barton and Mallows (1965). In an improving population, records become more frequent, and so the GCM statistic is informative for tests for trend, testing a null hypothesis H_0 against an alternative hypothesis H_1, where
H_0 : μ_1 = ⋯ = μ_n ,   H_1 : μ_1 ≤ ⋯ ≤ μ_n ;

here μ_i is the mean of X_i (Brunk (1960), using Sparre Andersen's result). There is a similar test based on medians (Brunk (1964), using Spitzer's combinatorial lemma). For textbook accounts of such statistical inference under order restrictions, see Barlow et al. (1972), Robertson et al. (1988). Problems of this type are topical in environmental statistics and studies of global warming, for example.
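The record-count distribution is easy to probe by simulation: for continuous i.i.d. readings, the expected number of upper records in a sample of size n is the harmonic number Σ_{k=1}^n 1/k (a standard consequence of the Stirling-number distribution above). A sketch (n, trial count and seed are illustrative choices):

```python
import random

rng = random.Random(3)

def n_records(xs):
    """Number of upper records in the sequence xs."""
    best, count = float('-inf'), 0
    for x in xs:
        if x > best:
            best, count = x, count + 1
    return count

n, trials = 20, 30000
mean_rec = sum(n_records([rng.random() for _ in range(n)])
               for _ in range(trials)) / trials
harmonic = sum(1 / k for k in range(1, n + 1))
print(round(mean_rec, 3), round(harmonic, 3))
```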

10. Postscript
As we have seen, fluctuation theory in one dimension is remarkably well-developed and complete, as regards both theory and applications. By contrast, the situation in higher dimensions is much less complete, and our knowledge here remains fragmentary by comparison. Study of the higher-dimensional case has several motivations, of which we mention three here to close: mathematical interest, and the needs of non-parametric statistics in higher dimensions and the theory of queues with many servers.

References
Aldous, D. (1983). Random walk on finite groups and rapidly mixing Markov chains. Séminaire de Probabilités XVII, 243-297. Lecture Notes in Math. 986, Springer.
Aldous, D. (1989). An introduction to covering problems for random walks on graphs. J. Theor. Probab. 2, 87-89.

Aldous, D. and P. Diaconis (1986). Shuffling cards and stopping times. American Math. Monthly 93, 333-348.
Aldous, D. and J. A. Fill (1998+). Reversible Markov chains and random walks on graphs, to appear.
Ancona, A. (1990). Théorie du potentiel sur les graphes et les variétés. École d'Été de Probabilités de Saint-Flour XVIII, 3-112. Lecture Notes in Math. 1427, Springer.
Andersen, E. Sparre (1953/4). On the fluctuations of sums of random variables, I, II. Math. Scand. 1, 263-285; 2, 195-223.
Arjas, E. and T. P. Speed (1973). Symmetric Wiener-Hopf factorizations in Markov additive processes. Z. Wahrschein. 26, 105-118.
Asmussen, S. (1982). Conditional limit theorems relating a random walk to its associate, with applications to risk reserve processes and the GI/G/1 queue. Adv. Appl. Probab. 14, 143-170.
Asmussen, S. (1987). Applied probability and queues. Wiley.
Asmussen, S. (1989). Aspects of matrix Wiener-Hopf factorization in applied probability. Math. Scientist 14, 101-116.
Baldi, P. (1981). Caractérisation des groupes de Lie connexes récurrents. Ann. Inst. H. Poincaré, Sér. Prob. Stat. 17, 281-308.
Ball, F. G., B. Dunham and A. Hirschowitz (1997). On the mean and variance of cover times for random walks on graphs. J. Math. Anal. Appl. 207, 506-514.
Barber, M. N. and B. W. Ninham (1970). Random and restricted random walks: Theory and applications. Gordon and Breach, New York.
Barlow, R. E., D. J. Bartholomew, J. M. Bremner and H. D. Brunk (1972). Statistical inference under order restrictions. Wiley.
Barton, D. E. and C. L. Mallows (1965). Some aspects of the random sequence. Ann. Math. Statist. 36, 236-260.
Baxter, G. (1958). An operator identity. Pacific J. Math. 8, 649-663.
Bayer, D. and P. Diaconis (1992). Trailing the dovetail shuffle to its lair. Ann. Appl. Probab. 2, 294-313.
Beardon, A. F. (1983). The geometry of discrete groups. Graduate Texts in Math. 91, Springer.
Belsley, E. D. (1998). Rates of convergence of random walk on distance-regular graphs. Probab. Theor. Rel. Fields 112, 493-533.
Bertoin, J. (1996). Lévy processes. Cambridge Tracts in Math. 121, Cambridge University Press.
Bertoin, J. and R. A. Doney (1994). On conditioning a random walk to stay non-negative. Annals of Probability 22, 2152-2167.
Biggins, J. D. (1995). The growth and spread of the general branching random walk. Ann. Appl. Probab. 5, 1008-1024.
Biggins, J. D. (1998). Lindley-type equations in the branching random walk. Stochastic Processes and Applications 75, 105-133.
Biggins, J. D. and N. H. Bingham (1991). Near-constancy phenomena for supercritical branching processes. Math. Proc. Cambridge Phil. Soc. 110, 545-558.
Biggins, J. D. and N. H. Bingham (1993). Large deviations in the supercritical branching process. Adv. Appl. Probab. 25, 757-772.
Biggs, N. L. (1974). Algebraic graph theory. Cambridge Tracts in Math. 67, Cambridge Univ. Press (2nd ed. 1993).
Biggs, N. L. (1997). Algebraic potential theory on graphs. Bull. London Math. Soc. 29, 641-682.
Billingsley, P. (1968). Convergence of probability measures. Wiley.
Bingham, N. H. (1972). Random walk on spheres. Z. Wahrschein. verw. Geb. 22, 169-192.
Bingham, N. H. (1973). Limit theorems for a class of Markov processes. In Stochastic Analysis (Rollo Davidson Memorial Volume) (Eds., D. G. Kendall and E. F. Harding), pp. 266-293, Wiley.
Bingham, N. H. (1975). Fluctuation theory in continuous time. Adv. Appl. Probab. 7, 705-766.
Bingham, N. H. (1980). Wiener-Hopf and related methods in probability theory. In Aspects of Contemporary Complex Analysis (Eds., D. A. Brannan and J. G. Clunie), pp. 369-375, Academic Press.
Bingham, N. H. (1989). Tauberian theorems in probability theory. Probability on Groups IX, 6-20. Lecture Notes in Math. 1379, Springer.

Bingham, N. H. (1991). Fluctuation theory for the Ehrenfest urn. Adv. Appl. Probab. 23, 598-611.
Bingham, N. H. (1998). Fluctuations. Math. Scientist 23, 63-73.
Bingham, N. H., C. M. Goldie and J. L. Teugels (1987). Regular variation. Encycl. Math. Appl. 27, Cambridge Univ. Press.
Bingham, N. H. and R. Kiesel (1998). Risk-neutral valuation. Pricing and hedging of financial derivatives. Springer, London.
Blackwell, D. (1953). Extension of a renewal theorem. Pacific J. Math. 3, 315-320.
Bloom, W. and H. Heyer (1994). Harmonic analysis of probability measures on hypergroups. De Gruyter Studies in Math. 20, Walter de Gruyter.
Bondar, J. and P. Milnes (1981). Amenability: a survey for statistical applications of Hunt-Stein and related conditions on groups. Z. Wahrschein. 57, 103-128.
Broder, A. and A. R. Karlin (1989). Bounds on the cover time. J. Theor. Probab. 2, 101-120.
Brunk, H. D. (1960). A theorem of E. Sparre Andersen and its application to tests against trend. Math. Scand. 8, 305-326.
Brunk, H. D. (1964). A generalization of Spitzer's combinatorial lemma. Z. Wahrschein. verw. Geb. 2, 395-405.
Chavel, I. (1993). Riemannian geometry: A modern introduction. Cambridge Tracts in Math. 108, Cambridge Univ. Press.
Chung, K.-L. (1952). On the renewal theorem in higher dimensions. Skand. Akt. 35, 188-194.
Chung, K.-L. (1967). Markov chains with stationary transition probabilities. Grundlehren math. Wiss. 104, Springer.
Chung, K.-L. (1968). A course in probability theory. Harcourt, Brace and Jovanovich Inc. (2nd ed. 1974, Academic Press).
Chung, K.-L. (1995). Green, Brown and probability. World Scientific.
Chung, K.-L. and W. Feller (1949). On fluctuations in coin-tossing. Proc. Nat. Acad. Sci. USA 35, 605-608.
Chung, K.-L. and W. H. J. Fuchs (1951). On the distribution of values of sums of random variables. Memoirs Amer. Math. Soc. 6, 12p.
Cohen, J. W. (1992). Analysis of random walks. Studies in Probability, Optimisation and Statistics 2, IOS Press, Amsterdam.
Cox, J. C., S. A. Ross and M. Rubinstein (1979). Option pricing: a simplified approach. J. Financial Economics 7, 229-263.
Coxeter, H. S. M. (1963). Regular polytopes. Macmillan (3rd ed. 1973, Dover).
Davis, B. (1989). Loss of recurrence in reinforced random walk. In Almost Everywhere Convergence (Eds., G. A. Edgar and L. Sucheston), pp. 179-188, Academic Press.
Davies, E. B. (1989). Heat kernels and spectral theory. Cambridge Tracts in Math. 92, Cambridge Univ. Press.
Davies, E. B. (1993). Large deviations for heat kernels on graphs. J. London Math. Soc. (2) 47, 65-72.
Dembo, A. and O. Zeitouni (1993). Large deviations techniques and applications. Jones and Bartlett, London and Boston.
Derriennic, Y. and Y. Guivarc'h (1973). Théorème de renouvellement pour les groupes non moyennables. Comptes Rendus Hebd. Acad. Sci. 277A, 613-615.
Deuschel, J.-D. and D. Stroock (1989). Large deviations. Academic Press.
Devroye, L. and A. Sbihi (1990). Random walks on highly symmetric graphs. J. Theor. Probab. 4, 497-514.
Diaconis, P. (1988). Group representations in probability and statistics. IMS Lecture Notes 11, Inst. Math. Statist., Hayward, CA.
Diaconis, P., M. McGrath and J. W. Pitman (1995). Riffle shuffles, cycles and descents. Combinatorica 15, 11-29.
Diaconis, P. and D. Stroock (1991). Geometric bounds for eigenvalues of Markov chains. Ann. Appl. Probab. 1, 36-61.
Dodziuk, J. (1984). Difference equations, isoperimetric inequality and transience of certain random walks. Trans. Amer. Math. Soc. 284, 787-794.

Random walk and fluctuation theory


D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. 2001 Elsevier Science B.V. All rights reserved.


A Semigroup Representation and Asymptotic Behavior of Certain Statistics of the Fisher-Wright-Moran Coalescent

Adam Bobrowski, Marek Kimmel, Ovide Arino and Ranajit Chakraborty

We derive new results giving mathematical properties of functions of allele frequencies under the time-continuous Fisher-Wright-Moran model with mutations of the general Markov-chain form. The matrix R(t) (possibly infinite) of the joint distributions of the types of a pair of alleles sampled from the population at time t satisfies a matrix differential equation of the form dR(t)/dt = [Q*R(t) + R(t)Q] − [1/(2N)]R(t) + [1/(2N)]Π(t), where Q is the intensity matrix of the Markov chain, Π(t) is the diagonal matrix of its probability distribution, and N is the effective population size. This is the Lyapunov differential equation, known in control theory. Investigation of the behavior of its solutions leads to consideration of tensor products of transition (Markov) semigroups. Semigroup theory methods allow proofs of asymptotic results for the model, also in cases when the population size does not stay constant. If the population is composed of a number of disjoint subpopulations, the asymptotics depend on the growth rate of the population. Special cases of the model include stepwise mutation models with and without allele size constraints, and with directional bias of mutations. Allele state changes caused by recombinatorial misalignment and more complex sequence conversion patterns can also be incorporated in this model. The methodology developed can also be applied to model coevolution of disease and marker loci, of further use for linkage disequilibrium mapping of disease genes.
1. Introduction

The purpose of this paper is to introduce a unified mathematical treatment of a family of population genetics models, based on the coalescent (Griffiths and Tavaré, 1994; Tavaré, 1984, 1995), focused on distributions of alleles on pairs of chromosomes sampled from a population. The coalescent is a mathematical tool which makes it possible to model the effects of so-called genetic drift. Genetic drift is the process in which diversity is lost from a population, since each gene existing in the


population at a given time may be lost from it by not being passed to the progeny of currently extant individuals. Viewed backwards, this process is equivalent to each pair of individuals sharing a common ancestor. Therefore, eventually, all individuals share a common ancestor and, in the absence of mutations and recombinations, are identical by descent. For this reason, coalescence is frequently represented in the form of a binary ancestral tree of individuals, with nodes representing individuals and their common ancestors, and lengths of branches representing times separating individuals from their common ancestors. Various models of mutation can be superimposed on the coalescent. The mutation events are represented by epochs of point processes along the branches of the coalescent. Mathematical rules of coalescence are known for a wide class of population genetics models. However, they become complicated when the ancestry of more than two individuals is followed (Griffiths and Tavaré, 1994). Examples of the ancestral trees for three or four individuals may be found in Kimmel and Chakraborty (1996) or Pritchard and Feldman (1996). On the other hand, information concerning the ancestries of just two individuals is sufficient to derive a number of useful characteristics of a population, such as homozygosity and the number of segregating sites between pairs of DNA sequences assuming the Infinite Sites Model (Li, 1997), or the within-population variance of allele sizes assuming the Generalized Stepwise Mutation Model (GSMM, Kimmel and Chakraborty, 1996). In this paper, we limit ourselves to models including joint distributions of allele types for pairs of individuals drawn from a sample. Our purpose is to find a mathematical description for a process that involves genetic drift with mutations having the form of a continuous-time Markov chain with a denumerable state space. Moreover, we admit variable population size.
Based on the coalescent, we derive an infinite matrix differential equation, known as the Lyapunov equation (Gajic and Qureshi, 1995), for the joint distribution of allele states of two individuals randomly drawn from the population. An equivalent derivation, resulting in a finite-dimensional version of this equation, was carried out by O'Brien (1982, 1985), using the diffusion approximation of the time-discrete Wright-Fisher Model. As explained in the Discussion section, there is little overlap between O'Brien's analysis and ours. We proceed as follows: First, we characterize the semigroup of operators describing the joint mutation process of the two individuals. Then, using the approach of the theory of semigroups of linear operators (see Hille and Phillips, 1957), we prove theorems concerning the asymptotic behavior of the distributions under different patterns of population size change: growing, stable and decaying populations, if the mutation process is asymptotically stable. Among others, we characterize conditions for population growth and structure which imply the so-called "star-shaped coalescent" (see Kimmel and Chakraborty, 1996 and references therein). Subsequently, we consider important applications of the theory. One of them is the asymptotic behavior of the joint distributions under the GSMM, i.e., the Stepwise Mutation Model with a general distribution of allele-size changes by mutation. Again, we prove theorems concerning the asymptotic behavior of the


distributions under different patterns of population size change: growing, stable and decaying populations. These theorems generalize earlier results by Roe (1992), Slatkin (1995) and Kimmel and Chakraborty (1996). We apply the methodology to the GSMM with allele-size constraints and calculate the resulting mutation-drift equilibrium, different from that under the unconstrained model. Also, we consider joint evolution of a gene coding for a rare genetic disease recombining with a marker allele, with the purpose of estimating the recombination fraction between these two loci. From the mathematical point of view, the paper provides a link between contraction semigroups, differential equations and population genetics. It is once more shown that the theory of semigroups of linear operators provides insight into problems of practical importance.

2. Markov-chain mutations and genetic drift in populations of varying size

2.1. Fisher-Wright-Moran coalescent model

The time-continuous Moran model (Ewens, 1979) assumes the population is composed of a constant number of 2N haploid individuals. Each individual undergoes death/birth events according to a Poisson process with intensity 1 (the mean length of life of each individual is equal to 1). Upon a death/birth event, a genotype for the individual is sampled with replacement from the 2N chromosomes present at this moment, including the chromosome of the just-deceased individual. The following is the coalescent formulation (Kingman, 1982b) of the Fisher-Wright-Moran model for a pair of haploid individuals (chromosomes) from a population of 2N haploid individuals under genetic drift and mutations following a general time-continuous Markov chain:

Coalescent with independent branch lengths with exponential distribution with parameter 1/(2N). The interpretation is that for any two individuals from the population, the time to their common ancestor is a random variable τ, distributed exponential[1/(2N)] (Tavaré, 1984, 1995).
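The pairwise coalescence time just described can be simulated directly. A minimal sketch (with an illustrative population size; not part of the original text) checking the exponential mean and tail:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100  # illustrative: a population of 2N = 200 haploid individuals

# Time to the common ancestor of a pair: exponential with parameter
# 1/(2N), i.e., mean 2N.
tau = rng.exponential(scale=2 * N, size=200_000)

assert abs(tau.mean() / (2 * N) - 1.0) < 0.02               # E[tau] = 2N
t = 150.0
assert abs((tau > t).mean() - np.exp(-t / (2 * N))) < 0.01  # Pr[tau > t] = e^{-t/(2N)}
```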
Markov model of mutations with transition probabilities P_ij(t) and intensities Q_ij. The interpretation is that if the allele state of an individual is i at time 0, then his/her allele state at time t (or the allele state of his/her descendant at time t) is equal to j with probability P_ij(t). Q is the transition intensity matrix satisfying the following conditions: (a) Q_ij ≥ 0, i ≠ j, (b) Σ_j Q_ij = 0, all i. In the finite-dimensional case (more generally, if Q defines a bounded linear operator, Section 3.1), the transition matrix satisfies P(t) = exp(Qt). Time t will be measured forward, unless stated otherwise. This is not always consistent with the convention used in the literature, but is useful for time-varying populations. We will use the coalescent model of genetic drift, modified to allow for the varying population size, i.e., N = N(t), which will be represented by a time-dependent hazard rate of the time to coalescence:


The time T_a (measured backward) to the common ancestor of two individuals from the sample taken at time t is a random variable with hazard rate [2N(t − τ)]^{−1}, i.e., Pr[T_a > τ] = exp{−∫_0^τ [2N(t − u)]^{−1} du}. The model of mutation stays unchanged, i.e., it has the form of the time-continuous Markov chain with transition probabilities P_ij(t) and intensities Q_ij. Let R_jk(t) = Pr[X_1 = j, X_2 = k], where X_1 and X_2 are randomly selected chromosomes. If the common ancestor of X_1 and X_2 was of allele type i and it existed τ units of time ago, then R_jk(t) = P_ij(τ)P_ik(τ). The allele type of the common ancestor is the state of the Markov chain associated with the mutation process and so it is equal to i with probability π_i(t) = Pr[X_1(t) = i] defined by this process. Taking this into account, we obtain,

R_jk(t) = ∫_0^∞ Σ_i π_i(t−τ) P_ij(τ) P_ik(τ) [1/(2N(t−τ))] e^{−∫_0^τ [2N(t−u)]^{−1} du} dτ .   (1)
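For constant N and a mutation chain started at its stationary distribution, the integral (1) can be evaluated by quadrature. The following sketch (an assumed two-state chain; all rates and sizes illustrative) checks that R(t) is then a proper joint distribution whose marginals recover π:

```python
import numpy as np
from scipy.linalg import expm

Q = np.array([[-0.3, 0.3],
              [0.1, -0.1]])      # illustrative intensity matrix
pi = np.array([0.25, 0.75])      # its stationary distribution: pi @ Q = 0
N = 50.0                         # constant effective size

# Midpoint rule for
#   R = ∫_0^∞ P*(τ) Π P(τ) (1/(2N)) e^{−τ/(2N)} dτ,  with Π = diag(pi).
n, width = 4000, 20 * (2 * N)    # integrate far into the exponential tail
dtau = width / n
R = np.zeros((2, 2))
for k in range(n):
    tau = (k + 0.5) * dtau
    P = expm(Q * tau)            # transition matrix P(tau) = exp(Q tau)
    R += P.T @ np.diag(pi) @ P * np.exp(-tau / (2 * N)) / (2 * N) * dtau

assert abs(R.sum() - 1.0) < 1e-3                  # proper distribution
assert np.allclose(R.sum(axis=1), pi, atol=1e-3)  # marginals equal pi
```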

In matrix notation, following the change of variables σ = t − τ,

R(t) = ∫_{−∞}^t P*(t−σ) Π(σ) P(t−σ) [1/(2N(σ))] e^{−∫_σ^t [2N(u)]^{−1} du} dσ ,   (2)

where Π(t) = diag[π_i(t)] and superscript * denotes matrix transpose. Note that Σ_jk R_jk(t) = 1 − exp{−∫_0^∞ [2N(t−u)]^{−1} du}, so the distribution R(t) may be improper if ∫_0^∞ [2N(t−u)]^{−1} du < ∞. This would mean that X_1 and X_2 may not have a common ancestor. Also, the above formulation requires that the Markov chain be extendable indefinitely into the past, i.e., that π(σ) exist for all σ ≤ t. Not getting into conditions that might ensure it, let us carry out a formal transformation of (2), by splitting the integration interval into two parts

R(t) = ( ∫_{−∞}^0 + ∫_0^t ) P*(t−σ) Π(σ) P(t−σ) [1/(2N(σ))] e^{−∫_σ^t [2N(u)]^{−1} du} dσ

     = P*(t) [ ∫_{−∞}^0 P*(−σ) Π(σ) P(−σ) [1/(2N(σ))] e^{−∫_σ^0 [2N(u)]^{−1} du} dσ ] P(t) e^{−∫_0^t [2N(u)]^{−1} du}

       + ∫_0^t P*(t−σ) Π(σ) P(t−σ) [1/(2N(σ))] e^{−∫_σ^t [2N(u)]^{−1} du} dσ

     = P*(t) R(0) P(t) e^{−∫_0^t [2N(u)]^{−1} du} + ∫_0^t P*(t−σ) Π(σ) P(t−σ) [1/(2N(σ))] e^{−∫_σ^t [2N(u)]^{−1} du} dσ .   (3)

This latter expression could be derived by assuming that if the coalescent time is longer than t, the two individuals do not coalesce, but that their allele statuses


have joint distribution R(0) and marginal distributions π(0). Let us note that if R(0) is proper, then R(t) is proper (Corollary 1). Differentiating the above expression with respect to t, it can be demonstrated that R(t) given by (3) is a mild solution (Pazy, 1983) of the following matrix differential equation,
dR(t)/dt = Q*R(t) + R(t)Q − [1/(2N(t))] R(t) + [1/(2N(t))] Π(t) ,   t > 0 ,   (4)

with a given initial condition R(0). Equation (4) is a matrix differential equation known as the Lyapunov equation (Gajic and Qureshi, 1995). In the context of population genetics, a finite-dimensional equivalent of this equation was first introduced by O'Brien (1982, 1985). We will base our analysis on expression (3), or equivalently on Eq. (4).
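In the finite-dimensional case Eq. (4) can be integrated numerically. A sketch (forward Euler, an assumed two-state chain and constant N; all numbers illustrative) verifying that the solution stays a symmetric proper distribution and settles at the mutation-drift equilibrium:

```python
import numpy as np

Q = np.array([[-0.3, 0.3],
              [0.1, -0.1]])      # illustrative intensity matrix
pi = np.array([0.25, 0.75])      # stationary distribution of Q
N = 50.0                         # constant effective size
Pi = np.diag(pi)                 # then Π(t) = Π is constant in time

R = np.outer(pi, pi)             # initial condition: an independent pair
dt = 0.1
for _ in range(int(2000.0 / dt)):
    # Eq. (4): dR/dt = Q*R + RQ − R/(2N) + Π/(2N)
    R = R + dt * (Q.T @ R + R @ Q - (R - Pi) / (2 * N))

assert np.allclose(R, R.T)                        # symmetry is preserved
assert abs(R.sum() - 1.0) < 1e-6                  # total mass stays one
assert np.allclose(R.sum(axis=1), pi, atol=1e-6)  # marginals stay pi
# at equilibrium the right-hand side of (4) vanishes
assert np.abs(Q.T @ R + R @ Q - (R - Pi) / (2 * N)).max() < 1e-6
```

The Euler fixed point coincides with the exact mutation-drift equilibrium of (4), since the update vanishes exactly where the right-hand side does.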

3. Mathematical preliminaries

In this section we provide the required mathematical background. We assume that the standard elements of the theory of linear operators in Banach spaces and of the theory of probability are known. We recall the basics of the theory of semigroups of linear operators and in particular of the contraction semigroups, including the Hille-Yosida Theorem. Then, we consider the semigroups of transition (Markov) operators in l¹ and their generators. Finally we define the tensor products of Banach spaces and linear operators. The purpose is to describe the model of Section 2 using semigroups related to the mutation process in conjunction with the Lyapunov equation (4).
3.1. Semigroups of operators

A family $\{T(t),\ t \ge 0\}$ of bounded linear operators acting in a Banach space $L$ is called a semigroup iff (i.e., if and only if) the following conditions are satisfied:

$$T(0) = I\,, \qquad T(t+s) = T(t)\,T(s)\,, \quad t, s \ge 0\,. \tag{5}$$

We will restrict ourselves to strongly continuous semigroups, satisfying

$$\lim_{t\to 0} T(t)f = f\,, \quad f \in L\,, \tag{6}$$

and to the case when all the operators $\{T(t),\ t \ge 0\}$ are contractions, i.e., $\|T(t)\| \le 1$. Conditions (5) and (6) imply that all trajectories $t \mapsto T(t)f$ are strongly continuous. Therefore the Riemann integral
$$R_\lambda f = \int_0^\infty e^{-\lambda t}\, T(t) f\, dt\,, \qquad \lambda > 0\,,\ f \in L\,, \tag{7}$$

220

A. Bobrowski, M. Kimmel, O. Arino and R. Chakraborty

the Laplace transform of trajectories, is well-defined. $R_\lambda$ is also called the resolvent of the semigroup $T(t)$. Properties of $\{T(t),\ t \ge 0\}$ are reflected in properties of $R_\lambda$, $\lambda > 0$. The semigroup property (5) transforms into the Hilbert equation

$$(\mu - \lambda)\, R_\lambda R_\mu = R_\lambda - R_\mu\,, \quad \lambda, \mu > 0\,, \tag{8}$$

and (6) has its counterpart in

$$\lim_{\lambda\to\infty} \lambda R_\lambda f = f\,. \tag{9}$$

Moreover, by (7),

$$\|R_\lambda\| \le \frac{1}{\lambda}\,. \tag{10}$$

Condition (8) also implies that the range $\mathcal{R}$ and the kernel $\mathcal{K}$ are common for all $R_\lambda$, $\lambda > 0$, and (9) shows that $\mathcal{K} = \{0\}$, i.e., $R_\lambda$, $\lambda > 0$, are invertible. The formula $A = \lambda - R_\lambda^{-1}$ therefore defines a closed operator with domain $D(A) = \mathcal{R}$. The Hilbert equation shows furthermore that the definition does not depend on the choice of $\lambda > 0$. One may show also that

$$D(A) = \left\{ f \in L;\ \lim_{t\to 0} \frac{T(t)f - f}{t}\ \text{exists} \right\} \qquad \text{and} \qquad Af = \lim_{t\to 0} \frac{T(t)f - f}{t}\,. \tag{11}$$

Moreover, by (9), $D(A)$ is dense in $L$. We have thus shown the "if" part of the Hille-Yosida theorem (Hille and Phillips, 1957; Yosida, 1980):

THEOREM 1. For a densely defined operator $A$ there exists a unique strongly continuous contraction semigroup $\{T(t),\ t \ge 0\}$ such that (11) holds iff $(\lambda - A)^{-1} = R_\lambda$ exists for all $\lambda > 0$ and (10) is satisfied. Then $A$ is called the generator of $\{T(t),\ t \ge 0\}$. Equivalently, a pseudo-resolvent (i.e., a family of operators satisfying the Hilbert equation) is the Laplace transform of a strongly continuous contraction semigroup iff conditions (9)-(10) are satisfied.

Yosida's version of the "only if" part of the proof is based on the observation that the sought-for semigroup $\{T(t),\ t \ge 0\}$ can be obtained as the limit, as $\lambda \to \infty$, of exponential functions of the bounded operators $A_\lambda = \lambda^2 R_\lambda - \lambda I$:
$$T(t)f = \lim_{\lambda\to\infty} e^{A_\lambda t} f = \lim_{\lambda\to\infty} \sum_{n=0}^\infty \frac{(A_\lambda t)^n}{n!}\, f\,.$$

It is worth noting that, because of bound (10),

$$\left\| e^{A_\lambda t} \right\| = e^{-\lambda t} \left\| \sum_{n=0}^\infty \frac{(\lambda^2 t)^n}{n!}\, R_\lambda^n \right\| \le e^{-\lambda t} \sum_{n=0}^\infty \frac{(\lambda t)^n}{n!} = 1\,.$$
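The Yosida approximation can be illustrated in finite dimensions, where everything is computable. In the sketch below (the $2\times 2$ generator $A$ is a hypothetical intensity matrix, not from the paper), $e^{A_\lambda t}$ approaches $e^{At}$ as $\lambda$ grows:

```python
# Finite-dimensional illustration (hypothetical generator A):
# the Yosida approximation A_lam = lam^2 R_lam - lam*I satisfies
# exp(A_lam t) -> exp(At) as lam -> infinity.

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def expm(M, t, terms=60):
    """Truncated Taylor series for exp(Mt); adequate for small ||M||t."""
    n = len(M)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[sum(term[i][q] * M[q][j] * t / k for q in range(n))
                 for j in range(n)] for i in range(n)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

A = [[-1.0, 1.0], [0.5, -0.5]]       # bounded generator of a 2-state chain
exact = expm(A, 1.0)

def yosida(lam, t=1.0):
    R = inv2([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])  # R_lam
    Alam = [[lam * lam * R[i][j] - (lam if i == j else 0.0)
             for j in range(2)] for i in range(2)]
    return expm(Alam, t)

err = lambda lam: max(abs(yosida(lam)[i][j] - exact[i][j])
                      for i in range(2) for j in range(2))
print(err(10.0) > err(100.0) > err(1000.0))   # errors shrink as lam grows
```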


3.2. Kolmogorov semigroup


Throughout this paper $l^1$ will stand for the space of all absolutely summable sequences, with the usual norm $\|(x_n)_{n\ge 1}\|_{l^1} = \sum_{n\ge 1} |x_n|$. By $e_i$, $i \ge 1$, we will denote the vectors $(\delta_{i,n})_{n\ge 1} \in l^1$, where $\delta_{i,n} = 1$ for $i = n$, and $0$ otherwise. A bounded linear operator $P$ acting in $l^1$ is called positive if it maps nonnegative sequences into nonnegative sequences. It is called a transition or Markov operator iff, for all nonnegative sequences $(x_n)_{n\ge 1}$, $\|P(x_n)_{n\ge 1}\| = \|(x_n)_{n\ge 1}\|$. Let us note that any Markov operator is a contraction.

DEFINITION 1. Non-negative sequences $(\pi_n)_{n\ge 1} \in l^1$ such that $\|(\pi_n)_{n\ge 1}\| = 1$ will be called distributions.

Let us consider a matrix $(q_{i,j})_{i,j\ge 1}$ satisfying the conditions

$$q_{i,j} \ge 0 \ \ \text{for } i \ne j\,, \qquad \sum_{j\ge 1} q_{i,j} = 0\,, \ \ \text{for all } i \ge 1\,.$$


These conditions imply in particular that $q_i$, the $i$th row of $(q_{i,j})_{i,j\ge 1}$, belongs to $l^1$ for all $i \ge 1$. The matrix $(q_{i,j})_{i,j\ge 1}$ is called a Kolmogorov or intensity matrix. With a Kolmogorov matrix one may connect an operator $Q_0$ acting in $l^1$, defined on the domain consisting of finite linear combinations of the basic vectors $e_i$, $i \ge 1$, by

$$Q_0 \left( \sum_{i} x_i e_i \right) = \sum_{i} x_i\, Q_0 e_i = \sum_{i} x_i\, q_i\,.$$

In general, the operator $Q_0$ is not closed and therefore it cannot be a generator of a semigroup. Whereas $Q_0$ is "too small", the operator $Q_1$, defined by $Q_1 (x_n)_{n\ge 1} = \left( \sum_{m=1}^\infty q_{m,n} x_m \right)_{n\ge 1}$ whenever each series $\sum_{m=1}^\infty q_{m,n} x_m$ is absolutely convergent and $\left( \sum_{m=1}^\infty q_{m,n} x_m \right)_{n\ge 1} \in l^1$, is generally "too big" to be a generator. The question of when and in what sense the matrix $(q_{i,j})_{i,j\ge 1}$ can be the generator of a semigroup is answered by the following theorem of Kato (Hille and Phillips, 1957, p. 642).

THEOREM 2. Given a Kolmogorov matrix $(q_{i,j})_{i,j\ge 1}$ and the corresponding operator $Q_0$, there exists at least one strongly continuous semigroup acting in $l^1$ whose infinitesimal generator is an extension of $Q_0$. In particular, there exists "a minimal solution", a semigroup $\{P(t),\ t \ge 0\}$ of positive contraction operators such that any semigroup $\{P_1(t),\ t \ge 0\}$ whose generator is an extension of $Q_0$ satisfies $P_1(t) \ge P(t)$. $P(t)$ is a transition operator iff there is no non-trivial $(c_n)_{n\ge 1} \in l^\infty$ such that $(q_{i,j})_{i,j\ge 1} (c_n)_{n\ge 1} = \lambda (c_n)_{n\ge 1}$ for some $\lambda > 0$.

The infinitesimal generator of $\{P(t),\ t \ge 0\}$ will be denoted $Q$. Note that $\{P(t),\ t \ge 0\}$ are transition operators related to a minimal solution process defined by means of $(q_{i,j})_{i,j\ge 1}$, as defined in Norris (1997, p. 65), Friedman (1971, Ch. 5) or Chung (1967). We write $P(t)x$ for $x = (x_n)_{n\ge 1} \in l^1$, even though the matrix notation would require $(x_n)_{n\ge 1} P(t)$.
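For a finite Kolmogorov matrix the subtleties of Theorem 2 disappear: the minimal solution is simply the matrix exponential, and it is always a transition semigroup. A small sketch (the 3-state matrix below is hypothetical):

```python
# Illustration (not from the paper): for a *finite* Kolmogorov matrix Q
# (nonnegative off-diagonal entries, zero row sums) the semigroup
# P(t) = exp(Qt) consists of transition (Markov) operators; the
# non-uniqueness issues of Theorem 2 arise only in infinite dimension.

def expm(M, t, terms=60):
    """Truncated Taylor series for exp(Mt); adequate for small ||M||t."""
    n = len(M)
    out = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in out]
    for k in range(1, terms):
        term = [[sum(term[i][q] * M[q][j] * t / k for q in range(n))
                 for j in range(n)] for i in range(n)]
        out = [[out[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return out

Q = [[-1.0, 0.7, 0.3],      # hypothetical 3-state intensity matrix
     [0.2, -0.5, 0.3],
     [0.1, 0.4, -0.5]]

P = expm(Q, t=2.0)
row_sums = [sum(row) for row in P]
print([round(s, 6) for s in row_sums])   # each ~1.0: P(t) is Markov
```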


3.3. Tensor products


DEFINITION 2. Let $K$ and $L$ be Banach spaces. A pair $(M, \Omega)$, where $M$ is a Banach space and $\Omega$ is a continuous bilinear operator $\Omega: K \times L \to M$, is called a tensor product of $K$ and $L$ iff for any Banach space $E$ and for any bilinear, continuous operator $\Psi: K \times L \to E$ there exists a unique continuous linear operator $\tilde{\Psi}: M \to E$ such that $\|\tilde{\Psi}\| = \|\Psi\|$ and $\Psi = \tilde{\Psi} \circ \Omega$.

The proof of existence of tensor products can be found, e.g., in Semadeni (1965) or Defant and Floret (1993). Sometimes we say that a certain Banach space is a tensor product of $K$ and $L$ without specifying $\Omega$; this should not lead to misunderstanding. Tensor products are denoted by $K \otimes L$, and $\Omega(x,y)$ is denoted by $x \otimes y$. Note that from the above definition it follows that all tensor products are isomorphic.

EXAMPLE 1. (Semadeni 1965, p. T18, or Defant and Floret 1993, pp. 29-30) Let $K$ be an arbitrary Banach space. The Banach space $l^1(K)$ of all absolutely summable sequences $(k_n)_{n\ge 1}$, $k_n \in K$, equipped with the norm $\|(k_n)_{n\ge 1}\| = \sum_{n=1}^\infty \|k_n\|$, is a tensor product of $K$ and $l^1$, when considered together with the operator $\Omega: K \times l^1 \to l^1(K)$; $\Omega(k, (x_n)_{n\ge 1}) = k \otimes (x_n)_{n\ge 1} = (x_n k)_{n\ge 1}$.

Proof of this example is shown in the Appendix.

Given two bounded linear operators $A$ and $B$, acting in spaces $K$ and $L$, respectively, one defines their tensor product $A \otimes B$ on $K \otimes L$ as follows. Set $(A \otimes B)(x \otimes y) = Ax \otimes By$ and note that the set of all elements of the form $x \otimes y$ is total in $K \otimes L$, i.e., their linear combinations form a dense set (Semadeni, 1965, p. T5), so that the operator defined above admits a unique extension. An equivalent approach is to consider the bilinear and bounded operator $\Psi: K \times L \to K \otimes L$ given by $\Psi(x,y) = Ax \otimes By$, and to define $A \otimes B$ as the corresponding mapping $A \otimes B: K \otimes L \to K \otimes L$ such that $\Psi = (A \otimes B) \circ \Omega$. We have

$$\|A \otimes B\| \le \|A\|\, \|B\|\,.$$

In particular, if $\{T(t),\ t \ge 0\}$ and $\{S(t),\ t \ge 0\}$ are strongly continuous contraction semigroups acting in Banach spaces $K$, $L$, then $T(t) \otimes S(t)$ is a strongly continuous contraction semigroup acting in $K \otimes L$. Moreover, its generator $C$ equals $A \otimes I + I \otimes B$, where $A$, $B$ are the generators of $\{T(t),\ t \ge 0\}$ and $\{S(t),\ t \ge 0\}$, respectively. To be more specific, if $x \in D(A)$ and $y \in D(B)$, then $x \otimes y$ belongs to $D(C)$ and $C(x \otimes y) = Ax \otimes y + x \otimes By$; furthermore, the set of linear combinations of vectors of this form is a core for $C$, i.e., for any $w \in D(C)$ there exists a sequence $w_n$, $n \ge 1$, of linear combinations of vectors of this type such that $\lim_{n\to\infty} w_n = w$ and $\lim_{n\to\infty} C w_n = C w$ (Nagel 1986, p. 23).
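In finite dimensions the tensor product of operators is the Kronecker product, and the defining identity $(A \otimes B)(x \otimes y) = Ax \otimes By$ can be checked directly; the matrices and vectors below are arbitrary test data:

```python
# Sketch (illustrative, not from the paper): for finite matrices the
# tensor product A (x) B is the Kronecker product, and the identity
# (A (x) B)(x (x) y) = Ax (x) By holds entrywise.

def kron(A, B):
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def vec_kron(x, y):
    return [xi * yj for xi in x for yj in y]

A = [[1.0, 2.0], [0.0, -1.0]]   # arbitrary test matrices
B = [[0.5, 0.0], [1.0, 3.0]]
x, y = [1.0, 2.0], [-1.0, 4.0]

lhs = matvec(kron(A, B), vec_kron(x, y))
rhs = vec_kron(matvec(A, x), matvec(B, y))
print(lhs == rhs)   # True
```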

4. The Lyapunov equation and its asymptotic behavior


This is the section in which the main results of the paper are presented. In particular, we find the asymptotics of the joint distribution $R(t)$ in the cases of stable,


decaying, and expanding populations, given that the Markov process of mutations with transition probabilities P(t) is asymptotically stable. As an introduction, we represent the mild solution (Pazy, 1983) to the Lyapunov equation as an element of a Banach space of absolutely summable infinite matrices.

4.1. Definition and basic properties


Let us consider a Kolmogorov matrix $(q_{i,j})_{i,j\ge 1}$ and the corresponding minimal solution semigroup $\{P(t),\ t \ge 0\}$. We suppose that $\{P(t),\ t \ge 0\}$ is Markov.

DEFINITION 3. Let $(q_{i,j})_{i,j\ge 1}$ be a Kolmogorov matrix, and let $\{P(t),\ t \ge 0\}$ be the corresponding minimal solution semigroup. The semigroup $T(t) = P(t) \otimes P(t)$ acting in $l^1 \otimes l^1$ will be called the Lyapunov semigroup related to $(q_{i,j})_{i,j\ge 1}$. The Lyapunov semigroup describes the joint evolution of the allele types of two independent individuals.

Let us note here that it is convenient to represent elements of $l^1 \otimes l^1$ as matrices $m = (m_{i,j})_{i,j\ge 1}$ such that their columns belong to $l^1$ and $\sum_{i=1}^\infty \|m_{i,\cdot}\|_{l^1} = \|(m_{i,j})_{i,j\ge 1}\| < \infty$ (see Section 3.3). In this notation, for $x = (x_n)_{n\ge 1}$, $y = (y_n)_{n\ge 1} \in l^1$ we have $x \otimes y = (x_i y_j)_{i,j\ge 1}$, and the action of the Lyapunov semigroup is described by $P(t) \otimes P(t)\, m = P^*(t)\, m\, P(t)$ (in the matrix multiplication on the right-hand side, the superscript $*$ denotes, as before, transposition of a matrix). Moreover, if we represent a matrix $m$ as the vector of its columns $m_i$, $m = (m_1, \ldots, m_n, \ldots)$, we get $[P(t) \otimes P(t)]m = \sum_{i=1}^\infty P(t)m_i \otimes P(t)e_i$. We will assume that $(q_{i,j})_{i,j\ge 1}$ is fixed and denote $l^1 \otimes l^1$ by $M$. Consider the injection operator $\Theta: l^1 \to M$ given by
$$\Theta (x_n)_{n\ge 1} = \sum_{i=1}^\infty x_i\, e_i \otimes e_i\,, \tag{12}$$

or, in matrix notation:

$$\Theta (x_n)_{n\ge 1} = (\delta_{i,j}\, x_i)_{i,j\ge 1}\,. \tag{13}$$

$\Theta$ is a positive operator with $\|\Theta\| = 1$. It maps a vector into a diagonal matrix with diagonal entries identical to those of the vector.

DEFINITION 4. A semigroup $\{P(t),\ t \ge 0\}$ of transition operators acting in $l^1$ is called asymptotically stable iff for all distributions $(\pi_n)_{n\ge 1} \in l^1$ there exists a limit $\lim_{t\to\infty} P(t)(\pi_n)_{n\ge 1}$, which implies that it exists for all $(x_n)_{n\ge 1} \in l^1$. If this limit does not depend on the choice of $(\pi_n)_{n\ge 1}$, we say that $\{P(t),\ t \ge 0\}$ is ergodic (Lasota and Mackey, 1994).

The norm of a matrix $m = (m_{i,j})_{i,j\ge 1} \in M$ is equal to $\sum_{i=1}^\infty \|m_{i,\cdot}\|_{l^1} = \sum_{i=1}^\infty \sum_{j=1}^\infty |m_{i,j}|$. This leads to the identification of $M$ with $l^1$ and to the following result.

PROPOSITION 1. The Lyapunov semigroup is a Markov semigroup. It is asymptotically stable or ergodic whenever $\{P(t),\ t \ge 0\}$ is.


PROOF. Since $\{P(t),\ t \ge 0\}$ is Markov, the Lyapunov semigroup is nonnegative, and, for any nonnegative $m \in M$, we have:

$$\|T(t)m\| = \left\| \sum_{i=1}^\infty P(t)m_i \otimes P(t)e_i \right\| = \sum_{i=1}^\infty \|P(t)m_i \otimes P(t)e_i\| = \sum_{i=1}^\infty \|P(t)m_i\|\, \|P(t)e_i\| = \sum_{i=1}^\infty \|m_i\| = \|m\|\,,$$

where $m_i$ is the $i$th column of $m$. If $P(t)$ is asymptotically stable, we get

$$\lim_{t\to\infty} T(t)m = \lim_{t\to\infty} \sum_{i=1}^\infty P(t)m_i \otimes P(t)e_i = \sum_{i=1}^\infty P m_i \otimes P e_i = (P \otimes P)\, m\,,$$

where $P\pi = \lim_{t\to\infty} P(t)\pi$, by Scheffé's theorem (Billingsley, 1986). If $\{P(t),\ t \ge 0\}$ is ergodic, $P\pi = \pi_0$ does not depend on the distribution $\pi$ and we get

$$\lim_{t\to\infty} T(t)m = \sum_{i=1}^\infty \|m_i\|\, \pi_0 \otimes \pi_0 = \pi_0 \otimes \pi_0\,,$$

as desired. □

Let $C = Q^* \otimes I + I \otimes Q$ be the generator of the Lyapunov semigroup $\{T(t),\ t \ge 0\}$ (see Section 3.3 above), and let $N(t): [0, \infty) \to (0, \infty)$ be a continuous function. Consider an equation in $M$:

$$\frac{d}{dt} R(t) = C R(t) - \frac{1}{2N(t)}\, R(t) + \frac{1}{2N(t)}\, \Theta P(t)\pi\,, \qquad R(0) = R_0 \in M\,. \tag{14}$$

This equation is equivalent to Eq. (4) and its mild solution is given by

$$R(t) = e^{-\int_0^t \frac{du}{2N(u)}}\, T(t) R_0 + \int_0^t e^{-\int_s^t \frac{du}{2N(u)}}\, T(t-s)\, \frac{1}{2N(s)}\, \Theta P(s)\pi\, ds\,. \tag{15}$$

This latter equation is equivalent to Eq. (3).

COROLLARY 1. If $\{P(t),\ t \ge 0\}$ is Markov, then for any $t \ge 0$ the operator $R_0 \to R(t)$ is Markov, i.e., $R(t)$ is positive if $R_0$ is positive, and $\|R(t)\|_M = \|R_0\|_M$ for any $R_0 \in M$.

PROOF. Suppose that $R_0 \ge 0$ and $\|R_0\| = 1$. Certainly, by (15), $R(t) \ge 0$. Moreover, by Proposition 1,

$$\sum_{i,j} R_{i,j}(t) = e^{-\int_0^t \frac{du}{2N(u)}} \sum_{i,j} (R_0)_{i,j} + \int_0^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}} \sum_{i,j} (\Theta P(s)\pi)_{i,j}\, ds\,.$$

Since $\sum_{i,j} (R_0)_{i,j} = 1$ and $\sum_{i,j} (\Theta P(s)\pi)_{i,j} = \sum_i (P(s)\pi)_i = 1$, we obtain

$$\sum_{i,j} R_{i,j}(t) = e^{-\int_0^t \frac{du}{2N(u)}} + \int_0^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, ds = e^{-\int_0^t \frac{du}{2N(u)}} + \left( 1 - e^{-\int_0^t \frac{du}{2N(u)}} \right) = 1\,. \qquad \square$$

In what follows we will generally assume that $\{P(t),\ t \ge 0\}$ is asymptotically stable, although this assumption is not necessary for the assertions of Propositions 2, 4 and 5 below. The distinction between the asymptotically stable and ergodic cases will be discussed separately (Section 4.5). The hypothesis of asymptotic stability of $\{P(t),\ t \ge 0\}$ excludes the infinite alleles and infinite sites models as well as the unrestricted stepwise mutation model. However, the first two models have been considered in detail by others (Tavaré 1984, 1995), and the stepwise mutation model will be considered separately in Section 5. On the other hand, most models with a finite number of alleles are included, in particular the restricted stepwise mutation model (Section 4.6.1). We will consider the asymptotic behavior of $R(t)$, as $t \to \infty$, in the following three cases:

$$N(t) \to N\,, \quad 0 < N < \infty\,; \qquad N(t) \to 0\,; \qquad N(t) \to \infty\,.$$
4.2. Stable population


The following proposition guarantees the existence of the limit $R(\infty)$ and provides its form.

PROPOSITION 2. Let $\{P(t),\ t \ge 0\}$ be asymptotically stable, let $\pi = (\pi_n)_{n\ge 1} \in l^1$ be a distribution, and denote by $P\pi$ the limit $\lim_{t\to\infty} P(t)(\pi_n)_{n\ge 1}$. If $N(t) \to N$, where $0 < N < \infty$, then for any $R_0$ the solutions $R(t)$ of (14) tend, as $t \to \infty$, to $\lambda R_\lambda \Theta P\pi$, where $\lambda = \frac{1}{2N}$ and $R_\lambda$, $\lambda > 0$, is the resolvent of the Lyapunov semigroup.
PROOF. Under our assumptions $\lim_{t\to\infty} \int_0^t \frac{du}{2N(u)} = \infty$, so that the first term in (15) vanishes. Let us change the variable in the second term to obtain

$$R(t) = e^{-\int_0^t \frac{du}{2N(u)}}\, T(t)R_0 + \int_0^t e^{-\int_{t-s}^t \frac{du}{2N(u)}}\, T(s)\, \frac{1}{2N(t-s)}\, \Theta P(t-s)\pi\, ds \tag{16}$$

$$\phantom{R(t)} = e^{-\int_0^t \frac{du}{2N(u)}}\, T(t)R_0 + \int_0^\infty e^{-\int_{t-s}^t \frac{du}{2N(u)}}\, T(s)\, F(t-s)\, ds\,, \tag{17}$$

where we have set $F(s) = \frac{1}{2N(s)}\, \Theta P(s)\pi$ for $s \ge 0$, and $F(s) = 0$ otherwise. We have $\lim_{t\to\infty} F(t-s) = \lambda\, \Theta P\pi$ and $\lim_{t\to\infty} \int_{t-s}^t \frac{du}{2N(u)} = \lambda s$. Moreover, the function $N(t)$ is bounded by assumption, so that the norm of the integrand in (17) is bounded by $\mathrm{const}\cdot e^{-\mathrm{const}\cdot s}$, and the proposition follows by the Lebesgue dominated convergence theorem. □

In matrix notation, the assertion of Proposition 2 can be written as
$$R(\infty) = \lambda \int_0^\infty e^{-\lambda t}\, P^*(t)\, \mathrm{diag}[P\pi]\, P(t)\, dt\,, \qquad \lambda = \frac{1}{2N}\,, \tag{18}$$


where $\mathrm{diag}(x) = \Theta x$ denotes a diagonal matrix with the vector $x$ on its diagonal. In the remaining portion of this section, we will develop computable representations of $R(\infty)$.
Computation of the limit $R(\infty)$ is a nontrivial problem. One approach is to set $R(t) = R(\infty)$ in the right-hand side of Eq. (14) and then equate it to 0. This leads to the equation

$$R(\infty) = \left( Q^* R(\infty) + \lambda m \right) (\lambda - Q)^{-1}\,,$$

the solution of which logically might be the limit of the following iteration

$$S_{n+1} = \left( Q^* S_n + \lambda m \right) (\lambda - Q)^{-1}\,, \tag{19}$$

with $S_0 = 0$ and $m = \Theta P\pi$. Unfortunately, we are not able to demonstrate convergence of (19) in the general case. Numerically, in certain finite-dimensional cases (Section 4.6.1), convergence is observed. In the general 2-dimensional case, we have

$$Q = \begin{pmatrix} -\mu & \mu \\ \nu & -\nu \end{pmatrix},$$

and explicit calculations demonstrate that $S_{n+1} - S_n$ behaves as $\left[ (\mu + \nu)/(\mu + \nu + \lambda) \right]^n$ as $n \to \infty$. In the general case, we are able to demonstrate convergence of another iteration. First, we need a proposition. Let us note that if the generator $Q$ is a bounded operator, there exists a constant $k$ such that $Q_k = \frac{1}{k} Q + I$ is a transition operator (in $l^1$). Indeed, it is enough to take $k = \max_{i\ge 1}(-q_{i,i})$. Therefore, $e^{Qt} = e^{-kt}\, e^{k Q_k t}$. As a consequence we obtain the following result.

PROPOSITION 3. If $Q$ is a bounded operator, then there exists a constant $k$ such that, for all $(m_{i,j})_{i,j\ge 1} \in M$,

$$(\lambda - C)^{-1} (m_{i,j})_{i,j\ge 1} = \sum_{n=0}^\infty k^n \left[ (\lambda + k - Q^*)^{-1} \right]^{n+1} (m_{i,j})_{i,j\ge 1}\, Q_k^n\,.$$

PROOF. In the Appendix. □

For the series of Proposition 3 the exact rate of convergence can be given. For all nonnegative $(m_{i,j})_{i,j\ge 1}$ we have

$$\left\| (\lambda - C)^{-1} (m_{i,j})_{i,j\ge 1} - \sum_{n=0}^N k^n \left[ (\lambda + k - Q^*)^{-1} \right]^{n+1} (m_{i,j})_{i,j\ge 1}\, Q_k^n \right\| \le \sum_{n=N+1}^\infty \frac{k^n}{(\lambda + k)^{n+1}}\, \|(m_{i,j})_{i,j\ge 1}\| = \frac{k^{N+1}}{(\lambda + k)^{N+2}} \cdot \frac{1}{1 - \frac{k}{\lambda + k}}\, \|(m_{i,j})_{i,j\ge 1}\| = \frac{1}{\lambda} \left( \frac{k}{\lambda + k} \right)^{N+1} \|(m_{i,j})_{i,j\ge 1}\|\,, \tag{20}$$

since $Q_k$ and $(\lambda + k - Q^*)^{-1}$ are positive.

Note. The series in Proposition 3, multiplied by $\lambda$, is the limit of the following iteration:

$$S_{n+1} = (\lambda + k - Q^*)^{-1} \left[ k\, S_n Q_k + \lambda m \right] = (\lambda + k - Q^*)^{-1} \left[ S_n (Q + k) + \lambda m \right]\,, \tag{21}$$

with $S_0 = 0$. The inequalities in (20) demonstrate that iteration (21) converges geometrically, as $\left( \frac{k}{\lambda + k} \right)^n$. If $k$ has to be selected large, then this convergence will not be very fast.
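Both iterations can be compared numerically in the 2-dimensional case. The sketch below uses hypothetical rates, chosen so that (19) is numerically stable (when $\mu + \nu$ exceeds $\lambda$, rounding errors can destabilize (19) in floating point, while (21) still converges):

```python
# Sketch (hypothetical 2-allele rates): iterations (19) and (21) side
# by side.  With m = diag(P pi), both converge to lam*(lam - C)^{-1} m,
# and the limit satisfies the stationary Lyapunov equation
#   Q* R + R Q - lam*R + lam*m = 0.

def matmul(a, b):
    return [[sum(a[i][q] * b[q][j] for q in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

mu, nu, lam = 0.01, 0.03, 0.05            # chosen so that (19) is stable
Q = [[-mu, mu], [nu, -nu]]
Qt = [[-mu, nu], [mu, -nu]]               # Q*
ppi = [nu / (mu + nu), mu / (mu + nu)]    # P pi, stationary distribution
m = [[ppi[0], 0.0], [0.0, ppi[1]]]        # m = diag(P pi)

# iteration (19): S <- (Q* S + lam*m)(lam - Q)^{-1}
right = inv2([[lam - Q[0][0], -Q[0][1]], [-Q[1][0], lam - Q[1][1]]])
S19 = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(400):
    A = matmul(Qt, S19)
    S19 = matmul([[A[i][j] + lam * m[i][j] for j in range(2)]
                  for i in range(2)], right)

# iteration (21): S <- (lam + k - Q*)^{-1}[S(Q + k) + lam*m], k = max(-q_ii)
k = max(mu, nu)
left = inv2([[lam + k - Qt[0][0], -Qt[0][1]],
             [-Qt[1][0], lam + k - Qt[1][1]]])
Qk = [[Q[i][j] + (k if i == j else 0.0) for j in range(2)] for i in range(2)]
S21 = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(400):
    B = matmul(S21, Qk)
    S21 = matmul(left, [[B[i][j] + lam * m[i][j] for j in range(2)]
                        for i in range(2)])

agree = max(abs(S19[i][j] - S21[i][j]) for i in range(2) for j in range(2))
res = max(abs(matmul(Qt, S19)[i][j] + matmul(S19, Q)[i][j]
              - lam * S19[i][j] + lam * m[i][j])
          for i in range(2) for j in range(2))
print(agree < 1e-10, res < 1e-10, round(sum(sum(r) for r in S19), 6))
```

The total mass of the limit equals 1, consistent with Corollary 1: summing the stationarity equation over all entries kills the $Q$ terms and leaves $\lambda \sum_{i,j} R_{i,j} = \lambda$.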

4.3. Decaying population


The following proposition demonstrates the depletion of diversity in decaying populations. As a consequence of this result, all individuals in the population will tend to share the same allele. Although this finding is intuitively obvious, its proof is not trivial. In this proof, and in the proofs of several results further on, an important role will be played by a family of functions $g_t(\cdot)$, $t > 0$, defined as

$$g_t(s) = \begin{cases} \dfrac{1}{2N(t-s)}\, e^{-\int_{t-s}^t \frac{du}{2N(u)}}\,, & 0 < s < t\,, \\[2mm] 0\,, & \text{otherwise}\,, \end{cases}$$

where $t$ denotes the present time, while $s$ is time measured backwards from the present, and $g_t(s)$ is the density of the time $T_a$ to coalescence (Section 2.1).

PROPOSITION 4. Let $\{P(t),\ t \ge 0\}$ be asymptotically stable, let $(\pi_n)_{n\ge 1} \in l^1$ be a distribution, and denote by $P\pi$ the limit $\lim_{t\to\infty} P(t)(\pi_n)_{n\ge 1}$. If $N(t) \to 0$, then for any $R_0$ the solutions $R(t)$ of (14) tend, as $t \to \infty$, to $\Theta P\pi$.

PROOF. Our assumption on $N(\cdot)$ implies that $\lim_{t\to\infty} e^{-\int_0^t \frac{du}{2N(u)}} = 0$, and again, as in Proposition 2, we infer that the first term in formula (15) vanishes. Therefore, we can focus on the asymptotic behavior of the second term. Using the functions $g_t(\cdot)$ we may rewrite formula (16) as follows:

$$R(t) = e^{-\int_0^t \frac{du}{2N(u)}}\, T(t) R_0 + \int_0^t g_t(s)\, T(s)\, \Theta P(t-s)\pi\, ds\,. \tag{22}$$

Our proposition is a consequence of the fact that the measures $\mu_t$, $t > 0$, on $[0, \infty)$ with densities $g_t(\cdot)$ approximate, as $t \to \infty$, the point measure concentrated at $0$,


i.e., most coalescence occurs close to the present time. Indeed, let us note that $g_t(s) = -\frac{d}{ds}\, e^{-\int_{t-s}^t \frac{du}{2N(u)}}$. Therefore,

$$\int_0^\infty g_t(s)\, ds = \int_0^t g_t(s)\, ds = \left[ -e^{-\int_{t-s}^t \frac{du}{2N(u)}} \right]_0^t = 1 - e^{-\int_0^t \frac{du}{2N(u)}} \xrightarrow[t\to\infty]{} 1\,. \tag{23}$$

Analogously, for any $\delta > 0$,

$$\int_\delta^\infty g_t(s)\, ds = \int_\delta^t g_t(s)\, ds = \left[ -e^{-\int_{t-s}^t \frac{du}{2N(u)}} \right]_\delta^t = e^{-\int_{t-\delta}^t \frac{du}{2N(u)}} - e^{-\int_0^t \frac{du}{2N(u)}}\,. \tag{24}$$

Now, let us fix $\varepsilon > 0$ and choose $t_0$ such that, for $t > t_0$, $\|P(t)\pi - P\pi\| < \varepsilon$. Let $\delta > 0$ be such that, for $0 < s < \delta$, $\|T(s)\Theta P\pi - \Theta P\pi\| < \varepsilon$, and take $t > t_0 + \delta$. For $s \in [0, \delta]$ we have $\|P(t-s)\pi - P\pi\| < \varepsilon$, since $t - s > t_0$. Denoting by $R_2(t)$ the second term in (22), we get

$$
\begin{aligned}
\|R_2(t) - \Theta P\pi\| &\le \left\| \int_0^t g_t(s)\, T(s)\Theta P(t-s)\pi\, ds - \int_0^t g_t(s)\, \Theta P\pi\, ds \right\| + e^{-\int_0^t \frac{du}{2N(u)}}\, \|\Theta P\pi\| \\
&\le \int_0^\delta g_t(s)\, \|T(s)\Theta[P(t-s)\pi - P\pi]\|\, ds + \int_\delta^t g_t(s)\, \|T(s)\Theta[P(t-s)\pi - P\pi]\|\, ds \\
&\quad + \int_0^\delta g_t(s)\, \|T(s)\Theta P\pi - \Theta P\pi\|\, ds + \int_\delta^t g_t(s)\, \|T(s)\Theta P\pi - \Theta P\pi\|\, ds + e^{-\int_0^t \frac{du}{2N(u)}}\, \|\Theta P\pi\| \\
&\le 2\varepsilon \int_0^\delta g_t(s)\, ds + 4 \int_\delta^t g_t(s)\, ds + e^{-\int_0^t \frac{du}{2N(u)}}\, \|\Theta P\pi\|\,.
\end{aligned}
\tag{25}
$$

Letting $t \to \infty$, we get

$$\limsup_{t\to\infty} \|R_2(t) - \Theta P\pi\| \le 2\varepsilon\,,$$

which completes the proof. □
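The properties of $g_t(\cdot)$ used above can be checked numerically for a concrete, hypothetical decaying population size (the particular $N(t)$ below is an illustrative assumption):

```python
# Numerical check (hypothetical decaying N(t)): the density g_t(s)
# integrates to 1 - exp(-int_0^t du/(2N(u))), as in Eq. (23), and for
# N(t) -> 0 its mass concentrates near s = 0, the heart of the proof
# of Proposition 4.
import math

def N(u):                        # hypothetical decaying population size
    return 2.0 * math.exp(-0.1 * u)

def H(a, b):                     # int_a^b du/(2N(u)), closed form for this N
    return (math.exp(0.1 * b) - math.exp(0.1 * a)) / 0.4

def g(t, s):                     # coalescence density g_t(s)
    return math.exp(-H(t - s, t)) / (2.0 * N(t - s))

t, steps = 30.0, 200000
h = t / steps
total = sum(g(t, (i + 0.5) * h) for i in range(steps)) * h
near0 = sum(g(t, (i + 0.5) * h) for i in range(int(1.0 / h))) * h

print(abs(total - (1.0 - math.exp(-H(0.0, t)))) < 1e-3)  # Eq. (23): True
print(near0 / total > 0.98)                              # mass near s = 0
```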

4.4. Expanding population


For asymptotically stable non-ergodic $P(t)$, the results differ depending on whether population growth is slow (the integral $\int_0^\infty \frac{du}{2N(u)}$ diverges) or fast (the integral $\int_0^\infty \frac{du}{2N(u)}$ converges). The asymptotically stable, non-ergodic case is important when substructured populations are considered (Section 4.5). In the ergodic case, we


obtain the star-shaped coalescent, i.e., the allele states of any two chromosomes are independent.

PROPOSITION 5. Suppose $N(t) \to \infty$ in such a way that $\lim_{t\to\infty} \int_0^t \frac{du}{2N(u)} = \infty$. If $\{P(t),\ t \ge 0\}$ is asymptotically stable, then for any distribution $\pi = (\pi_n)_{n\ge 1} \in l^1$ and for any $R_0 \in M$,

$$\lim_{t\to\infty} R(t) = T[\Theta(P\pi)]\,,$$

where $T = P \otimes P$ and $P = \lim_{t\to\infty} P(t)$.

PROOF. Let us consider formula (22) and note that the functions $g_t(\cdot)$, $t > 0$, now have different properties. The measures $\mu_t$, $t > 0$, with densities $g_t(\cdot)$ "escape to infinity", however "slower than $t$". The effect is that for any $\delta > 0$ the intervals $[0, \delta]$ and $[t - \delta, \infty)$ are asymptotically of zero measure $\mu_t$, as $t \to \infty$. Indeed, we can prove that (23) still holds, but instead of (24) we have, for any $\delta > 0$,

$$\lim_{t\to\infty} \int_\delta^{t-\delta} g_t(s)\, ds = 1 \tag{26}$$

and

$$\lim_{t\to\infty} \int_{t-\delta}^t g_t(s)\, ds = 0\,. \tag{27}$$

Also, we note that, as in Proposition 4, the first term in (22) vanishes, and we can restrict ourselves to studying the asymptotic behavior of the second term, $R_2(t)$. Fix $\varepsilon > 0$ and choose $t_0$ such that, for $t > t_0$ and $u \ge 0$,

$$\|T(t)\Theta P(u)\pi - T\Theta P(u)\pi\| < \varepsilon\,;$$

this is possible, since the image of the function $[0, \infty) \ni u \to \Theta P(u)\pi$ is compact in $M$. Let $t_1$ be such that for all $t > t_1$, $\|P(t)\pi - P\pi\| < \varepsilon$. For $t > t_0 + t_1$, reasoning as in (25),

$$
\begin{aligned}
\|R_2(t) - T\Theta P\pi\| &\le \left\| \int_0^t g_t(s)\, T(s)\Theta P(t-s)\pi\, ds - \int_0^t g_t(s)\, T\Theta P\pi\, ds \right\| + e^{-\int_0^t \frac{du}{2N(u)}}\, \|T\Theta P\pi\| \\
&\le \int_{t_0}^{t-t_1} g_t(s)\, \|T(s)\Theta P(t-s)\pi - T\Theta P\pi\|\, ds + 2 \int_0^{t_0} g_t(s)\, ds + 2 \int_{t-t_1}^t g_t(s)\, ds + e^{-\int_0^t \frac{du}{2N(u)}}\, \|T\Theta P\pi\|\,.
\end{aligned}
\tag{28}
$$

We have

$$\|T(s)\Theta P(t-s)\pi - T\Theta P\pi\| \le \|T(s)\Theta P(t-s)\pi - T\Theta P(t-s)\pi\| + \|T\Theta P(t-s)\pi - T\Theta P\pi\|\,.$$

For $t_0 < s < t - t_1$, the right-hand side of the inequality above is less than $2\varepsilon$, so that the last expression in (28) is less than

$$2 \int_0^{t_0} g_t(s)\, ds + 2\varepsilon + 2 \int_{t-t_1}^t g_t(s)\, ds + e^{-\int_0^t \frac{du}{2N(u)}}\, \|T\Theta P\pi\|$$

and, letting $t \to \infty$ and using (26)-(27), we get $\limsup_{t\to\infty} \|R_2(t) - T\Theta P\pi\| \le 2\varepsilon$, as desired. □

REMARK 1. In matrix notation, the assertion of Proposition 5 has the form

$$R(\infty) = P^*\, \mathrm{diag}(P\pi)\, P\,.$$

PROPOSITION 6. Suppose $N(t) \to \infty$ in such a way that $\lim_{t\to\infty} \int_0^t \frac{du}{2N(u)} = \alpha < \infty$. If $\{P(t),\ t \ge 0\}$ is asymptotically stable, then for any distribution $\pi = (\pi_i)_{i\ge 1}$ and for any $R_0 \in M$,

$$\lim_{t\to\infty} R(t) = e^{-\alpha}\, T R_0 + T\Theta \int_0^\infty P(s)\pi\, d\mu(s)\,, \tag{29}$$

where $\mu$ is a measure on $[0, \infty)$ with density $\frac{1}{2N(s)}\, e^{-\int_s^\infty \frac{du}{2N(u)}}$, so that $\mu([0, \infty)) = 1 - e^{-\alpha}$.

PROOF. Let us look again at (22). Under our assumptions the first term tends to $e^{-\alpha}\, T R_0$. Moreover,

$$\lim_{t\to\infty} \int_0^t g_t(s)\, ds = 1 - e^{-\alpha}$$

and, for $\delta > 0$, $\lim_{t\to\infty} \int_\delta^t g_t(s)\, ds = 1 - e^{-\alpha}$, so that the effect of "escape to infinity" is observed once again. This allows us to show, as in Proposition 5, that

$$\lim_{t\to\infty} \left\| R_2(t) - T\Theta \int_0^t g_t(s)\, P(t-s)\pi\, ds \right\| = 0\,.$$

Now, rewrite $\int_0^t g_t(s)\, P(t-s)\pi\, ds$ as $\int_0^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, P(s)\pi\, ds$ and note that

$$\lim_{t\to\infty} \int_0^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, P(s)\pi\, ds = \int_0^\infty \frac{1}{2N(s)}\, e^{-\int_s^\infty \frac{du}{2N(u)}}\, P(s)\pi\, ds\,.$$


In order to prove this relation, fix $\varepsilon > 0$ and choose $t_0$ large enough so that

$$1 - e^{-\int_{t_0}^\infty \frac{du}{2N(u)}} < \varepsilon\,.$$

For $t > t_0$,

$$\left\| \int_{t_0}^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, P(s)\pi\, ds \right\| \le 1 - e^{-\int_{t_0}^t \frac{du}{2N(u)}} \le 1 - e^{-\int_{t_0}^\infty \frac{du}{2N(u)}} < \varepsilon\,.$$

On the other hand, since $e^{-\int_s^t \frac{du}{2N(u)}}$ is a decreasing function of $t$,

$$\lim_{t\to\infty} \int_0^{t_0} \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, P(s)\pi\, ds = \int_0^{t_0} \frac{1}{2N(s)}\, e^{-\int_s^\infty \frac{du}{2N(u)}}\, P(s)\pi\, ds\,.$$

Combining the last three relations we get

$$\limsup_{t\to\infty} \left\| \int_0^t \frac{1}{2N(s)}\, e^{-\int_s^t \frac{du}{2N(u)}}\, P(s)\pi\, ds - \int_0^\infty \frac{1}{2N(s)}\, e^{-\int_s^\infty \frac{du}{2N(u)}}\, P(s)\pi\, ds \right\| \le 2\varepsilon\,,$$

as desired. Thus (29) holds with $\mu$ defined as the measure with density $g(s) = \frac{1}{2N(s)}\, e^{-\int_s^\infty \frac{du}{2N(u)}}$. Since $g(s) = \frac{d}{ds}\, e^{-\int_s^\infty \frac{du}{2N(u)}}$, we have also $\int_0^\infty d\mu = 1 - e^{-\alpha}$. □

REMARK 2. In matrix notation, the assertion of Proposition 6 assumes the form

$$R(\infty) = e^{-\alpha}\, P^* \left[ R_0 + \mathrm{diag}\left( \int_0^\infty P(s)\pi\, g(s)\, ds \right) \right] P\,.$$

4.5. Ergodicity versus asymptotic stability. Population substructure


In the case where the semigroup $\{P(t),\ t \ge 0\}$ is ergodic, the limits appearing in Propositions 5 and 6 coincide. In both cases, if only $N(t) \to \infty$,

$$R(\infty) = T m = P\pi \otimes P\pi\,,$$

which means that the allele states in both individuals are independent. This is, however, not the case when $\{P(t),\ t \ge 0\}$ is asymptotically stable and non-ergodic. The principal question arising is whether the asymptotically stable but non-ergodic case is of importance for applications. The circumstances under which it may arise are those of a finite or infinite number of non-communicating classes of alleles. An alternative setup is a population with an internal structure, composed of two or more varieties that did not communicate in the past. The example presented in this section concerns two non-communicating classes of alleles or two subpopulations.

Suppose that $l^1$ can be decomposed into a direct sum of two subspaces, say $(l^1)^1$ and $(l^1)^2$, in such a way that there exist two ergodic Markov semigroups $\{P_i(t),\ t \ge 0\}$, $i = 1,2$, acting in these subspaces, and $P(t) = \sum_{i=1}^2 P_i(t) O_i$, where $O_i$


are projections on $(l^1)^i$, $i = 1,2$. Also, suppose that each of the $e_j$ vectors belongs to one of the two subspaces $(l^1)^i$, $i = 1,2$. By $(\pi^*)^i$, $i = 1,2$, let us denote the unique distributions invariant for $\{P_i(t),\ t \ge 0\}$, $i = 1,2$. Moreover, let $(R^*)^{i,j} = (\pi^*)^i \otimes (\pi^*)^j \in (l^1)^i \otimes (l^1)^j$, $i,j = 1,2$. Finally, let us define (only in this section) a functional $S: l^1 \to \mathbb{R}$ by the formula $Sx = \sum_{i=1}^\infty x_i$, where $x = (x_n)_{n\ge 1}$, and the related functionals $S(i) = S O_i$, $i = 1,2$ (acting in $l^1$), and $S(i,j) = S(i) \otimes S(j)$, $i,j = 1,2$ (acting in $M = l^1 \otimes l^1$). The functionals $S(i)$, $i = 1,2$, sum all the elements of the distribution vector belonging to the subspaces $(l^1)^1$ and $(l^1)^2$, respectively, while $S(i,j)$, $i,j = 1,2$, sum the elements in the respective blocks of the distribution matrix. We have:

(i) $Px = \lim_{t\to\infty} P(t)x = \sum_{i=1}^2 [S(i)x]\, (\pi^*)^i$, for any $x \in l^1$,

(ii) $T R_0 = \sum_{i,j=1,2} [S(i,j)R_0]\, (R^*)^{i,j}$, for any $R_0 \in M$.

Indeed, it is enough to prove this assertion for $R_0 = x \otimes y$, and we have

$$T(x \otimes y) = Px \otimes Py = \sum_{i,j=1,2} [S(i)x][S(j)y]\, (\pi^*)^i \otimes (\pi^*)^j = \sum_{i,j=1,2} [S(i,j)(x \otimes y)]\, (R^*)^{i,j}\,.$$

(iii) If $x \in (l^1)^i$, then $\Theta x \in (l^1)^i \otimes (l^1)^i$ and $S(i,i)\, \Theta x = S(i)x$, $i = 1,2$.

(iv) For any $\pi \in l^1$,

$$T\Theta P\pi = T \left( \sum_{i=1}^2 [S(i)\pi]\, \Theta(\pi^*)^i \right) = \sum_{i=1}^2 [S(i)\pi]\, (R^*)^{i,i}\,, \tag{30}$$

the slow-growth limit of Proposition 5.

(v) For $x \in l^1$, $S(i) P_i(t) x = S(i) x$ and, therefore,

$$S(i) \int_0^\infty P_i(t) x\, d\mu(t) = \int_0^\infty S(i) P_i(t) x\, d\mu(t) = \int_0^\infty S(i) x\, d\mu(t) = (1 - e^{-\alpha})\, S(i) x\,,$$

for $i = 1,2$.

(vi) For any $\pi \in l^1$,

$$T\Theta \int_0^\infty P(t)\pi\, d\mu(t) = T\Theta \int_0^\infty \sum_{i=1}^2 P_i(t) O_i \pi\, d\mu(t) = \sum_{i=1}^2 \left[ S(i,i)\, \Theta \int_0^\infty P_i(t) O_i \pi\, d\mu(t) \right] (R^*)^{i,i} = (1 - e^{-\alpha}) \sum_{i=1}^2 [S(i)\pi]\, (R^*)^{i,i}\,.$$

(vii) By steps (ii) and (vi), the limit in Proposition 6 equals

$$e^{-\alpha} \sum_{i,j=1,2} [S(i,j)R_0]\, (R^*)^{i,j} + (1 - e^{-\alpha}) \sum_{i=1}^2 [S(i)\pi]\, (R^*)^{i,i}\,. \tag{31}$$
Comparing expressions (30) and (31) we see that expanding populations may differently retain memory of their internal structure at time 0. Under slow growth, only the overall proportions $S(1)\pi$ and $S(2)\pi$ in classes 1 and 2 are remembered. Under fast growth, the joint frequencies $S(1,1)R_0$, $S(1,2)R_0$, $S(2,1)R_0$ and $S(2,2)R_0$ are also remembered. Also, let us note that the results presented above are readily generalized to a greater number of subspaces. If $l^1$ is decomposed into a direct sum of $k$ subspaces, then the above formulas remain valid, except that the indices $i$ and $j$ vary from 1 to $k$.
4.6. Other examples

4.6.1. Restricted stepwise mutation model

The Restricted Stepwise Mutation Model (RSMM) is an important version of the unrestricted Stepwise Mutation Model considered at length in Section 5. In the RSMM the mutation process can be represented as a random walk on the set $\{0, 1, \ldots, K\}$ with reflecting boundaries at states $0$ and $K$. The corresponding intensity matrix is tridiagonal, of the form

$$Q = v \begin{pmatrix} -1 & 1 & & & \\ d & -1 & b & & \\ & \ddots & \ddots & \ddots & \\ & & d & -1 & b \\ & & & 1 & -1 \end{pmatrix}\,,$$

where b and d are the probabilities of forward and backward steps, respectively, while v is the overall mutation rate. Deka et al. (1999) used this model, together with the iteration defined by expression (19) to test neutrality of microsatellite loci. Detailed computations can be found in Deka et al. (1999).
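A sketch of the construction follows; the exact boundary convention of the RSMM matrix is an assumption here (downward steps at $0$ and upward steps at $K$ are reflected back into the chain), as are the parameter values:

```python
# Sketch (hypothetical boundary convention and parameters): build a small
# RSMM intensity matrix on {0,...,K} -- tridiagonal with up-rate v*b and
# down-rate v*d, steps reflected at 0 and K -- and check the Kolmogorov
# conditions of Section 3.2 (nonnegative off-diagonals, zero row sums).

def rsmm_intensity(K, v, b, d):
    n = K + 1
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i < K:                       # upward step; at 0 the reflected
            Q[i][i + 1] = v * b + (v * d if i == 0 else 0.0)
        if i > 0:                       # downward step; at K the reflected
            Q[i][i - 1] = v * d + (v * b if i == K else 0.0)
        Q[i][i] = -sum(Q[i][j] for j in range(n) if j != i)
    return Q

Q = rsmm_intensity(K=4, v=0.001, b=0.5, d=0.5)
rows_zero = all(abs(sum(row)) < 1e-15 for row in Q)
offdiag_ok = all(Q[i][j] >= 0.0 for i in range(5) for j in range(5) if i != j)
print(rows_zero, offdiag_ok)            # -> True True
```

With a valid $Q$ in hand, iteration (19) (or (21)) can be applied exactly as in Section 4.2.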
4.6.2. Joint evolution of a disease locus and a linked marker locus

The spread of genetic disorders in a population can be modeled using the Fisher-Wright-Moran model. If it is assumed that the disease allele is rare, simplified forms of the model can be used. It is possible to follow not only the disease locus, but also markers linked to it. The model can be used to estimate the recombination fraction between the disease and marker loci, jointly with the demographic history of the disease population. Preliminary results are available in Pankratz (1998). We concentrate here on modeling the spread of a rare disease. We take into account an exponentially expanding disease population, embedded in an infinitely large normal population remaining at equilibrium. The basic set of assumptions


listed below describes the variant of the process in which recombination between the disease locus and a marker is the dominant force (as in Kaplan et al., 1995):
- The population of normal chromosomes is much larger than the subpopulation carrying a disease mutation.
- The $k$ alleles at the linked marker are present in the normal population at constant frequencies $p_{jn}$, $j = 1, \ldots, k$.
- At some time in the past, a single disease chromosome appeared in the population.
- The marker locus is linked to the disease locus, with a recombination rate of $r$.
- The subpopulation of disease chromosomes is growing exponentially, i.e., $X(t) = N_0 \exp(\lambda t)$.
Recombination only. With the assumptions above, we obtain the probability that a disease chromosome with marker allele $i$ gives birth to a disease chromosome with marker allele $j$. Since the only way that the marker allele on a disease chromosome changes type is through recombination, the probability that a disease chromosome of type $i$ produces a disease chromosome of type $j$, where $i \ne j$, is equal to the probability that the disease chromosome recombines with a chromosome of type $j$. This probability is $r p_{jn}$. Likewise, the probability that a disease chromosome with marker allele $i$ gives birth to a disease chromosome of the same type is $1 - r + r p_{in}$, i.e., the chromosome either does not recombine or it recombines with a chromosome carrying the same marker allele. The resulting transition matrix has the form
$$M_r = (1 - r) I + r P\,, \tag{32}$$

where $I$ is the $k \times k$ identity matrix and $P$ is a $k \times k$ matrix with identical rows, each of which is equal to $p = (p_{1n}, \ldots, p_{kn})$. The corresponding intensity matrix $Q_r$ is equal to $M_r - I$, so that

$$Q_r = r(P - I)\,. \tag{33}$$

Recombination with marker mutations. We now assume that mutation events at the marker locus occur before the recombination events and independently of them, with the probabilities defined by a mutation matrix $U$. Consequently, we can write the transition matrix as

$$M_{mr} = U[(1 - r) I + r P] = (1 - r) U + r P\,, \tag{34}$$

because the rows of $U$ sum to 1 and all the entries in the columns of $P$ are equal. We can explain the result as follows. If no recombination event occurs, then any mutation event will persist in the offspring. However, if a mutation event is

A semigroup representation and asymptotic behavior

235

followed by recombination between the disease and marker loci, then a mutation at the marker locus is lost to the disease chromosome. The corresponding intensity matrix Qmr is equal to
$$Q_{mr} = (1 - r) U + r P - I\,. \tag{35}$$

Equation (4) can now be employed to calculate the first and second moments of the empirical frequencies of marker alleles in the disease population. These were used to estimate the recombination fraction $r$ from marker allele frequency data (Pankratz, 1998).
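The algebra behind (32)-(35) is easy to verify numerically; the mutation matrix $U$, the marker frequencies, and $r$ below are hypothetical values, not from the paper:

```python
# Sketch (hypothetical numbers, k = 3 marker alleles): the matrices of
# Eqs. (32)-(35).  The identity U P = P -- rows of U sum to 1 and the
# columns of P are constant -- is what collapses U[(1-r)I + rP] to
# (1-r)U + rP in Eq. (34).

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]

k, r = 3, 0.01
p = [0.5, 0.3, 0.2]                       # marker frequencies p_jn
I = [[float(i == j) for j in range(k)] for i in range(k)]
P = [p[:] for _ in range(k)]              # identical rows
U = [[0.98, 0.01, 0.01],                  # hypothetical mutation matrix
     [0.02, 0.96, 0.02],
     [0.01, 0.01, 0.98]]

UP = matmul(U, P)
up_ok = all(abs(UP[i][j] - P[i][j]) < 1e-12
            for i in range(k) for j in range(k))          # U P = P

Mr = [[(1 - r) * I[i][j] + r * P[i][j] for j in range(k)]
      for i in range(k)]                                  # Eq. (32)
Mmr = matmul(U, Mr)                       # mutation, then recombination
alt = [[(1 - r) * U[i][j] + r * P[i][j] for j in range(k)]
       for i in range(k)]                                 # Eq. (34)
ok = all(abs(Mmr[i][j] - alt[i][j]) < 1e-12
         for i in range(k) for j in range(k))

Qmr = [[Mmr[i][j] - I[i][j] for j in range(k)] for i in range(k)]  # Eq. (35)
rows_ok = all(abs(sum(row)) < 1e-12 for row in Qmr)
print(up_ok, ok, rows_ok)                 # -> True True True
```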

5. Stepwise Mutation Model with variable population size


The Stepwise Mutation Model (SMM) assumes that an allele at the locus considered is subject, at the time epochs of a Poisson process with intensity $v$, to mutations which replace an allele of size $X$ by an allele of size $X + U$, where $U$ is an integer-valued random variable (r.v.) with a given distribution $(p_n)_{n\in\mathbb{Z}}$, where $\mathbb{Z}$ is the set of integers. The SMM was originally conceived to study electrophoretic variation at protein loci (Ohta and Kimura, 1973). More recently, it has been employed to model the evolution of the so-called tandem-repeat loci, i.e., regions of DNA with periodically repeated motifs. These loci are referred to as microsatellites if the repeat motif is short (2-6 nucleotides). An allele at such a locus can be characterized by the number of repeat units ($X$), i.e., by a nonnegative integer. Because of their abundance in the genome and extensive polymorphism even in small populations, these loci have become popular as markers for gene mapping, identification of individuals, and phylogenetic analysis, the latter in part because of the high rate at which new alleles are generated (Weber and Wong, 1993). It has been shown empirically that mutations of such loci, particularly at a short time scale (in evolutionary terms), consist mostly of contractions and expansions of the number of repeat units $X$. Thus, the SMM is considered a realistic description of the mutation mechanism of microsatellites (Goldstein et al., 1995; Jin and Chakraborty, 1995; Shriver et al., 1993; Slatkin, 1995). In the present section, we provide a systematic mathematical description of the distributions of pairs of alleles described by the SMM under genetic drift. Since the mutation mechanism assumed is essentially an unrestricted random walk, the marginal distributions are unstable, i.e., $P(t)$ does not reach any nontrivial limit.
However, it has been known (Moran, 1975) that when the allele sizes X are compared pairwise, some statistics of repeat variation attain a mutation-drift equilibrium in populations. We will show that this is the case for a generalized SMM with arbitrary distributions of the allele-size change by mutation (as in Kimmel and Chakraborty, 1996) if the population size is stable. In addition, we will show that for populations expanding fast enough, a central limit theorem holds for differences in allele sizes.
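As a concrete (purely illustrative) sketch of this mutation mechanism, one allele under the generalized SMM can be simulated as a compound Poisson process: mutation epochs arrive at rate v, and each epoch adds an integer step U to the current size. The step law below (U = ±1 with equal probability) and all parameter values are our own arbitrary choices, not taken from the text.

```python
import numpy as np

def simulate_smm_allele(x0, v, t, step_sampler, rng):
    """Simulate one allele size under the generalized SMM up to time t.

    Mutations occur at the epochs of a Poisson process with intensity v;
    each mutation adds an i.i.d. integer-valued step U to the current size.
    """
    n_mut = rng.poisson(v * t)          # number of mutation events in [0, t]
    steps = step_sampler(n_mut, rng)    # i.i.d. copies of U
    return x0 + int(np.sum(steps))

# Example step law: U = +1 or -1 with equal probability (symmetric SMM).
plus_minus_one = lambda n, rng: rng.choice([-1, 1], size=n)

rng = np.random.default_rng(0)
sizes = [simulate_smm_allele(20, v=1e-3, t=1e4,
                             step_sampler=plus_minus_one, rng=rng)
         for _ in range(5)]
print(sizes)
```

Because the mutation mechanism is an unrestricted random walk, repeated runs drift without approaching a limiting marginal distribution, which is why the analysis below works with pairwise differences of allele sizes instead.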

A. Bobrowski, M. Kimmel, O. Arino and R. Chakraborty

5.1. Mathematical preliminaries

In this section we will treat the space l¹ introduced in Section 4 as a space of doubly-infinite sequences (x_n)_{n∈Z} such that ‖(x_n)_{n∈Z}‖ = Σ_{n=−∞}^∞ |x_n| < ∞. The space M = l¹ ⊗ l¹ will be identified with the space of all doubly-infinite matrices (m_{ij})_{i,j∈Z} such that Σ_{i,j∈Z} |m_{ij}| < ∞. Recall that l¹ defined in this way is a commutative Banach convolution algebra with multiplication x * y = (Σ_{i=−∞}^∞ x_i y_{n−i})_{n∈Z}, where x = (x_n)_{n∈Z}, y = (y_n)_{n∈Z}. For x = (x_n)_{n∈Z} ∈ l¹, let x̂ denote the reflected sequence x̂ = (x_{−n})_{n∈Z}. Observe that (x * y)^ = x̂ * ŷ. Let U be an integer-valued random variable with p_n = Pr[U = n] and let p = (p_n)_{n∈Z}. Consider the bounded operator Q : l¹ → l¹, Qx = vx * p − vx, where v is a positive constant. As a matrix,

          ( ⋱        ⋮              ⋮              ⋮           )
          ( ⋯  −Σ_{n≠0} p_n      p_{−1}         p_{−2}      ⋯ )
  Q = v   ( ⋯      p_1       −Σ_{n≠0} p_n      p_{−1}      ⋯ )
          ( ⋯      p_2           p_1        −Σ_{n≠0} p_n   ⋯ )
          (        ⋮              ⋮              ⋮        ⋱  )

This operator describes the transition intensities of the generalized SMM. The operator exponential function P(t) = e^{Qt} is given by

  P(t)x = e^{−vt} e^{vpt} * x ,   (36)

where e^{vpt} = Σ_{n=0}^∞ (vtp)^{*n}/n! ∈ l¹ and the superscript *n denotes the n-th convolution power. Let D : M → l¹ be given by D(m_{ij})_{i,j∈Z} = (Σ_{i=−∞}^∞ m_{i,i−n})_{n∈Z}. If m is a joint distribution of two integer-valued r.v.s, then Dm is the distribution of their difference.

LEMMA 1. For x, y ∈ l¹, we have D(x ⊗ y) = x * ŷ.

PROOF. Let x = (x_n)_{n∈Z}, y = (y_n)_{n∈Z}. Then

  D(x ⊗ y) = D(x_n y_m)_{n,m∈Z} = ( Σ_{k=−∞}^∞ x_k y_{k−n} )_{n∈Z} = ( Σ_{k=−∞}^∞ x_k ŷ_{n−k} )_{n∈Z} = x * ŷ .  □
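Lemma 1 is easy to check numerically on a finite window (our illustration; the distributions below are arbitrary): summing the diagonals i − j = n of the matrix x ⊗ y must reproduce the convolution x * ŷ.

```python
import numpy as np

def D_of_matrix(m, R):
    """Difference-distribution operator: (Dm)_n = sum_i m[i, i-n].

    m is indexed by (i, j) in {-R..R}^2 (array offset R); the result
    is a distribution on the differences n in {-2R..2R}.
    """
    out = np.zeros(4 * R + 1)
    for i in range(2 * R + 1):
        for j in range(2 * R + 1):
            out[(i - j) + 2 * R] += m[i, j]   # n = i - j, shifted by 2R
    return out

def hat(x):
    """The reflection x-hat, (x_{-n})."""
    return x[::-1]

R = 2
rng = np.random.default_rng(1)
x = rng.random(2 * R + 1); x /= x.sum()       # a distribution on {-R..R}
y = rng.random(2 * R + 1); y /= y.sum()
lhs = D_of_matrix(np.outer(x, y), R)          # D(x ⊗ y)
rhs = np.convolve(x, hat(y))                  # x * y-hat, lives on {-2R..2R}
print(np.allclose(lhs, rhs))
```

The same window bookkeeping (an array of length 2R+1 representing indices −R..R) is reused in the later sketches.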

LEMMA 2. For m ∈ M, D[P(t) ⊗ P(t)]m = e^{v(p+p̂−2)t} * (Dm).

PROOF. A detailed analytical proof has been postponed to the Appendix. An intuitive probabilistic proof proceeds as follows: an element m ∈ M is a joint distribution of two Z-valued r.v.s, X₁ and X₂. The action of P(t) ⊗ P(t) on m amounts to the addition of a pair of independent identically distributed r.v.s Y₁ and Y₂. Therefore [P(t) ⊗ P(t)]m is the distribution of (X₁ + Y₁, X₂ + Y₂). But D[P(t) ⊗ P(t)]m is the distribution of (X₁ + Y₁) − (X₂ + Y₂) = (X₁ − X₂) + (Y₁ − Y₂), so, since (X₁, X₂) and (Y₁, Y₂) are independent, it is equal to the distribution of X₁ − X₂ convolved with that of Y₁ − Y₂, i.e., (Dm) * exp[vt(p − 1)] * exp[vt(p̂ − 1)].  □

LEMMA 3. For any distribution π ∈ l¹, DΦπ = e₀, where Φπ denotes the diagonal matrix (π_k δ_{k,l})_{k,l∈Z}, i.e., the joint distribution of a pair of identical r.v.s.

PROOF.

  DΦπ = ( Σ_{k=−∞}^∞ π_k δ_{k,k−n} )_{n∈Z} = ( δ_{n,0} Σ_{k=−∞}^∞ π_k )_{n∈Z} = e₀ .  □

Putting together Lemmas 1-3, we conclude that if R(·) is the solution to (14), where P(t) is given by (36), then S(t) = DR(t) is the solution to the equation

  dS(t)/dt = v(p + p̂) * S(t) − 2vS(t) − (1/(2N(t))) S(t) + (1/(2N(t))) e₀ ,   S(0) = S₀ = DR₀ .   (37)

Moreover, by (15) we have

  S(t) = exp[−∫_0^t du/(2N(u))] e^{v(p+p̂−2)t} * S₀ + ∫_0^t exp[−∫_s^t du/(2N(u))] e^{v(p+p̂−2)(t−s)} * (1/(2N(s))) e₀ ds

and, equivalently, by (16),

  S(t) = exp[−∫_0^t du/(2N(u))] e^{v(p+p̂−2)t} * S₀ + ∫_0^t exp[−∫_{t−s}^t du/(2N(u))] e^{v(p+p̂−2)s} * (1/(2N(t−s))) e₀ ds .   (38)

5.2. Stable and decaying population

Proceeding as in Section 4 we obtain the following results.

PROPOSITION 7. If N(t) → N, where 0 < N < ∞, then for any distribution S₀ the solutions S(t) of (37) tend, as t → ∞, to DλR_λΦe₀, where λ = 1/(2N) and R_λ, λ > 0, is the resolvent of the Lyapunov semigroup. We have also

  DλR_λΦe₀ = λ ∫_0^∞ e^{−λs} e^{v(p+p̂−2)s} * e₀ ds .   (39)

PROPOSITION 8. If N(t) → 0, then for any S₀ the solutions S(t) of (37) tend, as t → ∞, to e₀.
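As a numerical sketch of Proposition 7 (ours; the step law and the values of v and λ are arbitrary), the equilibrium (39) can be approximated by evaluating e^{v(p+p̂−2)s} as the law of a rate-2v compound Poisson sum with the symmetrized step distribution (p + p̂)/2, truncating its series as in (36), and discretizing the s-integral. For a mean-zero step law the equilibrium variance should then be 2v E[U²]/λ.

```python
import numpy as np

def trim_center(arr, r):
    """Trim a full convolution back to its previous centered window."""
    return arr[r:len(arr) - r] if r > 0 else arr

def compound_poisson(q, rate_t, R, n_terms=80):
    """e^{-rate_t} * sum_n (rate_t)^n q^{*n} / n!, truncated after n_terms.

    q is a step distribution on {-r..r}; the result lives on {-R..R},
    with R chosen large enough that mass escaping the window is negligible.
    """
    r = (len(q) - 1) // 2
    power = np.zeros(2 * R + 1); power[R] = 1.0   # q^{*0} = e_0
    w = np.exp(-rate_t)                           # weight (rate_t)^0/0! e^{-rate_t}
    out = w * power
    for n in range(1, n_terms):
        power = trim_center(np.convolve(power, q), r)   # q^{*n}, re-centered
        w *= rate_t / n
        out = out + w * power
    return out

# Equilibrium (39): S(inf) = lam * int_0^inf e^{-lam s} e^{v(p+p_hat-2)s} * e_0 ds.
p = np.array([0.5, 0.0, 0.5])      # U = -1 or +1 with probability 1/2
q = 0.5 * (p + p[::-1])            # symmetrized step law (here equal to p)
v, lam, R = 0.25, 1.0, 30          # lam = 1/(2N); all values arbitrary
ds = 0.01
s_grid = np.arange(0.5 * ds, 12.0, ds)          # midpoint rule on [0, 12]
S_inf = np.zeros(2 * R + 1)
for s in s_grid:
    S_inf += lam * np.exp(-lam * s) * ds * compound_poisson(q, 2 * v * s, R)
idx = np.arange(-R, R + 1)
print(S_inf.sum(), (S_inf * idx**2).sum())
```

The printed total mass should be close to 1 and the variance close to 2v E[U²]/λ = 0.5 for these choices; the small deficits come from truncating the s-integral, the Poisson series, and the window.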


Concerning Proposition 7, the limit distribution provided by expression (39) is written as an element of the space of doubly-infinite sequences, nonnegative and summable to 1, i.e., distributions of Z-valued random variables. The solution S(∞) may be rewritten in terms of characteristic functions (c.f.s) or probability generating functions. This latter form of S(∞) can be found in Kimmel and Chakraborty (1996), as their Eq. (4). Concerning Proposition 8, its assertion states that decaying populations lose genetic diversity. This is intuitively expected and the result is provided for completeness only. Note that the assumption of stability of P(t) is not needed now.

5.3. Expanding population

Suppose that m = (m_{kl})_{k,l∈Z} ∈ M is the joint distribution of a pair (X, Y) of Z-valued r.v.s. Let us define the corresponding c.f. as

  ψ_m(τ, ξ) = Σ_{k=−∞}^∞ Σ_{l=−∞}^∞ m_{kl} e^{ikτ} e^{ilξ} ,

where i² = −1, τ, ξ ∈ R. Analogously, for a single Z-valued r.v. X with distribution x = (x_k)_{k∈Z}, let

  ψ_x(τ) = Σ_{k=−∞}^∞ x_k e^{ikτ} .

Let us note that x ⊗ y is the distribution of (X, Y) when X and Y are independent. We note the following result.

LEMMA 4. For any fixed τ, ξ, the maps M ∋ m → ψ_m(τ, ξ) and l¹ ∋ x → ψ_x(τ) are bounded, linear functionals with norm less than or equal to 1. Moreover, for x = (x_k)_{k∈Z}, y = (y_k)_{k∈Z},

  ψ_{x⊗y}(τ, ξ) = ψ_x(τ) ψ_y(ξ) ,   (40)
  ψ_{x*y}(τ) = ψ_x(τ) ψ_y(τ) ,   (41)
  ψ_{Φx}(τ, ξ) = ψ_x(τ + ξ) .   (42)

Finally, if m ∈ M and {T(t), t ≥ 0} is the Lyapunov semigroup related to P(t) given by (36), then

  ψ_{T(t)m}(τ, ξ) = exp{vt[ψ_p(τ) + ψ_p(ξ) − 2]} ψ_m(τ, ξ) .   (43)

PROOF. In the Appendix.

Formulae (40)-(42) express the following facts: the c.f. of a pair of independent r.v.s (X, Y) is the tensor product of the c.f.s of X and Y; the c.f. of the sum of independent r.v.s is the product of their c.f.s; and the joint c.f. of X and Y, when X = Y, is equal to the c.f. of X with a transformed argument. Expression (43) illustrates the compound Poisson nature of the mutation process in the SMM.

PROPOSITION 9. Suppose that N(t) → ∞ in such a way that ∫_0^∞ du/(2N(u)) =: α < ∞ and let P(t) be given by (36). Assume furthermore that the c.f. φ(τ) of the distribution p satisfies lim_{h→0} 2[φ(h) − 1]/h² =: φ''(0) (we have φ(τ) = ψ_p(τ)). Let π be a distribution and let R₀ ∈ M. Let Z(t) be a random vector with a (joint) distribution R(t), where R(t) is a solution to (14). Then Z(t)/√t tends weakly to a two-dimensional normal distribution with the covariance matrix

  ( −vφ''(0)      0     )
  (     0     −vφ''(0)  ) .

PROOF. In the Appendix.

COROLLARY 2. Let S₀ ∈ l¹ be a distribution and let X(t) be a random variable with the distribution S(t), where S(t) is a solution to (37). Under the conditions of Proposition 9, X(t)/√t tends weakly, as t → ∞, to the normal distribution N[0, −2vφ''(0)].

PROOF. Note the relation

  ψ_{Dm}(τ) = ψ_m(τ, −τ) ,

which can be directly checked. Since S(t) = DR(t), where R(t) is a solution to (16) with DR₀ = S₀, we obtain that the characteristic function of X(t)/√t equals

  ψ_{S(t)}(τ/√t) = ψ_{R(t)}(τ/√t, −τ/√t) ,

which tends to exp[vφ''(0)τ²], by Proposition 9.  □
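The relation ψ_{Dm}(τ) = ψ_m(τ, −τ) used in the proof can be verified directly on a finite window; the following sketch (ours, with an arbitrary joint distribution) does so.

```python
import numpy as np

def D_of_matrix(m, R):
    """(Dm)_n = sum_i m[i, i-n]: distribution of the difference X1 - X2."""
    out = np.zeros(4 * R + 1)
    for i in range(2 * R + 1):
        for j in range(2 * R + 1):
            out[(i - j) + 2 * R] += m[i, j]
    return out

def psi1(x, tau, R):
    """Characteristic function of a distribution on the window {-R..R}."""
    return np.sum(x * np.exp(1j * np.arange(-R, R + 1) * tau))

def psi2(m, tau, xi, R):
    """Joint c.f. psi_m(tau, xi) of a matrix distribution on {-R..R}^2."""
    n = np.arange(-R, R + 1)
    return np.exp(1j * n * tau) @ m @ np.exp(1j * n * xi)

R, tau = 3, 0.9
rng = np.random.default_rng(5)
m = rng.random((2 * R + 1, 2 * R + 1)); m /= m.sum()   # a joint distribution
lhs = psi1(D_of_matrix(m, R), tau, 2 * R)   # c.f. of the difference X1 - X2
rhs = psi2(m, tau, -tau, R)                 # psi_m(tau, -tau)
print(np.allclose(lhs, rhs))
```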

However, in the above Corollary, the condition on the c.f. φ of the distribution p is overly restrictive. It implies in particular that EU = 0. A more general result follows.

PROPOSITION 10. Under the hypotheses of Proposition 9 and Corollary 2, let us drop the assumption concerning φ and let us suppose instead that

  lim_{h→0} 2[φ̂(h) − 1]/h² =: φ̂''(0)

exists, where φ̂(τ) = [φ(τ) + φ(−τ)]/2 is the characteristic function of the symmetrized distribution (p + p̂)/2. Then the assertion of Corollary 2 remains unchanged, with φ̂''(0) in place of φ''(0).

PROOF. We have

  ψ_{exp[v(p+p̂−2)]}(τ) = exp{2v[φ̂(τ) − 1]} ,

and, by (38), the characteristic function of X(t)/√t is given by

  exp[−∫_0^t du/(2N(u))] exp{2vt[φ̂(τ/√t) − 1]} ψ_{S₀}(τ/√t)
   + ∫_0^t (1/(2N(s))) exp[−∫_s^t du/(2N(u))] exp{2v(t − s)[φ̂(τ/√t) − 1]} ds .   (44)

We also have (49) with φ replaced by φ̂, and the second exponent under the integral in (44) is bounded by 1. Therefore, the rest of the argument can be carried out as in the proof of Proposition 9.  □

REMARK 3. The fact that ψ_{exp[vt(p+p̂−2)]}(τ/√t) → exp[vτ²φ̂''(0)] is a special case of the central limit theorem with a random number of terms. Results of this section rephrase and generalize older results for the SMM, including those by Moran (1975), Roe (1992) and Pritchard and Feldman (1996).
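Remark 3 can be illustrated numerically (our check; v and τ are arbitrary choices). For the step law U = ±1 with equal probability, φ̂(τ) = cos τ and φ̂''(0) = −1, so ψ_{exp[vt(p+p̂−2)]}(τ/√t) = exp{2vt[cos(τ/√t) − 1]} should approach exp(−vτ²) as t grows; the convolution identity (41) is checked along the way.

```python
import numpy as np

def psi(x, tau, R):
    """Characteristic function of a distribution stored on the window {-R..R}."""
    n = np.arange(-R, R + 1)
    return np.sum(x * np.exp(1j * n * tau))

# (41): the c.f. of a convolution is the product of the c.f.s.
rng = np.random.default_rng(2)
x = rng.random(5); x /= x.sum()            # a distribution on {-2..2}
y = rng.random(5); y /= y.sum()
lhs = psi(np.convolve(x, y), 0.7, R=4)     # x * y lives on {-4..4}
rhs = psi(x, 0.7, 2) * psi(y, 0.7, 2)
assert np.allclose(lhs, rhs)

# Remark 3 for U = +/-1 (phi-hat(tau) = cos tau, phi-hat''(0) = -1):
v, tau = 0.3, 1.1
for t in [1e2, 1e4, 1e6]:
    approx = np.exp(2 * v * t * (np.cos(tau / np.sqrt(t)) - 1.0))
    print(t, approx, np.exp(-v * tau**2))
```

The printed pairs should agree to more and more digits as t increases, since 2vt[cos(τ/√t) − 1] = −vτ² + O(1/t).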

6. Discussion

As mentioned in the Introduction (Section 1), the main purpose of the paper was to prove the limit properties of measures of genetic variation under the Fisher-Wright-Moran coalescent model under a Generalized Stepwise Mutation Model (GSMM), in which the assumptions of constant population size, restricted allele sizes and contraction/expansion bias of mutations are relaxed. The main results, as discussed above, specify conditions under which limiting properties hold and the rate at which the limits are reached under the mutation-drift balance in finite populations (with a known history satisfying the conditions under which limits exist). The formal proofs of the results demonstrate the applicability of the theory of semigroups of linear operators in analyzing population genetic data. A finite-dimensional equivalent of Eq. (4) was also introduced and studied by O'Brien (1982, 1985). O'Brien derived (4) from the time-discrete Fisher-Wright model, using a diffusion approximation of the time-continuous process by a sequence of discrete ones. His analysis thus involved, in a hidden way, the process of coalescence (for two individuals) as viewed by Kingman (1982a). Since the coalescent is well-described and widely known, we thought it more natural to take it as our starting point. Furthermore, O'Brien's analysis is strictly finite-dimensional, and his formulae for solutions of Eq. (4) in terms of eigenvalues of Q do not seem to have natural generalizations in infinite-dimensional spaces. Finally, his main focus seems to be restricted to homozygosity and the number of segregating sites. We focus on the analysis of asymptotic behavior of solutions to Eq. (4), with applications summarized further on. We recapitulate the significance of the main results in terms of their population genetic applications for studying between- and within-population genetic variation as well as for studying disease-gene associations. First, recall that the Lyapunov equation introduced in Section 2, Eq. (4), plays a pivotal role in our


analysis. The Lyapunov semigroup is a Markov semigroup and it is stable or ergodic if and only if the mutation semigroup P(t) is stable or ergodic. These results constitute the basis of applying the resultant Lyapunov differential equation to obtain expectations of statistics based on allele frequencies at microsatellite loci. Asymptotic stability does not hold for some mutation models (e.g., the Infinite Allele Model or the Infinite Site Model), for which alternative coalescence-based methods of data analysis are discussed by others (e.g., Tavaré, 1984, 1995). The distinction between ergodicity and asymptotic stability of the {P(t), t ≥ 0} semigroup is important, since it includes the case of populations with substructure defined by closed classes of alleles. In Section 4, we obtained the limiting properties (and rates of convergence) of the Lyapunov equation under the assumption of asymptotic ergodicity, which can be directly used to study the dynamics of genetic variation at microsatellite loci in populations of varying effective sizes. Thus, these results can be applied to obtain signatures of past demographic history from molecular data on allele size distributions in extant populations (Kimmel et al., 1998). These results are also useful for neutrality tests of microsatellite loci and to examine, using our Restricted Stepwise Mutation Model (RSMM), whether or not the effects of allele size constraints differ across microsatellite loci of different repeat motifs (Deka et al., 1998). Second, these results also establish conditions under which signatures of past demographic changes on genetic variation in an extant population can be detected. In our earlier work (Polanski et al., 1998), we showed that such signatures, obtained from analyses of DNA sequence data, are quantitatively ill-specified in the sense that several different demographic models may equally well explain the distribution of nucleotide diversity within a population.
In contrast, applications of the present theory to microsatellite data (Kimmel et al., 1998) illustrate that the differences between the time and amplitude of population expansion among the major population groups of humans can be statistically detected. Third, the situation when the semigroup {P(t), t ≥ 0} is stable but non-ergodic is also relevant for studying two important population genetic questions: the effects of population structure on genetic variation and the dynamics of disease-gene associations within finite populations. In these two frameworks, alleles (or haplotypes) within populations can be subdivided so that they are classified as members of non-communicating groups, such as alleles of different subpopulations without any gene migration among them, or haplotypes that are on disease-mutation bearing chromosomes as opposed to wild-type normal chromosomes. Thus, the results developed in Section 4.5 are equally applicable for population substructure analysis as well as for studying joint evolution of genetic variation at a disease locus and at a linked marker, the latter considered in Section 4.6.2. In principle, the discussions on allowing for recombination between the disease and marker loci are also applicable for population substructure analysis when the subpopulations are partially isolated (i.e., there is occasional gene migration between subpopulations). Finally, we note that the propositions proved in this paper reiterate our earlier assertion that certain measures of genetic variation (e.g., heterozygosity or genetic


distance based on non-identity of alleles drawn from different populations) estimated from data on microsatellite loci are unaffected by contraction/expansion mutation bias (Kimmel et al., 1996; Kimmel and Chakraborty, 1996; Chakraborty and Kimmel, 1998). Thus, the assumption of symmetry of expansion and contraction mutations is not critical for analyzing genetic variation based on measures such as heterozygosity or allele size variance. In this sense, the rationale that the theoretical formulation of Slatkin (1995) is mathematically equivalent to that of Chakraborty and Nei (1982) is also analytically demonstrated through the semigroup operator approach discussed here.

7. Acknowledgements
We thank an anonymous referee who informed us of the papers by O'Brien (1982, 1985) and contributed a number of helpful remarks. This work was supported by US Public Health Service research grants GM 41399 and GM 45861 (to RC) and GM 58545 (to RC and MK) and by the Keck's Center for Computational Biology at Rice University (MK). AB is on leave from the Department of Mathematics at the Lublin Technical University, Poland. MK did part of his research in April and May 1998, when he was a long-term visitor at the Gothenburg Stochastic Centre, supported by the Swedish Foundation for Strategic Research.

8. Appendix

8.1. Proof of Example 1

Let E be a Banach space and Ψ : l¹ × l¹ → E a bilinear continuous operator. Define Φ : M → E by the formula

  Φ((k_n)_{n≥1}) = Σ_{n=1}^∞ Ψ(k_n, e_n) ,   (45)

where an element of M is represented by the sequence (k_n)_{n≥1} of its columns. The right-hand side is absolutely convergent, since ‖Ψ(k_n, e_n)‖ ≤ ‖Ψ‖ ‖k_n‖ ‖e_n‖ = ‖Ψ‖ ‖k_n‖, and we have ‖Φ‖ ≤ ‖Ψ‖. On the other hand,

  ‖Φ‖ = sup_{‖w‖=1} ‖Φ(w)‖ ≥ sup_{‖k‖=1, ‖(x_n)_{n≥1}‖=1} ‖Φ(k ⊗ (x_n)_{n≥1})‖ = sup_{‖k‖=1, ‖(x_n)_{n≥1}‖=1} ‖Ψ(k, (x_n)_{n≥1})‖ = ‖Ψ‖ .

Moreover,

  Φ(k ⊗ (x_n)_{n≥1}) = Φ((x_n k)_{n≥1}) = Σ_{n=1}^∞ Ψ(x_n k, e_n) = Ψ(k, Σ_{n=1}^∞ x_n e_n) = Ψ(k, (x_n)_{n≥1}) .   (46)

Finally, if Φ₁ satisfies (12), it coincides with Φ; for

  Φ₁((k_n)_{n≥1}) = Φ₁( Σ_{n=1}^∞ k_n ⊗ e_n ) = Σ_{n=1}^∞ Φ₁(k_n ⊗ e_n) = Σ_{n=1}^∞ Ψ(k_n, e_n) = Φ((k_n)_{n≥1}) ,

where we have used the fact that any (k_n)_{n≥1} ∈ M can be represented as Σ_{i=1}^∞ k_i ⊗ e_i, where k_i ⊗ e_i = (k_i δ_{i,n})_{n≥1}, i = 1, 2, ….  □

8.2. Proof of Proposition 6

We have, for a nonnegative (m_{i,j})_{i,j≥1} ∈ M,

  (λ − C)⁻¹(m_{i,j})_{i,j≥1} = ∫_0^∞ e^{−λt} [e^{Qt} ⊗ e^{Qt}](m_{i,j})_{i,j≥1} dt
    = ∫_0^∞ e^{−(λ+k)t} [e^{Qt} ⊗ e^{Q_k t}](m_{i,j})_{i,j≥1} dt
    = Σ_{n=0}^∞ (1/n!) ∫_0^∞ e^{−(λ+k)t} tⁿ [e^{Qt} ⊗ (Q_k)ⁿ](m_{i,j})_{i,j≥1} dt
    = Σ_{n=0}^∞ [(λ + k − Q)^{−(n+1)} ⊗ (Q_k)ⁿ](m_{i,j})_{i,j≥1} ,

where Q_k = Q + kI, and the change of the order of integration and summation is justified by the Lebesgue bounded convergence theorem; namely, we have

  ‖[e^{Qt} ⊗ Σ_{n=0}^N ((Q_k t)ⁿ/n!)](m_{i,j})_{i,j≥1}‖ ≤ e^{kt} ‖(m_{i,j})_{i,j≥1}‖ .

The rest is obvious.  □


8.3. Proof of Lemma 11

Let us represent m as a sum of its columns: m = Σ_{k=−∞}^∞ m_k ⊗ e_k, where e_k = (δ_{n,k})_{n∈Z}. We have Dm = Σ_{k=−∞}^∞ D(m_k ⊗ e_k) = Σ_{k=−∞}^∞ m_k * ê_k, and

  D[P(t) ⊗ P(t)]m = D( Σ_{k=−∞}^∞ P(t)m_k ⊗ P(t)e_k ) = Σ_{k=−∞}^∞ D(P(t)m_k ⊗ P(t)e_k) = Σ_{k=−∞}^∞ P(t)m_k * (P(t)e_k)^ .

Taking into account (36),

  (P(t)e_k)^ = ( e^{−vt} e^{vpt} * e_k )^ = e^{−vt} e^{vp̂t} * ê_k ,

so that

  D[P(t) ⊗ P(t)]m = Σ_{k=−∞}^∞ e^{−2vt} e^{vpt} * m_k * e^{vp̂t} * ê_k = Σ_{k=−∞}^∞ e^{v(p+p̂−2)t} * m_k * ê_k
    = e^{v(p+p̂−2)t} * Σ_{k=−∞}^∞ m_k * ê_k = e^{v(p+p̂−2)t} * Dm .  □

8.4. Proof of Lemma 15

The proof of the first part of the lemma being straightforward, we restrict ourselves to proving (43). Note that it is enough to show (43) for m = x ⊗ y, where x, y ∈ l¹. As a consequence of (41) we have

  ψ_{exp(vpt−vt)}(τ) = exp{vt[ψ_p(τ) − 1]}   (47)

(Feller, 1966, p. 428). Therefore, using (40) and (41),

  ψ_{[P(t)⊗P(t)](x⊗y)}(τ, ξ) = ψ_{exp(vpt−vt)*x}(τ) ψ_{exp(vpt−vt)*y}(ξ)
    = exp{vt[ψ_p(τ) − 1]} exp{vt[ψ_p(ξ) − 1]} ψ_x(τ) ψ_y(ξ)
    = exp{vt[ψ_p(τ) + ψ_p(ξ) − 2]} ψ_{x⊗y}(τ, ξ) ,

as desired.  □

8.5. Proof of Proposition 16

The characteristic function of Z(t) is ψ_{R(t)}(τ, ξ). By (16),

  ψ_{R(t)}(τ, ξ) = exp[−∫_0^t du/(2N(u))] ψ_{T(t)R₀}(τ, ξ) + ∫_0^t (1/(2N(s))) exp[−∫_s^t du/(2N(u))] ψ_{T(t−s)ΦP(s)π}(τ, ξ) ds .

Using Eqs. (43), (42), (36), (41) and (47),

  ψ_{T(t−s)ΦP(s)π}(τ, ξ) = exp{v(t − s)[ψ_p(τ) + ψ_p(ξ) − 2]} ψ_{ΦP(s)π}(τ, ξ)
    = exp{v(t − s)[ψ_p(τ) + ψ_p(ξ) − 2]} ψ_{P(s)π}(τ + ξ)
    = exp{v(t − s)[ψ_p(τ) + ψ_p(ξ) − 2]} exp{vs[ψ_p(τ + ξ) − 1]} ψ_π(τ + ξ) .

Therefore, the characteristic function of Z(t)/√t equals

  exp[−∫_0^t du/(2N(u))] exp{vt[φ(τ/√t) + φ(ξ/√t) − 2]} ψ_{R₀}(τ/√t, ξ/√t)
   + ∫_0^t (1/(2N(s))) exp[−∫_s^t du/(2N(u))]
     × [ exp{vt[φ(τ/√t) + φ(ξ/√t) − 2]} exp{vs[φ((τ+ξ)/√t) − φ(τ/√t) − φ(ξ/√t) + 1]} ψ_π((τ + ξ)/√t) ] ds .   (48)

Since lim_{h→0} 2[φ(h) − 1]/h² = φ''(0),

  lim_{t→∞} vs[φ(τ/√t) + φ(ξ/√t) − 2] = 0   (49)

for every fixed s, while

  lim_{t→∞} vt[φ(τ/√t) + φ(ξ/√t) − 2] = ½ vφ''(0)(τ² + ξ²) .

Note also that the expression in square brackets under the integral in (48) is norm-bounded by 1, since it is equal to ψ_{T(t−s)ΦP(s)π}(τ/√t, ξ/√t). Therefore, reasoning similar to that in the proof of Proposition 9 leads to the conclusion that, as t → ∞, (48) converges to

  exp(−α) exp[½ vφ''(0)(τ² + ξ²)] + ∫_0^∞ (1/(2N(s))) exp[−∫_s^∞ du/(2N(u))] exp[½ vφ''(0)(τ² + ξ²)] ds = exp[½ vφ''(0)(τ² + ξ²)] ,

since the integral of (1/(2N(s))) exp[−∫_s^∞ du/(2N(u))] over [0, ∞) equals 1 − exp(−α). This is the c.f. of the asserted two-dimensional normal distribution, as desired.  □

References

Billingsley, P. (1986). Probability and Measure. Wiley, New York.
Chakraborty, R. and M. Kimmel (1999). Statistics of microsatellite loci: Estimation of mutation rate and pattern of population expansion. In Microsatellites: Evolution and Applications (Eds., D. B. Goldstein and C. Schlotterer), Oxford University Press, Oxford, pp. 160-169.
Chakraborty, R. and M. Nei (1982). Genetic differentiation of quantitative characters between populations of species: I. Mutation and random genetic drift. Genet. Res. Camb. 39, 303-314.
Chung, K. L. (1967). Markov Chains with Stationary Transition Probabilities, 2nd edn. Springer, Berlin.
Defant, A. and K. Floret (1993). Tensor Norms and Operator Ideals. North Holland, Amsterdam.
Deka, R., S. Guangyun, D. Smelser, Y. Zhong, M. Kimmel and R. Chakraborty (1999). Rate and directionality of mutations and effects of allele size constraints at anonymous, gene-associated, and disease-causing trinucleotide loci. Mol. Biol. Evol. 16, 1166-1177.
Ewens, W. J. (1979). Mathematical Population Genetics. Springer, New York.
Feller, W. (1966). An Introduction to Probability Theory and Its Applications, Vol. II. Wiley, New York.
Freedman, D. (1971). Markov Chains. Holden-Day, San Francisco.
Gajic, Z. and M. T. J. Qureshi (1995). Lyapunov Matrix Equation in System Stability and Control. Academic Press.
Goldstein, D. B., A. R. Linares, M. W. Feldman and L. L. Cavalli-Sforza (1995). An evaluation of genetic distances for use with microsatellite loci. Genetics 139, 463-471.
Griffiths, R. C. and S. Tavaré (1994). Sampling theory for neutral alleles in a varying environment. Phil. Trans. R. Soc. London B 344, 403-410.
Hille, E. and R. S. Phillips (1957). Functional Analysis and Semigroups. Amer. Math. Soc. Colloq. Publ., vol. 31, Providence, RI.
Jin, L. and R. Chakraborty (1995). Population substructure, stepwise mutations, heterozygote deficiency and their implications in DNA forensics. Heredity 74, 274-285.
Kaplan, N. L., W. G. Hill and B. S. Weir (1995). Likelihood methods for locating disease genes in nonequilibrium populations. Am. J. Hum. Genet. 56, 18-32.
Kimmel, M. and R. Chakraborty (1996). Measures of variation at DNA repeat loci under a general stepwise mutation model. Theor. Popul. Biol. 50, 345-367.
Kimmel, M., R. Chakraborty, J. P. King, M. Bamshad, W. S. Watkins and L. B. Jorde (1998). Signatures of population expansion in microsatellite repeat data. Genetics 148, 1921-1930.
Kingman, J. F. C. (1982a). On the genealogy of large populations. J. Appl. Prob. 19A, 27-43.
Kingman, J. F. C. (1982b). The coalescent. Stochastic Processes and their Applications 13, 235-248.
Lasota, A. and M. C. Mackey (1994). Chaos, Fractals and Noise: Stochastic Aspects of Dynamics, 2nd edn. Springer, New York.
Li, W.-H. (1997). Molecular Evolution. Sinauer, Boston.
Moran, P. A. P. (1975). Wandering distributions and the electrophoretic profile. Theor. Popul. Biol. 8, 318-330.
Nagel, R. (1986). One-parameter Semigroups of Positive Operators. Lect. Notes Math., vol. 1184. Springer, Berlin.
Norris, J. R. (1997). Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press, Cambridge.
O'Brien, P. (1982). Allele frequencies in a multidimensional Wright-Fisher model with general mutation. J. Math. Biol. 15, 227-237.
O'Brien, P. (1985). Homozygosity in a population of variable size and mutation rate. J. Math. Biol. 22, 279-291.
Ohta, T. and M. Kimura (1973). A model of mutation appropriate to estimate the number of electrophoretically detectable alleles in a finite population. Genet. Res. 22, 201-204.
Pankratz, V. S. (1998). Stochastic Models and Linkage Disequilibrium: Estimating the Recombination Coefficient. Doctoral dissertation (M. Kimmel, advisor). Department of Statistics, Rice University, Houston, TX.
Pazy, A. (1983). Semigroups of Linear Operators and Applications to Partial Differential Equations. Springer, New York (Chapter 4).
Polanski, A., M. Kimmel and R. Chakraborty (1998). Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data. Proc. Natl. Acad. Sci. USA 95, 5456-5461.
Pritchard, J. K. and M. W. Feldman (1996). Statistics for microsatellite variation based on coalescence. Theor. Popul. Biol. 50, 325-344.
Roe, A. (1992). Correlations and interactions in random walks and genetics. Ph.D. dissertation. University of London, London, UK.
Semadeni, Z. (1965). Selected Topics on Functional Analysis and Categories. Aarhus Universitet, Matematisk Institut, Aarhus.
Shriver, M. D., L. Jin, R. Chakraborty and E. Boerwinkle (1993). VNTR allele frequency distributions under the stepwise mutation model: A computer simulation approach. Genetics 134, 983-993.
Slatkin, M. (1995). A measure of population subdivision based on microsatellite allele frequencies. Genetics 139, 457-462.
Tavaré, S. (1984). Line-of-descent and genealogical processes, and their applications in population genetics models. Theor. Popul. Biol. 26, 119-164.
Tavaré, S. (1995). Calibrating the clock: Using stochastic processes to measure the rate of evolution. In Calculating the Secrets of Life (Eds., E. S. Lander and M. S. Waterman), pp. 114-152. National Academy Press, Washington, DC.
Weber, J. L. and C. Wong (1993). Mutation of human short tandem repeats. Hum. Molec. Genet. 2, 1123-1128.
Yosida, K. (1980). Functional Analysis. Springer, Berlin.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.


Continuous-Time ARMA Processes

P. J. Brockwell

Continuous-time autoregressive (CAR) processes have been of interest to physicists and engineers for many years (see e.g., Fowler, 1936). Early papers dealing with the properties and statistical analysis of such processes, and of the more general continuous-time autoregressive moving average (CARMA) processes, include those of Doob (1944), Bartlett (1946), Phillips (1959) and Durbin (1961). In the last ten years there has been a resurgence of interest in continuous-time processes, partly as a result of the very successful application of stochastic differential equation models to problems in finance, exemplified by the derivation of the Black-Scholes option-pricing formula and its generalizations (Hull and White, 1987). Numerous examples of econometric applications of continuous-time models are contained in the book of Bergstrom (1990). Continuous-time models have also been utilized very successfully for the modelling of irregularly spaced data (Jones (1981, 1985), Jones and Ackerson (1990)). At the same time there has been an increasing realization that non-linear time series models provide much better representations of many empirically observed time series than linear models. The threshold ARMA models of Tong (1983, 1990) have been particularly successful in representing a wide variety of data sets, and the ARCH and GARCH models of Engle (1982) and Bollerslev (1986) respectively have had great success in the modelling of financial data. Continuous-time versions of ARCH and GARCH models have been developed by Nelson (1990). In this paper we discuss continuous-time ARMA models, their basic properties, their relationship with discrete-time ARMA models, inference based on observations made at discrete times and non-linear processes which include continuous-time analogues of Tong's threshold ARMA models.

1. Introduction

Discrete time series are often obtained by observing a continuous-time process at a discrete sequence of observation times. It is then natural, even though the observations are made at discrete times, to model the underlying process as a continuous-time series. Even if there is no underlying continuous-time process, it


may still be advantageous to model the data as observations of a continuous-time process at discrete times. The analysis of time series data observed at irregularly spaced times can be handled very conveniently via continuous-time models, as pointed out by Jones (1981, 1985). This paper is concerned with properties of continuous-time ARMA processes, linear and non-linear, and with inference for such processes based on observations made at discrete times t_1, t_2, …, t_n. Inference based on continuously observed data has been considered by Dzhaparidze (1970, 1971). The study of continuous-time models has received great impetus from the very successful use of continuous-time models in theoretical finance, particularly with the work of Black, Scholes and Merton on the pricing of options. While the interest in continuous-time models has been growing, it has also become apparent that the standard linear time series models are not adequate to describe many financial (and other) time series. In order to account for the changing variability of observed financial data, Engle (1982) introduced the family of ARCH models, which has now grown to include GARCH, EGARCH and numerous other extensions. Nelson (1990) developed continuous-time analogues of ARCH and GARCH models. An account of much of his work is contained in the book edited by Rossi (1996). Many other non-linear models have been proposed for the analysis of time series in order to account for phenomena which cannot be represented within the classical linear Gaussian framework. Some of the more popular and successful of these are the bilinear models (see Subba Rao and Gabr, 1984), the random coefficient autoregressions (see Nicholls and Quinn, 1982) and the threshold ARMA models (see Tong, 1983 and 1990).
The non-linear continuous-time models which we fit to data in Section 8 were developed as continuous-time versions of Tong's discrete-time threshold ARMA processes, which have found considerable success in modelling data which cannot be well represented by linear models. These models introduce non-linearity in a way which has the simple interpretation that the dynamics of the process is dependent on the level of the process at some previous time. In the continuous-time models considered here we do not include Tong's delay parameter, so that the dynamics is dependent only on the current level of the process. In Section 2 we examine the definition and properties of continuous-time ARMA(p, q) processes, deriving simple expressions for the autocovariance function which highlight the relationship between continuous-time and discrete-time ARMA processes, discussed in Section 3. In Section 4 we discuss the problem of embedding a given discrete-time ARMA(p, q) process with q < p in a continuous-time ARMA process and the closely related aliasing problem, which concerns the identifiability of a continuous-time ARMA process when it is observed only at discrete times. Section 5 deals with maximum Gaussian likelihood inference for CARMA processes observed at times {t_1, …, t_n}. In Section 6 we introduce a general class of non-linear continuous-time AR processes, specializing in Section 7 to continuous-time threshold autoregressive (CTAR) processes. Continuous-time threshold ARMA (CTARMA) processes are also briefly discussed. Applications of CTAR models are given in Section 8.


2. CARMA(p, q) processes

We define a Gaussian CARMA(p, q) process {Y(t)} with 0 ≤ q < p and coefficients a_1, …, a_p, b_0, …, b_q, to be a stationary solution of the (suitably interpreted) p-th order linear differential equation,

  a(D)Y(t) = b(D)DW(t),  t ≥ 0 ,   (2.1)

where D denotes differentiation with respect to t, {W(t)} is standard Brownian motion,

  a(z) := z^p + a_1 z^{p−1} + ⋯ + a_p ,
  b(z) := b_0 + b_1 z + ⋯ + b_p z^p ,

and the coefficients b_j satisfy b_q ≠ 0 and b_j = 0 for q < j ≤ p. Since the derivatives DʲW(t) do not exist in the usual sense, we interpret (2.1) as being equivalent to the observation and state equations,

  Y(t) = b′X(t) ,   (2.2)

and

  dX(t) − AX(t)dt = e dW(t) ,   (2.3)

where

      (   0       1        0      ⋯    0  )
      (   0       0        1      ⋯    0  )
  A = (   ⋮       ⋮        ⋮           ⋮  ) ,   e = (0, 0, …, 0, 1)′ ,   b = (b_0, b_1, …, b_{p−2}, b_{p−1})′ ,
      (   0       0        0      ⋯    1  )
      ( −a_p  −a_{p−1}  −a_{p−2}  ⋯  −a_1 )

and X(0) is assumed to be uncorrelated with {W(t)}. The state Eq. (2.3) is an Itô differential equation for X(t). In the case p = 1, A is defined to be −a_1. Because of the linearity of (2.3), its solution can immediately be written down as
  X(t) = e^{At}X(0) + ∫_0^t e^{A(t−u)} e dW(u) ,   (2.4)

where the stochastic integral on the right-hand side has expected value zero and the property that if f and g are square integrable with respect to Lebesgue measure on [0, t], then

  E[ ∫_0^t f(u)dW(u) ∫_0^t g(u)dW(u) ] = ∫_0^t f(u)g(u)du .   (2.5)

The first p − 1 components of the state Eq. (2.3) simply define the first p − 1 components X_0(t), …, X_{p−2}(t) of X(t) in terms of the last component X_{p−1}(t) via the equations,


  X_j(t) − X_j(0) = ∫_0^t X_{j+1}(u)du ,   j = 0, …, p − 2 .

Thus, for j = 1, …, p − 1, X_j(t) is the j-th derivative of X_0(t). The last component equation of (2.3) is

  dX_{p−1}(t) + a_1 X_{p−1}(t)dt + ⋯ + a_p X_0(t)dt = dW(t) ,   (2.6)

showing that X_0(t) can be interpreted as a solution of (2.1) in the case when b = [1 0 ⋯ 0 0]′.

The solution (2.2) for general $b$ is suggested by the linearity of Eq. (2.1). This motivates the following definition.

DEFINITION 2.1. For integers $p$ and $q$ satisfying $0 \le q < p$, $\{Y(t), t \ge 0\}$ is a Gaussian CARMA(p, q) process with parameters $a_1, \ldots, a_p$, $b_0, \ldots, b_q$, if and only if $\{Y(t)\}$ satisfies (2.2), where $\{X(t)\}$ is a stationary solution of (2.3).

Since the solution (2.4) of (2.3) is valid for a more general class of processes $\{W(t)\}$, we can extend Definition 2.1 as follows.

DEFINITION 2.2. $\{Y(t)\}$ is a CARMA(p, q) process if it satisfies Definition 2.1 with $\{W(t)\}$ any real-valued second-order random process satisfying $W(0) = 0$ and having uncorrelated zero-mean increments with $E[(W(t) - W(s))^2] = t - s$, for all $t \ge s \ge 0$.

DEFINITION 2.3. $\{Y(t)\}$ is a CARMA process with mean $\mu$ if and only if $\{Y(t) - \mu\}$ is a CARMA process.

It is not difficult to check that necessary and sufficient conditions for stationarity of the solution (2.4) are that the zeroes of $a$ (which are also the eigenvalues of $A$) all have strictly negative real parts, and that the distribution of $X(0)$ has mean vector,

$$E[X(0)] = 0, \qquad (2.7)$$

and covariance matrix,

$$\Sigma = E[X(0)X(0)'] = \int_0^\infty e^{Ay} e e' e^{A'y}\,dy. \qquad (2.8)$$

Under these conditions, the mean and autocovariance function of the process $\{X(t)\}$ are easily found (using (2.5)) to satisfy

$$E[X(t)] = 0, \qquad t \ge 0, \qquad (2.9)$$

and

$$E[X(t+h)X(t)'] = e^{Ah}\Sigma, \qquad h \ge 0. \qquad (2.10)$$

From (2.2) the mean and autocovariance function of the CARMA(p,q) process {Y(t)} are therefore given by


$$E[Y(t)] = 0, \qquad t \ge 0, \qquad (2.11)$$

and

$$\gamma_Y(h) = E[Y(t+h)Y(t)] = b' e^{A|h|}\Sigma\, b. \qquad (2.12)$$
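The covariance matrix in (2.8) satisfies the Lyapunov equation $A\Sigma + \Sigma A' = -ee'$ (integrate the derivative of $e^{Ay}ee'e^{A'y}$ over $[0, \infty)$), so (2.12) can be evaluated numerically without quadrature. A minimal sketch in Python; the parameter values at the end are illustrative, not from the text:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def carma_acvf(a, b, h):
    """Autocovariance (2.12) of a CARMA(p,q) process driven by standard
    Brownian motion; a = [a1,...,ap], b = [b0,...,b_{p-1}]."""
    p = len(a)
    A = np.diag(np.ones(p - 1), 1)            # companion matrix of (2.3)
    A[-1, :] = -np.array(a, dtype=float)[::-1]
    e = np.zeros(p); e[-1] = 1.0
    # Sigma of (2.8) solves the Lyapunov equation A Sigma + Sigma A' = -e e'
    Sigma = solve_continuous_lyapunov(A, -np.outer(e, e))
    bvec = np.array(b, dtype=float)
    return float(bvec @ expm(A * abs(h)) @ Sigma @ bvec)

# CAR(1) sanity check: gamma(h) = b0^2 e^{-a|h|}/(2a); here 1/(2*2) at h = 0
print(carma_acvf([2.0], [1.0], 0.0))   # ≈ 0.25
```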

REMARK 1. If $\{W(t)\}$ is standard Brownian motion and $X(0)$ is Gaussian with mean zero and covariance matrix $\Sigma$, then $\{Y(t)\}$ as defined by (2.2) is Gaussian and strictly stationary. (The term "stationary" will be used throughout to denote weak stationarity.)

Although the expression (2.12) for the autocovariance function has an awkward appearance, it is possible to evaluate the matrices $e^{Ah}$ and $\Sigma$ quite explicitly in terms of the eigenvalues of the matrix $A$ using its Jordan decomposition. The eigenvalues of $A$, as already indicated, are just the roots of the equation $a(z) = 0$, and the right eigenvector of $A$ corresponding to the eigenvalue $\lambda$ is $[\,1 \;\; \lambda \;\; \cdots \;\; \lambda^{p-1}\,]'$. The Jordan decomposition was used by Doob (1944) and, in a model-fitting context, by Jones (1981). From the resulting expression for the autocovariance function of the process $\{Y(t)\}$ we find that the spectral density,

$$f_Y(\omega) := \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\omega h}\gamma_Y(h)\,dh,$$

is

$$f_Y(\omega) = \frac{1}{2\pi}\,\frac{|b(i\omega)|^2}{|a(i\omega)|^2}, \qquad -\infty < \omega < \infty. \qquad (2.13)$$
(The expression (2.13) for the spectral density of $\{Y(t)\}$ can be obtained very easily from the defining differential Eq. (2.1) if we proceed formally, writing the spectral density of the left-hand side as the power transfer function $|a(i\omega)|^2$ times the spectral density of $\{Y(t)\}$, and the spectral density of the right-hand side as the power transfer function $|b(i\omega)|^2$ times the 'spectral density' of $\{DW(t)\}$, taking the latter to be the improper density $1/(2\pi)$ for all $\omega$. The justification of these formal operations however requires substantial mathematical machinery. The state-space approach is more straightforward. In the purely autoregressive case ($q = 0$) the solution (2.4) specifies not only the evolution of the process $\{Y(t)\}$ but also of its $p - 1$ derivatives. For problems of inference the representation (2.4) allows very simple calculation of the Gaussian likelihood of observations of the process at discrete times $t_1, \ldots, t_n$, as described by Jones (1981). See Section 5.)

A much simpler form of (2.12) can be derived from (2.13) by contour integration. Thus, substituting from (2.13) into the relation,

$$\gamma_Y(h) = \int_{-\infty}^{\infty} e^{i\omega h} f_Y(\omega)\,d\omega,$$


and changing the variable of integration from $\omega$ to $z = i\omega$, we find that

$$\gamma_Y(h) = S, \qquad h \ge 0,$$

where $S$ is the sum of residues of $e^{z|h|}[b(z)b(-z)]/[a(z)a(-z)]$ in the left half of the complex plane. This gives the general expression,

$$\gamma_Y(h) = \sum_{\lambda : a(\lambda) = 0} \frac{1}{(m-1)!}\left[\frac{d^{m-1}}{dz^{m-1}}\,\frac{(z-\lambda)^m e^{z|h|} b(z)b(-z)}{a(z)a(-z)}\right]_{z=\lambda}, \qquad (2.14)$$

where $m$ is the multiplicity of the root $\lambda$ of $a(z) = 0$. In the case when the roots are distinct, Eq. (2.14) simplifies to

$$\gamma_Y(h) = \sum_{\lambda : a(\lambda) = 0} \frac{e^{\lambda|h|}\,b(\lambda)b(-\lambda)}{a'(\lambda)a(-\lambda)}. \qquad (2.15)$$
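For distinct roots, (2.15) can be checked numerically against the matrix expression (2.12). A sketch, with illustrative coefficients (not from the text) and standard-Brownian driving noise assumed:

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

a = [2.0, 5.0]          # a(z) = z^2 + 2z + 5, distinct roots -1 +/- 2i
b = [1.0, 0.5]          # b(z) = 1 + 0.5 z
h = 0.7

# Residue formula (2.15), summing over the roots of a(z) = 0
apoly = np.array([1.0] + a)                 # coefficients of a(z), highest first
bpoly = np.array(b[::-1])                   # coefficients of b(z), highest first
roots = np.roots(apoly)
gamma_res = sum(np.exp(lam * abs(h)) * np.polyval(bpoly, lam) * np.polyval(bpoly, -lam)
                / (np.polyval(np.polyder(apoly), lam) * np.polyval(apoly, -lam))
                for lam in roots).real

# Matrix formula (2.12), with Sigma from the Lyapunov form of (2.8)
A = np.array([[0.0, 1.0], [-a[1], -a[0]]])
e = np.array([0.0, 1.0])
Sigma = solve_continuous_lyapunov(A, -np.outer(e, e))
gamma_mat = np.array(b) @ expm(A * abs(h)) @ Sigma @ np.array(b)

assert abs(gamma_res - gamma_mat) < 1e-8
```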

REMARK 2. For the process to be minimum phase, the roots of $b(z) = 0$ must have real parts less than or equal to zero. (This corresponds to invertibility for discrete-time ARMA processes.) For a one-to-one correspondence between the second-order properties of $\{Y(t)\}$ and the parameters $a_1, \ldots, a_p$, $b_0, \ldots, b_q$, it is necessary to restrict $b(z)$ to satisfy the minimum phase condition and to be non-negative in a neighbourhood of $z = 0$. Every CARMA process has such a representation, and in order to avoid trivial ambiguities, we shall identify CARMA processes whose coefficients in this minimum phase representation are the same.

EXAMPLE 1. The Gaussian CARMA(1,0) (or CAR(1)) process is the simplest continuous-time ARMA process. It is defined as a stationary solution of the first-order stochastic differential equation,

$$(D + a)Y(t) = bDW(t), \qquad t \ge 0,$$

where $a > 0$ and $\{W(t)\}$ is standard Brownian motion. It is also known as the stationary Ornstein-Uhlenbeck process. From (2.11) and (2.15) we find at once that $EY(t) = 0$ and

$$\gamma_Y(h) = \frac{b^2}{2a}\,e^{-a|h|}. \qquad (2.16)$$
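Sampled at fixed spacing $\Delta$, the CAR(1) process is an exact discrete-time AR(1), since (2.4) gives $Y(t+\Delta) = e^{-a\Delta}Y(t)$ plus an independent Gaussian increment. A simulation sketch (illustrative values $a = 1$, $b = 1$, $\Delta = 0.1$; not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, dt, n = 1.0, 1.0, 0.1, 200_000

phi = np.exp(-a * dt)          # exact AR(1) coefficient of the sampled process
var = b**2 / (2 * a)           # stationary variance gamma_Y(0) = b^2/(2a)

y = np.empty(n)
y[0] = rng.normal(0.0, np.sqrt(var))                   # start in the stationary law
noise = rng.normal(0.0, np.sqrt(var * (1 - phi**2)), n - 1)
for k in range(1, n):
    y[k] = phi * y[k - 1] + noise[k - 1]

# Sample autocovariances at lags 0 and 1 versus (2.16)
g0 = np.mean(y * y)
g1 = np.mean(y[1:] * y[:-1])
print(g0, var)                      # close to 0.5
print(g1, var * np.exp(-a * dt))    # close to 0.5*exp(-0.1)
```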

EXAMPLE 2. The CARMA(2,1) process with parameters $a_1, a_2, b_0, b_1$ is a stationary solution of the stochastic differential equation,

$$D^2 Y(t) + a_1 DY(t) + a_2 Y(t) = (b_0 + b_1 D)DW(t), \qquad t \ge 0. \qquad (2.17)$$

In order for such a solution to exist it is necessary that the roots of the equation,

$$z^2 + a_1 z + a_2 = 0, \qquad (2.18)$$


have negative real parts. To satisfy the requirements of Remark 2, we also require that $b_0 \ge 0$ and $b_1 \ge 0$. In the case when (2.18) has two distinct complex conjugate roots,

$$\lambda_1 = \alpha + i\beta \quad \text{and} \quad \lambda_2 = \alpha - i\beta, \qquad \alpha < 0, \;\; \beta > 0, \qquad (2.19)$$

it follows at once from (2.15) that the autocovariance function of $\{Y(t)\}$ is

$$\gamma_Y(h) = \gamma_Y(0)\,e^{\alpha|h|}\left[\cos(\beta h) + \sin(\beta|h|)\,\frac{\alpha(a_2 b_1^2 - b_0^2)}{\beta(a_2 b_1^2 + b_0^2)}\right], \qquad (2.20)$$

where

$$\gamma_Y(0) = \frac{b_0^2 + a_2 b_1^2}{2 a_1 a_2}. \qquad (2.21)$$

Note that if $\lambda = \alpha + i\beta$ is any complex number with non-zero imaginary part and if $w = a + ib$ is any complex number, then

$$\kappa(h) = w e^{\lambda|h|} + \bar{w} e^{\bar{\lambda}|h|}, \qquad -\infty < h < \infty, \qquad (2.22)$$

is the autocovariance function of a CARMA(2,1) or CARMA(2,0) process if and only if

$$\alpha < 0, \qquad (2.23)$$

$$a > 0, \qquad (2.24)$$

and

$$|\beta| \le a|\alpha|/|b|. \qquad (2.25)$$

Condition (2.24) expresses the obvious requirement that $\kappa(0) > 0$, and a straightforward calculation shows that (2.25) is then necessary and sufficient for the Fourier transform

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ih\omega}\kappa(h)\,dh$$
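The closed form (2.20)-(2.21) can be sanity-checked against the matrix formula (2.12). A sketch with illustrative coefficients ($a_1 = 2$, $a_2 = 5$, $b_0 = 1$, $b_1 = 0.5$, so $\alpha = -1$, $\beta = 2$; these values are not from the text):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

a1, a2, b0, b1 = 2.0, 5.0, 1.0, 0.5
alpha, beta = -a1 / 2, np.sqrt(a2 - a1**2 / 4)   # roots alpha +/- i*beta of (2.18)

def gamma_closed(h):
    """Autocovariance (2.20)-(2.21) of the CARMA(2,1) process."""
    g0 = (b0**2 + a2 * b1**2) / (2 * a1 * a2)
    ratio = alpha * (a2 * b1**2 - b0**2) / (beta * (a2 * b1**2 + b0**2))
    return g0 * np.exp(alpha * abs(h)) * (np.cos(beta * h) + ratio * np.sin(beta * abs(h)))

def gamma_matrix(h):
    """Autocovariance (2.12) via the state-space representation (2.3)."""
    A = np.array([[0.0, 1.0], [-a2, -a1]])
    e = np.array([0.0, 1.0])
    Sigma = solve_continuous_lyapunov(A, -np.outer(e, e))
    b = np.array([b0, b1])
    return b @ expm(A * abs(h)) @ Sigma @ b

for h in (0.0, 0.5, 1.3):
    assert abs(gamma_closed(h) - gamma_matrix(h)) < 1e-8
```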

to be non-negative for all $\omega \in (-\infty, \infty)$.

REMARK 3. A class of non-Gaussian CARMA processes which is particularly relevant to the modelling of heavy-tailed continuous-parameter time series is that of Lévy-driven CARMA processes, defined as follows.

DEFINITION 2.4. If $\{W(t)\}$ is a Lévy process and $p$ and $q$ are integers such that $0 \le q < p$, then $\{Y(t), t \ge 0\}$ is a Lévy-driven CARMA(p, q) process with parameters $a_1, \ldots, a_p$, $b_0, \ldots, b_q$, if and only if $\{Y(t)\}$ satisfies (2.2), where $\{X(t)\}$ is a strictly stationary process satisfying (2.4) and $X(0)$ is independent of $\{W(t)\}$. (Such a process can be shown to exist and its finite-dimensional characteristic


functions determined, provided the zeros of $a(z)$ all have negative real parts and $E|W(t)|^r < \infty$ for some $r > 0$.)

Lévy processes have time-homogeneous independent increments and include Brownian motion, Poisson processes, stable processes, and many others. If the driving Lévy process has finite second moments then the correlation structure of the corresponding CARMA process is the same as that of a Gaussian CARMA process with the same coefficients. For discussions of Lévy-driven CARMA processes, see Brockwell (2000, 2001) and, for CAR(1) processes driven by non-negative Lévy processes, Barndorff-Nielsen and Shephard (1999). The latter paper contains a rich variety of applications to mathematical finance. The book of Protter (1991) contains an excellent account of Lévy processes, with emphasis on those aspects most relevant to stochastic integration. Strictly stationary non-linear diffusion processes constitute another useful class of continuous-time models with non-Gaussian marginal distributions (see Ozaki (1985), Section 3.2). We shall consider only second-order CARMA processes in this chapter.

3. ARMA(p, q) processes

Although our main concern in this paper is with CARMA processes, we present here some basic second-order properties of discrete-time ARMA processes in order to make comparisons between the two. For a more detailed account of ARMA processes see e.g., Brockwell and Davis (1991). An ARMA(p, q) process $\{Y_t\}$ with autoregressive coefficients $\phi_1, \ldots, \phi_p$, moving-average coefficients $\theta_1, \ldots, \theta_q$, and white-noise variance $\sigma^2$, is defined to be a stationary solution of the $p$th-order linear difference equations,

$$\phi(B)Y_t = \theta(B)Z_t, \qquad t = 0, \pm 1, \pm 2, \ldots, \qquad (3.1)$$

where $B$ is the backward shift operator ($BY_t = Y_{t-1}$ and $BZ_t = Z_{t-1}$ for all $t$), $\{Z_t\}$ is a sequence of uncorrelated random variables with mean zero and variance $\sigma^2$,

$$\phi(z) := 1 - \phi_1 z - \cdots - \phi_p z^p,$$

$$\theta(z) := 1 + \theta_1 z + \cdots + \theta_q z^q,$$

$\theta_q \ne 0$ and $\phi_p \ne 0$. We define $\phi(z) := 1$ if $p = 0$ and $\theta(z) := 1$ if $q = 0$. We shall assume that the polynomial $\phi(z) = 1 - \phi_1 z - \cdots - \phi_p z^p$ is non-zero for all complex $z$ such that $|z| \le 1$. This condition guarantees the existence of a unique stationary solution of (3.1) which is also causal, i.e., is expressible in the form $Y_t = \sum_{j=0}^{\infty}\psi_j Z_{t-j}$ for some absolutely summable sequence $\{\psi_j\}$. It is evident from this representation that the mean of the ARMA process defined by (3.1) is zero. The process $\{Y_t\}$ is said to be an ARMA(p, q) process with mean $\mu$ if $\{Y_t - \mu\}$ is an ARMA(p, q) process. The properties of $\{Y_t\}$ are readily deduced from those of $\{Y_t - \mu\}$, so we shall restrict attention to the zero-mean case. The process $\{Y_t, t = 0, \pm 1, \pm 2, \ldots\}$ has a state-space representation analogous to (2.2) and (2.3) (see e.g., Brockwell and Davis (1991), p. 469), given by

$$Y_t = \theta' X_t, \qquad t = 0, 1, 2, \ldots, \qquad (3.2)$$

and

$$X_{t+1} - \Phi X_t = e Z_{t+1}, \qquad t = 0, 1, 2, \ldots, \qquad (3.3)$$

where

$$\Phi = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \phi_r & \phi_{r-1} & \phi_{r-2} & \cdots & \phi_1 \end{bmatrix}, \qquad e = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \qquad \theta = \begin{bmatrix} \theta_{r-1} \\ \theta_{r-2} \\ \vdots \\ \theta_0 \end{bmatrix},$$

$r := \max(p, q+1)$, $\theta_0 := 1$, $\theta_j := 0$ for $j > q$, $\phi_j := 0$ for $j > p$, and $X_0 = \sum_{j=0}^{\infty}\Phi^j e Z_{-j}$. The stationary solution of (3.3) satisfies (cf. (2.4)),

$$X_t = \Phi^t X_0 + \sum_{j=0}^{t-1}\Phi^j e Z_{t-j} = \sum_{j=0}^{\infty}\Phi^j e Z_{t-j}, \qquad t \ge 0, \qquad (3.4)$$

which, with (3.2), immediately yields the analogues of (2.9)-(2.12), namely,

$$E[X_t] = 0, \qquad t = 0, \pm 1, \pm 2, \ldots, \qquad (3.5)$$

$$E[X_{t+h} X_t'] = \Phi^h \Xi, \qquad h \ge 0, \qquad (3.6)$$

$$E[Y_t] = 0, \qquad t = 0, \pm 1, \pm 2, \ldots, \qquad (3.7)$$

and

$$\gamma_Y(h) := E[Y_{t+h} Y_t] = \theta' \Phi^{|h|} \Xi\, \theta, \qquad (3.8)$$

where

$$\Xi = E[X_0 X_0'] = \sigma^2 \sum_{j=0}^{\infty} \Phi^j e e' \Phi'^j. \qquad (3.9)$$
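Equations (3.8)-(3.9) give a direct numerical route to ARMA autocovariances: the state covariance in (3.9) solves the discrete Lyapunov equation $\Xi = \Phi\Xi\Phi' + \sigma^2 ee'$. A sketch for an ARMA(1,1) with illustrative parameters ($\phi = 0.6$, $\theta = 0.4$, $\sigma^2 = 1$, not from the text), checked against the familiar closed forms $\gamma(0) = \sigma^2(1 + \theta^2 + 2\phi\theta)/(1 - \phi^2)$ and $\gamma(1) = \sigma^2(1 + \phi\theta)(\phi + \theta)/(1 - \phi^2)$:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

phi, theta, sigma2 = 0.6, 0.4, 1.0

# r = max(p, q+1) = 2; companion form of (3.3) with phi_2 = 0, theta_0 = 1
Phi = np.array([[0.0, 1.0], [0.0, phi]])
e = np.array([0.0, 1.0])
thvec = np.array([theta, 1.0])              # [theta_{r-1}, ..., theta_0]

# Solves Xi = Phi Xi Phi' + sigma2 e e', i.e. the series (3.9)
Xi = solve_discrete_lyapunov(Phi, sigma2 * np.outer(e, e))

def gamma(h):
    """Autocovariance (3.8) of the ARMA(1,1) process."""
    return thvec @ np.linalg.matrix_power(Phi, abs(h)) @ Xi @ thvec

g0 = sigma2 * (1 + theta**2 + 2 * phi * theta) / (1 - phi**2)
g1 = sigma2 * (1 + phi * theta) * (phi + theta) / (1 - phi**2)
assert abs(gamma(0) - g0) < 1e-10
assert abs(gamma(1) - g1) < 1e-10
```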

The similarity between the expressions (2.12) and (3.8) for the autocovariance functions of the continuous- and discrete-time processes is quite striking. As for the continuous-time process, the matrices $\Phi^h$ and $\Xi$ can be evaluated explicitly in terms of the roots of the equation $\phi(z^{-1}) = 0$ (which are also the eigenvalues of $\Phi$) by using the Jordan decomposition of the matrix $\Phi$. However the simplest form of the autocovariance function of $\{Y_t\}$ can be found directly from its spectral density. For the discrete-time ARMA process, there is no difficulty in computing the spectral density directly from the defining equation, since the spectral density of the left-hand side of (3.1) at frequency $\omega \in [-\pi, \pi]$ is $|\phi(e^{-i\omega})|^2 f_Y(\omega)$, where $f_Y(\omega)$ is the spectral density of $\{Y_t\}$. Similarly the spectral density of the right-hand side is $|\theta(e^{-i\omega})|^2 \sigma^2/(2\pi)$. Hence

$$f_Y(\omega) = \frac{\sigma^2}{2\pi}\,\frac{\theta(e^{-i\omega})\theta(e^{i\omega})}{\phi(e^{-i\omega})\phi(e^{i\omega})}, \qquad -\pi \le \omega \le \pi. \qquad (3.10)$$

The autocovariance function of $\{Y_t\}$ is obtained from this expression using the relation,

$$\gamma_Y(h) = \int_{-\pi}^{\pi} e^{i\omega h} f_Y(\omega)\,d\omega. \qquad (3.11)$$

Changing the variable of integration in (3.11) from $\omega$ to $z = e^{i\omega}$ and evaluating the resulting contour integral gives

$$\gamma_Y(h) = \sigma^2 S, \qquad h = 0, 1, 2, \ldots,$$

where $S$ is the sum of residues of $z^{|h|-1}\theta(z)\theta(z^{-1})/[\phi(z)\phi(z^{-1})]$ in the interior of the unit disk. This gives the general expression,

$$\gamma_Y(h) = \frac{\sigma^2 I_{[0,q-p]}(|h|)}{(q-p-|h|)!}\left[\frac{d^{q-p-|h|}}{dz^{q-p-|h|}}\,\frac{z^{q-p}\theta(z)\theta(z^{-1})}{\phi(z)\phi(z^{-1})}\right]_{z=0} + \sigma^2 \sum_{\lambda \in \mathcal{Z}} \frac{1}{(m-1)!}\left[\frac{d^{m-1}}{dz^{m-1}}\,\frac{(z-\lambda)^m z^{|h|-1}\theta(z)\theta(z^{-1})}{\phi(z)\phi(z^{-1})}\right]_{z=\lambda}, \qquad (3.12)$$

where $I_{[0,q-p]}$ is the indicator function of the interval $[0, q-p]$, $\mathcal{Z}$ denotes the set of roots $\lambda$ of the equation

$$\phi(z^{-1}) = 0, \qquad (3.13)$$

and $m$ is the multiplicity of $\lambda$. In the case when $q < p$ and the roots $\lambda_j$ are distinct, Eq. (3.12) simplifies to

$$\gamma_Y(h) = -\sigma^2 \sum_{j=1}^{p} \frac{\lambda_j^{|h|+1}\,\theta(\lambda_j)\theta(\lambda_j^{-1})}{\phi(\lambda_j)\phi'(\lambda_j^{-1})}. \qquad (3.14)$$

Notice the close resemblance between (3.14) and the continuous-time result (2.15), which has the same general form with powers of the roots $\lambda_j$ replaced by exponential terms.

EXAMPLE 3. As in Example 2, we can show that if $\lambda = re^{i\theta}$ is any complex number with $r > 0$ and non-zero imaginary part, and if $w = a + ib$ is any complex number, then

$$\kappa(h) = w\lambda^{|h|} + \bar{w}\bar{\lambda}^{|h|}, \qquad h = 0, \pm 1, \ldots, \qquad (3.15)$$

is the autocovariance function of an ARMA(2,1) or AR(2) process if and only if

$$r < 1, \qquad (3.16)$$

$$a > 0, \qquad (3.17)$$


and

$$|\cos(\theta)| \le a(1 - r^2)/(2r|b|). \qquad (3.18)$$

Condition (3.17) expresses the obvious requirement that $\kappa(0) > 0$, and a straightforward calculation shows that (3.18) is then necessary and sufficient for the Fourier transform,

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-ih\omega}\kappa(h),$$

to be non-negative for all $\omega \in [-\pi, \pi]$.

4. Embedding and aliasing

Instead of representing the CARMA(p, q) process as a linear combination of the components of a continuous-time $p$-variate AR(1) process as in Section 2, it is useful also to represent it as one component of such a process. This can be done by introducing a new state vector, $\mathbf{Y}(t) := BX(t)$, where $\{X(t)\}$, as before, is the stationary solution (2.4) of (2.3) and

$$B = \begin{bmatrix} b_0 & b_1 & \cdots & b_{p-1} \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \quad \text{if } b_0 \ne 0.$$

If $b_0 = 0$ and $i$ is the smallest integer such that $b_i \ne 0$, then we replace the first component of the $(i+1)$th row in the definition of $B$ by 1. The CARMA(p, q) process $\{Y(t)\}$ can then be represented as

$$Y(t) = [\,1 \;\; 0 \;\; \cdots \;\; 0\,]\,\mathbf{Y}(t), \qquad t \ge 0, \qquad (4.1)$$

where $\mathbf{Y}(t)$ is a stationary solution of the vector AR(1) equation,

$$d\mathbf{Y}(t) = BAB^{-1}\mathbf{Y}(t)\,dt + Be\,dW(t), \qquad t \ge 0. \qquad (4.2)$$

It is clear from (4.2) that $\{\mathbf{Y}(n), n = 0, 1, 2, \ldots\}$ is a discrete-time vector AR(1) process satisfying

$$\mathbf{Y}(n) = e^{BAB^{-1}}\,\mathbf{Y}(n-1) + \int_{n-1}^{n} e^{BAB^{-1}(n-u)}Be\,dW(u), \qquad n = 1, 2, \ldots. \qquad (4.3)$$

Thus if $\{Y(t), t \ge 0\}$ is a CARMA(p, q) process with $q < p$, then $\{Y(n), n = 0, 1, 2, \ldots\}$ is the first component of a discrete-time $p$-variate AR(1) process. It then follows from a result of Doob (1944) that $\{Y(n)\}$ is a discrete-time ARMA(p, q') process with $q' < p$. This result is the basis for a method, due to Phillips (1959), for estimating the parameters of a CARMA process which is


observed at discrete uniformly spaced times. The idea was further developed by Durbin (1961).

It is well known that non-parametric estimation of the spectral density of a continuous-time stationary time series suffers from the problem of aliasing, i.e., the inability to distinguish, on the basis of observations at integer times, between harmonics of different frequencies in $(-\infty, \infty)$. It was hoped, by using the parametric CARMA model and the properties of the process observed at integer times, that the problem of aliasing could be avoided and unique estimates obtained of the coefficients of the underlying stochastic differential equation and of the spectral density of the corresponding continuous-time process. As we shall see in Example 4 below, this idea works sometimes but not always, depending on the particular CARMA process whose parameters are being estimated. A great deal of further work has been done on techniques for circumventing the aliasing problem. See for example Robinson (1978) and Phillips (1973).

The embedding problem is concerned with the question of whether or not there exists a CARMA process $\{Y(t)\}$ whose autocovariance function at integer times coincides with that of a given ARMA(p, q) process with $q < p$. Since we already know that every CARMA autocovariance function, when restricted to the integers, is a discrete-time ARMA autocovariance function, this question is equivalent to the question: is the class of discrete-time ARMA autocovariance functions with $q < p$ the same as the class of CARMA autocovariance functions restricted to the integers? An affirmative answer to this question was given by He and Wang (1989); however, as pointed out by Brockwell and Brockwell (1998), this is not correct. In particular, if the discrete-time ARMA process $\{Y_n\}$ defined by (3.1) has a moving-average unit root, i.e., if the equation $\theta(z) = 0$ has a root on the unit circle in the complex plane, then there is no continuous-time ARMA process whose autocovariance function restricted to the integers coincides with that of $\{Y_n\}$.

The problem of finding a simple characterization of the discrete-time ARMA processes which are embeddable remains open. We know from (3.14) that if $q < p$ and if the roots $\lambda_1, \ldots, \lambda_p$ of the equation $\phi(z^{-1}) = 0$ are all distinct, then the autocovariance function of the ARMA process $\{Y_n\}$ defined by (3.1) is
$$\gamma(h) = \sum_{j=1}^{p} \alpha_j \lambda_j^{|h|}, \qquad (4.4)$$

where

$$\alpha_j = -\sigma^2\,\frac{\lambda_j\,\theta(\lambda_j)\theta(\lambda_j^{-1})}{\phi(\lambda_j)\phi'(\lambda_j^{-1})}, \qquad j = 1, \ldots, p, \qquad (4.5)$$

and $\phi'$ denotes the derivative of the autoregressive polynomial $\phi$. Comparing the expression (4.4) with the autocovariance function (2.15), we see that to embed the process $\{Y_n\}$ we need to find a CARMA process whose autocovariance function $\{\delta(x)\}$ is


$$\delta(x) = \sum_{k=-m}^{m}\sum_{j=1}^{p} w_{jk}\,\alpha_j\,e^{\theta_{jk}|x|}, \qquad (4.6)$$

where $m$ is an integer,

$$\theta_{jk} = \ln \lambda_j + 2k\pi i, \qquad i = \sqrt{-1},$$

$\ln$ denotes the principal branch of the logarithm ($-\pi \le \operatorname{Im}(\ln \lambda_j) < \pi$), and the (possibly complex-valued) weights $w_{jk}$ satisfy

$$\sum_{k=-m}^{m} w_{jk} = 1, \qquad j = 1, \ldots, p.$$

The problem thus reduces to finding an integer $m$ and a set of weights such that (4.6) is an autocovariance function on the real line, or equivalently such that the function,

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\delta(x)e^{-i\omega x}\,dx = -\frac{1}{\pi}\sum_{k=-m}^{m}\sum_{j=1}^{p}\frac{w_{jk}\,\alpha_j\,\theta_{jk}}{\omega^2 + \theta_{jk}^2},$$

is non-negative for all $\omega \in (-\infty, \infty)$.

EXAMPLE 4. The causal AR(1) process defined by the equations,

$$X_n = \phi X_{n-1} + Z_n, \qquad \{Z_n\} \sim \mathrm{WN}(0, \sigma^2),$$

with $|\phi| < 1$, has autocovariance function,

$$\kappa(h) = \frac{\sigma^2\phi^{|h|}}{1 - \phi^2}.$$

Inspection of (2.16) shows that if $0 < \phi < 1$ then $\{X_n\}$ can be embedded in the CAR(1) process satisfying

$$DY(t) - (\ln \phi)Y(t) = bDW(t),$$

where $b^2 = -2\sigma^2(\ln \phi)/(1 - \phi^2)$. If on the other hand $-1 < \phi < 0$, then $\ln \phi$ is complex and we need to find a suitable set of weights $w_{jk}$ as in (4.6). We have $p = 1$, $\alpha_1 = \sigma^2/(1 - \phi^2)$, $\theta_{10} = \ln|\phi| - i\pi$ and $\theta_{11} = \ln|\phi| + i\pi$. If we choose $m = 1$ in (4.6) with $w_{10} = w_{11} = 0.5$ and $w_{1,-1} = 0$, then the expression (4.6) is an autocovariance function on $(-\infty, \infty)$. The latter assertion can be verified by computing the Fourier transform

$$f(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\delta(x)e^{-i\omega x}\,dx, \qquad (4.7)$$

which is found to be non-negative for all $\omega \in (-\infty, \infty)$. The expression (4.7) is moreover easily identified (see (2.13)) as the spectral density of the CARMA(2,1) process satisfying


$$D^2 Y(t) - 2(\ln|\phi|)DY(t) + \left((\ln|\phi|)^2 + \pi^2\right)Y(t) = b_0 DW(t) + b_1 D^2 W(t),$$

where $b_1^2 = -2\sigma^2(\ln|\phi|)/(1 - \phi^2)$ and $b_0^2 = b_1^2\left((\ln|\phi|)^2 + \pi^2\right)$.

The embedding problem for the AR(1) process, together with some partial results for the general case, was treated by Chan and Tong (1987). The embedding problem is intimately connected with the problem of identification of a CARMA process from observations at integer (or uniformly spaced) times. If the process $\{Y(t), t = 0, \pm 1, \pm 2, \ldots\}$ is a discrete-time Gaussian ARMA process which can be embedded in more than one Gaussian CARMA process, then it will not be possible to distinguish between these CARMA processes based only on observations of $Y(t)$ at integer times. (Examples of AR(2) processes which can be embedded in both CARMA(2,1) and CARMA(4,2) processes are given by Brockwell (1995).) This is precisely the aliasing problem discussed earlier in this section. The following example demonstrates that the autocovariance function of a CARMA(p, q) process at integer lags may determine a unique minimum phase Gaussian CARMA process with autoregressive order $p$. On the other hand, depending on the parameters of the process, there may be one or two or any finite number of CARMA processes with autoregressive order $p$ and the same autocovariance function at integer lags. There may also be infinitely many such processes. Whether or not there is an aliasing problem thus depends on the process.

EXAMPLE 5. Consider the autocovariance function (2.22), where $w = a + ib$ and $\lambda = \alpha + i\beta$ satisfy conditions (2.23), (2.24) and (2.25). We know from Example 2 that this is the autocovariance function of a CARMA(p, q) process with $p = 2$. One stochastic differential equation whose stationary solution has the autocovariance function (2.22) is

$$(D - \lambda)(D - \bar{\lambda})Y(t) = (b_0 + b_1 D)DW(t),$$

where

$$b_0 = 2|\lambda|\sqrt{|a\alpha + b\beta|} \quad \text{and} \quad b_1 = 2\sqrt{b\beta - a\alpha}.$$

These coefficients are obtained by computing the spectral density corresponding to the autocovariance function (2.22),

$$f(\omega) = \frac{2\left[(b\beta - a\alpha)\omega^2 - (a\alpha + b\beta)|\lambda|^2\right]}{\pi\left|(i\omega + \lambda)(i\omega + \bar{\lambda})\right|^2}, \qquad -\infty < \omega < \infty,$$

and comparing it with the general form (2.13). We can specify different CARMA processes with $p = 2$ having exactly the same autocovariance function at integer lags by adding integer multiples of $2\pi$ to $\beta$, provided the constraint (2.25) is satisfied. This determines the total number of such CARMA processes. In fact the number of different stochastic differential equations (keeping in mind Remark 2 of Section 2) generating the same autocovariance function at integer lags is just the number of integers $k$ such that


$$|\beta + 2k\pi| \le |a\alpha/b|. \qquad (4.8)$$

This is either $\lfloor|a\alpha/(b\pi)|\rfloor$ or $1 + \lfloor|a\alpha/(b\pi)|\rfloor$, where $\lfloor x \rfloor$ denotes the integer part of $x$. Thus if $|a\alpha/b| < \pi$, there is at most one CARMA process with $p = 2$ and autocovariance (2.22) at integer lags, while if $b = 0$, there are infinitely many.
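Example 4's construction for negative $\phi$ can be verified numerically: the CARMA(2,1) process built from $\phi$ should reproduce the AR(1) autocovariance $\sigma^2\phi^{|h|}/(1-\phi^2)$ at every integer lag. A sketch with illustrative values ($\phi = -0.6$, $\sigma^2 = 1$, not from the text):

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

phi, sigma2 = -0.6, 1.0

# CARMA(2,1) coefficients from Example 4 (embedding a negative-phi AR(1))
lnp = np.log(abs(phi))
a1, a2 = -2 * lnp, lnp**2 + np.pi**2
b1 = np.sqrt(-2 * sigma2 * lnp / (1 - phi**2))
b0 = b1 * np.sqrt(lnp**2 + np.pi**2)

# Continuous-time autocovariance (2.12) via the state-space form (2.3)
A = np.array([[0.0, 1.0], [-a2, -a1]])
e = np.array([0.0, 1.0])
Sigma = solve_continuous_lyapunov(A, -np.outer(e, e))
b = np.array([b0, b1])

for h in range(4):
    gamma_carma = b @ expm(A * h) @ Sigma @ b          # CARMA acvf at integer lag h
    gamma_ar1 = sigma2 * phi**h / (1 - phi**2)         # discrete AR(1) acvf
    assert abs(gamma_carma - gamma_ar1) < 1e-8
```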

5. Inference for linear CARMA processes


Suppose we observe a Gaussian CARMA(p, q) process at times $t_1, t_2, \ldots$, which are not necessarily uniformly spaced. Then from (2.2) and (2.4) the state vectors and observations $X(t_i)$ and $Y(t_i)$ satisfy the discrete-time state and observation equations,

$$X(t_{i+1}) = e^{A(t_{i+1}-t_i)}X(t_i) + Z(t_i), \qquad i = 1, 2, \ldots, \qquad (5.1)$$

and

$$Y(t_i) = [\,b_0 \;\; b_1 \;\; \cdots \;\; b_{p-1}\,]X(t_i), \qquad i = 1, 2, \ldots, \qquad (5.2)$$

where $\{Z(t_i), i = 1, 2, \ldots\}$ is an independent sequence of Gaussian random vectors with mean $E[Z(t_i)] = 0$ and covariance matrices,

$$E[Z(t_i)Z(t_i)'] = \int_0^{t_{i+1}-t_i} e^{Ay}ee'e^{A'y}\,dy.$$

These equations are in precisely the form needed for application of the Kalman recursions (see e.g., Brockwell and Davis (1991), Chapter 12). From these recursions we can easily compute $m_i = E(Y(t_i) \mid Y(t_j), j < i)$ and $v_i = E((Y(t_i) - m_i)^2 \mid Y(t_j), j < i)$, $i \ge 2$, and hence the likelihood,

$$L = (2\pi)^{-N/2}(v_1 \cdots v_N)^{-1/2}\exp\left[-\sum_{i=1}^{N}(Y(t_i) - m_i)^2/(2v_i)\right], \qquad (5.3)$$

where $m_1 = 0$, $v_1 = b'\Sigma b$ and $\Sigma$ is defined by (2.8). A non-linear optimization algorithm can then be used in conjunction with the expression for $L$ to find maximum likelihood estimates of the parameters. The calculations of $e^{At}$ and the integrals are most readily performed using the Jordan decomposition of the matrix $A$ discussed in Section 2. This is the method used by Jones (1981) for maximum Gaussian likelihood fitting of CAR processes with possibly irregularly spaced data, and by Jones and Ackerson (1990) in their analysis of longitudinal data using CARMA processes. Even if the CARMA process is not Gaussian, exactly the same procedure will yield "maximum Gaussian likelihood" estimates of the parameters of the process. The Kalman recursions can also be used to compute minimum mean-squared error predictors based on the fitted model (see e.g., Brockwell and Davis (1991)).
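For $p = 1$ the recursions collapse to scalar form, since the sampled CAR(1) process is Markov: $m_i = e^{-a(t_i - t_{i-1})}Y(t_{i-1})$ and $v_i = (b^2/2a)(1 - e^{-2a(t_i - t_{i-1})})$. A sketch computing the log of (5.3) for irregularly spaced data (the observation values and times below are illustrative, not from the text), checked against the joint Gaussian density with covariance $(b^2/2a)e^{-a|t_i - t_j|}$:

```python
import numpy as np
from scipy.stats import multivariate_normal

a, b = 1.3, 0.8
t = np.array([0.0, 0.4, 1.1, 1.5, 2.9])      # irregular observation times
y = np.array([0.3, -0.1, 0.5, 0.2, -0.4])    # hypothetical observations

vinf = b**2 / (2 * a)                        # stationary variance gamma_Y(0)

# Scalar form of the recursions behind (5.3): m_1 = 0, v_1 = gamma_Y(0)
loglik = 0.0
m, v = 0.0, vinf
for i in range(len(t)):
    if i > 0:
        phi = np.exp(-a * (t[i] - t[i - 1]))
        m, v = phi * y[i - 1], vinf * (1 - phi**2)
    loglik += -0.5 * (np.log(2 * np.pi * v) + (y[i] - m)**2 / v)

# Direct check: the observations are jointly Gaussian with an exponential
# covariance function, so log L can also be computed in one shot
cov = vinf * np.exp(-a * np.abs(t[:, None] - t[None, :]))
assert abs(loglik - multivariate_normal(mean=np.zeros(5), cov=cov).logpdf(y)) < 1e-8
```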


$$D^2 Y(t) + 0.495\,DY(t) + 0.430\,Y(t) = 24.66\,DW(t) + 20.81, \qquad (5.4)$$

with AIC = 827.6. This is a smaller value than for all discrete-time AR(p) models with $p \le 20$ (but not as small as for the minimum-AIC discrete-time subset AR model). Later we shall fit a continuous-time threshold AR(2) model to this data and compare it with the linear model. The question arises here as to whether or not there is any other CARMA(p, q) model with $p = 2$ and the same autocovariance function at integer lags. The answer to this question is no, since from (5.4) and (2.15) we find that the autocovariance function of the CAR(2) process defined by (5.4) is

$$\gamma_Y(h) = we^{\lambda|h|} + \bar{w}e^{\bar{\lambda}|h|},$$

where

$$w = 470.31 + 22.25i = a + ib \quad \text{and} \quad \lambda = -0.2475 + 0.6072i = \alpha + i\beta.$$

Since there is no integer $k$ except 0 for which

$$|\beta + 2k\pi| \le |a\alpha/b|,$$

we conclude from (4.8) that there are no other CARMA(p, q) models with $p = 2$ and the same autocovariance function at integer lags as (5.4).

6. Nonlinear CAR models

The linear CAR(p) process with coefficients $a_1, \ldots, a_p$ and $b$ was defined in Section 2. The CAR(p) process with mean $\mu$ can be defined by adding $\mu$ to the zero-mean process. Alternatively it can be defined as a stationary solution of the stochastic differential equation,

$$(D^p + a_1 D^{p-1} + \cdots + a_p)Y(t) = bDW(t) + a_p\mu, \qquad (6.1)$$

which has the state-space representation (a slight modification of (2.2) and (2.3)),

$$Y(t) = [\,1 \;\; 0 \;\; \cdots \;\; 0\,]\,\mathbf{Y}(t), \qquad t \ge 0, \qquad (6.2)$$

where $\{\mathbf{Y}(t)\}$ is the unique stationary solution of

$$d\mathbf{Y}(t) = (A\mathbf{Y}(t) + a_p\mu e)\,dt + be\,dW(t), \qquad (6.3)$$

$A$ and $e$ are defined as in (2.3), and all the eigenvalues of $A$ are assumed to have negative real parts.

A natural family of non-linear CAR processes is obtained by restricting $\{W(t)\}$ to be standard Brownian motion (as we shall from now on) and allowing the


parameters $a_1, \ldots, a_p$ and $\mu$ in (6.1) to depend on $Y(t)$. In particular, if we partition the real line into subintervals, $(-\infty, y_1]$, $(y_1, y_2]$, $\ldots$, $(y_m, \infty)$, on each of which the parameter values are constant, then we obtain a continuous-time analogue of the threshold models of Tong (1983), which we shall refer to as a CTAR(p) process. The Eq. (6.1) with parameters depending on $Y(t)$ has the same state-space representation (6.2) and (6.3) as in the linear case, except for the dependence of $a_1, \ldots, a_p$ and $\mu$ on $Y(t)$. We shall assume that $b$ is non-zero and independent of $Y(t)$, although, as we shall observe later, there are two important cases, namely the threshold AR(1) and AR(2) processes, for which the latter assumption can be relaxed. We shall also assume that the functions $a_1, \ldots, a_p$ and $\mu$ are bounded and measurable on $\mathbb{R}$.

We shall drop the stationarity assumption which we imposed in the linear case, although conditions under which there is a stationary solution of (6.3) are of course still of interest and have been considered, for CTAR(1) processes, by Stramer et al. (1996a) and, for a more general class of continuous-time threshold ARMA processes, by Stramer et al. (1996b).

The definition of a non-linear CAR process in terms of (6.2) and (6.3) immediately raises the question of whether or not a solution exists and, if so, whether or not it is unique. By the fundamental theorem on existence of strong solutions of stochastic differential equations (see e.g., Oksendal (1998), Theorem 5.2.1), Eq. (6.3) has a unique strong solution, for given $\mathbf{Y}(0)$, if the coefficients of $dt$ and $dW(t)$ satisfy standard Lipschitz and growth conditions. The Lipschitz conditions will not be satisfied for threshold models because of discontinuities in at least one of the functions $a_1, \ldots, a_p$ and $\mu$. In this case we must look for weak solutions of the state Eq. (6.3).
With the aid of the Cameron-Martin-Girsanov formula, we can construct a weak solution of (6.3) which is valid whether or not there are discontinuities in $a_1, \ldots, a_p$ and $\mu$, and which gives a characterization of the transition function of the Markov process $\mathbf{Y}(t)$. This is done as follows. We adopt a slight modification of the state-space representation (6.2) and (6.3) by introducing a new state vector $X = b^{-1}\mathbf{Y}$, whose components we shall denote by $X_i$, $i = 0, \ldots, p-1$. Then in place of (6.2) and (6.3) we have

$$Y(t) = bX_0(t), \qquad (6.4)$$

where

$$\begin{aligned}
dX_0 &= X_1(t)\,dt, \\
dX_1 &= X_2(t)\,dt, \\
&\;\;\vdots \\
dX_{p-2} &= X_{p-1}(t)\,dt, \\
dX_{p-1} &= [-a_p X_0(t) - \cdots - a_1 X_{p-1}(t) + a_p\mu b^{-1}]\,dt + dW(t),
\end{aligned} \qquad (6.5)$$

and we have abbreviated $a_i(Y(t)) = a_i(bX_0(t))$ and $\mu(Y(t)) = \mu(bX_0(t))$ to $a_i$ and $\mu$ respectively. We aim now to show that (6.5) with initial condition


$X(0) = x = [\,x_0 \;\; x_1 \;\; \cdots \;\; x_{p-1}\,]'$ has a unique (in law) weak solution $\{X(t)\}$, and to determine the distribution of $X(t)$ given that $X(0) = x$. These transition distributions determine in particular the joint distribution of the process $\{Y(t)\}$ at times $t_1, \ldots, t_N$ given $X(0)$. Assuming that $X(0) = x$, we can write $X(t)$ in terms of $\{X_{p-1}(s), 0 \le s \le t\}$ using the relations, $X_{p-2}(t) = x_{p-2} + \int_0^t X_{p-1}(s)\,ds, \ldots, X_0(t) = x_0 + \int_0^t X_1(s)\,ds$. The resulting functional relationship will be denoted by

$$X(t) = F(X_{p-1}, t). \qquad (6.6)$$

Substituting from (6.6) into the last equation in (6.5), we see that it can be written in the form,

$$dX_{p-1} = G(X_{p-1}, t)\,dt + dW(t), \qquad (6.7)$$

where $G(X_{p-1}, t)$, like $F(X_{p-1}, t)$, depends on $\{X_{p-1}(s) : 0 \le s \le t\}$. Now let $B$ be standard Brownian motion (with $B(0) = x_{p-1}$) defined on the probability space $(C[0,\infty), \mathcal{B}[0,\infty), P_{x_{p-1}})$ and let $\mathcal{F}_t = \sigma\{B(s), s \le t\} \vee \mathcal{N}$, where $\mathcal{N}$ is the sigma-algebra of $P_{x_{p-1}}$-null sets of $\mathcal{B}[0,\infty)$. The equations

$$dZ_0 = Z_1\,dt, \quad dZ_1 = Z_2\,dt, \quad \ldots, \quad dZ_{p-2} = Z_{p-1}\,dt, \quad dZ_{p-1} = dB(t), \qquad (6.8)$$

with $Z(0) = x = [\,x_0 \;\; x_1 \;\; \cdots \;\; x_{p-1}\,]'$, clearly have the unique strong solution, $Z(t) = F(B, t)$, where $F$ is defined as in (6.6). Let $G$ be the functional appearing in (6.7) and suppose that $\tilde{W}$ is the Ito integral defined by $\tilde{W}(0) = x_{p-1}$ and

$$d\tilde{W}(t) = -G(B, t)\,dt + dB(t) = -G(Z_{p-1}, t)\,dt + dZ_{p-1}(t). \qquad (6.9)$$

If we define the new measure $\tilde{P}_{x_{p-1}}$ on $(C[0,\infty), \mathcal{B}[0,\infty))$ satisfying

$$d\tilde{P}_{x_{p-1}} = M(B, t)\,dP_{x_{p-1}}, \qquad (6.10)$$

where

$$M(B, t) = \exp\left[-\frac{1}{2}\int_0^t G^2(B, s)\,ds + \int_0^t G(B, s)\,dB(s)\right], \qquad (6.11)$$

then by the Cameron-Martin-Girsanov formula (see e.g., Oksendal (1998), p. 152), $\tilde{W}$ is standard Brownian motion under $\tilde{P}_{x_{p-1}}$. Hence from (6.9) we see that $(Z_{p-1}(t), \tilde{W}(t))$ on $(C[0,\infty), \mathcal{B}[0,\infty), \tilde{P}_{x_{p-1}}, \{\mathcal{F}_t\})$ is a weak solution of Eq. (6.7) with initial condition $X_{p-1}(0) = x_{p-1}$, and $(Z(t), \tilde{W}(t))$ is a weak solution of (6.5) with initial condition $X(0) = x$. Moreover, by Proposition 5.3.10 of Karatzas and Shreve (1991), the weak solution is unique in law, and by Theorem 10.2.2 of


Stroock and Varadhan (1979) it is non-explosive. The characteristic function of the weak solution $X(t)$ of (6.5) with $X(0) = x$ is therefore given by

$$\phi_{X(t)}(\theta \mid x) = \tilde{E}_{x_{p-1}}[\exp(i\theta' Z(t))] = E_{x_{p-1}}[\exp(i\theta' Z(t))M(B, t)] = E_{x_{p-1}}[\exp(i\theta' F(B, t))M(B, t)], \qquad (6.12)$$

where $\tilde{E}_{x_{p-1}}$ and $E_{x_{p-1}}$ denote expectation relative to $\tilde{P}_{x_{p-1}}$ and $P_{x_{p-1}}$ respectively, $F$ is defined as in (6.6) and $M$ as in (6.11). The importance of Eq. (6.12) is that it gives the conditional characteristic function of the state vector (and hence of the process, $Y(t) = bX_0(t)$) as an expectation of a functional of the standard Brownian motion $\{B(t)\}$ starting at $x_{p-1}$.

7. CTAR(p) processes

Although the general non-linear autoregressions defined in Section 6 constitute a large class of continuous-time, non-linear and not necessarily stationary processes, it is convenient for the modelling of time series data to deal with a finite-parameter family. One such family, suggested by the success of discrete-time threshold models (see Tong, 1983 and 1990), is the class of continuous-time threshold autoregressive (or CTAR) processes introduced in the second paragraph of Section 6.

EXAMPLE 7. Applying the results of Section 6 to the CTAR(1) process defined by

dY(t) +

+ alY(t)dt = bdW(t), a2Y(t)dt = bdW(t),

if if

Y(t) < O, Y(t) >>_ 0 ,

with b > 0, we can write Y(t) = bX(t), where dX(t) + a(X(t)) dt = dW(t), and a(x) = a_1 x if x < 0, a(x) = a_2 x if x ≥ 0. Thus, in the notation of Section 6, F(X, t) = X(t) and G(X, t) = −a(X(t)). From (6.12) we immediately find that the characteristic function of X(t) given X(0) = x is

  φ(θ|x) = E_x[exp(iθB(t) − (1/2)∫_0^t a²(B(s)) ds − ∫_0^t a(B(s)) dB(s))] .        (7.1)

This expression completely determines the conditional distribution of X(t) given X(0) = x (and hence of Y(t) given Y(0) = y). Analytical simplification of the right-hand side is difficult, but for any given value of θ it can be evaluated approximately by simulation of standard Brownian motion starting at x. The conditional moments can be evaluated analogously, e.g.

  E[X(t)|x] = E_x[B(t) exp(−(1/2)∫_0^t a²(B(s)) ds − ∫_0^t a(B(s)) dB(s))] .        (7.2)
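The simulation approach just described is easy to sketch. The following illustrative implementation (not from the original text; the function name and default parameters are ours) estimates the right-hand side of (7.2) for the CTAR(1) drift a(x) = a_1 x (x < 0), a_2 x (x ≥ 0), averaging the discretized functional over Brownian paths started at x:

```python
import numpy as np

def ctar1_cond_mean(x, t, a1, a2, n_paths=20000, n_steps=200, seed=0):
    """Monte Carlo estimate of E[X(t) | X(0) = x] via the functional (7.2):
    average B(t) * exp(-(1/2) int a^2(B) ds - int a(B) dB) over Brownian
    paths started at x; integrals use left-endpoint (Ito) sums."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B = x + np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dB, axis=1)], axis=1)
    Bl = B[:, :-1]                                   # left endpoints B(s_k)
    aB = np.where(Bl < 0.0, a1 * Bl, a2 * Bl)        # a(B(s)) on the grid
    half_int_a2 = 0.5 * np.sum(aB**2, axis=1) * dt   # ~ (1/2) int a^2 ds
    int_a_dB = np.sum(aB * dB, axis=1)               # ~ int a dB
    return float(np.mean(B[:, -1] * np.exp(-half_int_a2 - int_a_dB)))
```

In the linear case a_1 = a_2 = a the exact answer is x e^{−at}, which provides a check on both the discretization and the Monte Carlo error.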

268

P. J. Brockwell

This result suggests computing the conditional mean E[X(t)|x] by simulating independent realizations of the Brownian motion {B(t)}, evaluating the functional on the right-hand side of (7.2) for each realization and then averaging the results to get a consistent estimator of E[X(t)|x]. The same method clearly generalizes to the calculation of second and higher order conditional moments of X(t) given X(0) for non-linear CAR(p) processes.

REMARK 4. It can be shown, using random time transformations, that the restriction to constant functions b(Y(t)) can be relaxed for the CTAR(1) model (Stramer et al., 1996a) and the CTAR(2) model (Brockwell and Williams, 1997) with a single threshold at level r. It seems likely that the condition can also be relaxed more generally, but no proof has yet been given.

An alternative approximate method for computing conditional moments of the state-vector of a CTAR(p) process (Brockwell and Hyndman, 1992) uses the sequence of Euler approximations, defined by the observation and state equations, cf. (6.2) and (6.3),

  y_n(t) = [1 0 ⋯ 0] Y_n(t),   t = 0, 1/n, 2/n, … ,        (7.3)

and

  Y_n(t + 1/n) = [I + n^{-1} A(Y_n(t))] Y_n(t) + [n^{-1/2} Z(t) b(Y_n(t)) + n^{-1} c(Y_n(t))] e ,        (7.4)

where e = [0 0 ⋯ 0 1]′ and {Z(t)} is an iid sequence of random variables with P(Z = 1) = P(Z = −1) = 0.5. The process {Y_n(t)} in (7.4) is clearly Markovian and the conditional expectations,

  m_n(y, t) := E(Y_n(t) | Y_n(0) = y) ,

satisfy the backward Kolmogorov equations,

  m_n(y, t + n^{-1}) = (1/2) m_n(y + n^{-1}(A(y)y + c(y)e) + n^{-1/2} b(y)e, t)
                     + (1/2) m_n(y + n^{-1}(A(y)y + c(y)e) − n^{-1/2} b(y)e, t) ,        (7.5)

with the initial condition,

  m_n(y, 0) = y .        (7.6)

These equations clearly determine the moments m_n(y, t) uniquely. Higher order moments satisfy the same Eq. (7.5), with a modification of the initial condition (7.6) in which the right-hand side is appropriately replaced. For example

  s_n(y, t) := E[Y_n(t) Y_n(t)′ | Y_n(0) = y]

satisfies (7.5) with the initial condition,

  s_n(y, 0) = y y′ .
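For p = 1 (so that A(y) = −a(y) and e = 1) the recursion (7.5)-(7.6) can be solved exactly by enumerating the binary tree of ± moves of the chain (7.4). The sketch below is ours, not from the original text; it also makes visible why the exact solution becomes prohibitive as n grows, since the tree has 2^{nt} leaves.

```python
def euler_cond_mean(y, t, n, a, b, c):
    """m_n(y, t) = E(Y_n(t) | Y_n(0) = y) for the scalar Euler chain (7.4),
    computed exactly via the backward Kolmogorov recursion (7.5)-(7.6);
    a, b, c are callables of y."""
    def m(y, k):
        if k == 0:
            return y                      # initial condition (7.6)
        drift = y + (-a(y) * y + c(y)) / n
        step = b(y) / n**0.5              # +/- move of size n^(-1/2) b(y)
        return 0.5 * (m(drift + step, k - 1) + m(drift - step, k - 1))
    return m(y, round(n * t))
```

For constant a(y) ≡ α, c ≡ 0 and constant b the chain is linear and m_n(y, t) = y(1 − α/n)^{nt} exactly, approaching y e^{−αt} as n → ∞.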
Brockwell and Williams (1997) showed in the case p = 2 that if we extend the definition of the process {Y^(n)} so that it is defined for all real t ≥ 0 by Y^(n)(t) = Y^(n)([nt]/n), where [nt] is the integer part of nt, then {Y^(n)} converges in distribution as n → ∞ to a solution of (6.3). (Convergence in distribution here means weak convergence of the associated probability measures on D_{ℝ^p}[0, ∞).) For particular cases of CTAR(1) and CTAR(2) processes with a single threshold, Brockwell and Stramer (1995) found close agreement between the calculation of moments based on simulation of Brownian motion (cf. (7.2)) and the calculation of moments based on (7.3) and (7.4) with n = 10. The adequacy of the approximation can be checked by increasing the value of n, but for n larger than 10 the exact solution of (7.5) becomes prohibitive and instead we rely on simulation of the process Y^(n) itself. This method seems preferable to the one based on simulation of Brownian motion, as the variance of the functional whose expectation is to be computed is frequently large.

Sufficient conditions for geometric ergodicity of the CTAR(1) process with a single threshold and for the CTAR(p) process with single threshold and constant b are given by Stramer et al. (1996a) and (1996b) respectively. For the CTAR(1) process defined by

  dY(t) + a(Y(t)) Y(t) dt = b(Y(t)) dW(t) + c(Y(t)) dt ,        (7.7)

where

  (a(y), b(y), c(y)) = (a^(1), b^(1), c^(1))   if y < r ,
                       (a^(2), b^(2), c^(2))   if y ≥ r ,

these conditions reduce to

  lim_{|x|→∞} [a(x) x² − 2 c(x) x] > 0 ,        (7.8)

and the stationary distribution then has the density,

  π(x) = k b^{-2}(x) exp{−b^{-2}(x) [a(x) x² − 2 c(x) x]} ,        (7.9)

where k is the uniquely determined constant such that ∫_{−∞}^{∞} π(x) dx = 1. For the CTAR(2) process it suffices for all the eigenvalues of the two matrices A^(1) and A^(2) to have negative real parts, where A^(1) and A^(2) are respectively the values of the coefficient matrix A in the defining Eq. (6.3) when Y(t) is below and above the threshold.

REMARK 5. Many extensions of the CTAR(p) process are of potential interest. One can for example define a continuous-time threshold ARMA(p, q) process by allowing the matrices B and A in the state-space representation (4.1) and (4.2) to depend on Y(t) appropriately (see Brockwell, 1994). Properties of such processes are however less understood, and they do not have the same direct relationship to a univariate stochastic differential equation of the form (2.1) with coefficients depending on Y(t) as does the CTAR(p) process. Other extensions would allow the thresholds to depend in a more general way on the state vector and allow the observations to be vector-valued.
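Returning to the CTAR(1) case, the stationary density (7.9) is explicit up to the constant k, which can be fixed numerically. An illustrative sketch (function name and grid normalization are ours):

```python
import numpy as np

def ctar1_stationary_density(a, b, c, x):
    """Stationary density (7.9): pi(x) = k b(x)^-2 exp{-b(x)^-2 [a(x)x^2 - 2c(x)x]},
    evaluated on a uniform grid x; a, b, c are callables implementing the
    threshold specification, and k is fixed by a Riemann-sum normalization."""
    av, bv, cv = a(x), b(x), c(x)
    pi = bv**-2.0 * np.exp(-bv**-2.0 * (av * x**2 - 2.0 * cv * x))
    return pi / (pi.sum() * (x[1] - x[0]))
```

With constant a > 0, b > 0 and c = 0, the density (7.9) reduces to the N(0, b²/(2a)) density, which gives a quick sanity check.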

8. Inference for CTAR processes

Continuous-time threshold autoregressive models with a single threshold at an unknown level r have been fitted to a variety of data sets (see e.g., Brockwell and Hyndman (1992), Brockwell (1994), Brockwell and Stramer (1995)) by maximization of the Gaussian likelihood, defined as the likelihood of the observations Y(t_1), …, Y(t_n), calculated under the assumption that the transition function of the state-vector is Gaussian. For the CTAR(1) process this is particularly straightforward since the observed process itself is Markovian. The Gaussian likelihood is the product of the Gaussian density g_θ(y(t_1)) having the same first and second order moments as the stationary distribution (7.9) and the conditional Gaussian density of Y = (Y(t_2), …, Y(t_n)) given Y(t_1) = y(t_1), i.e.

  f_θ(y | y(t_1)) = ∏_{i=2}^{n} {[2π v(t_i)]^{-1/2} exp[−(y(t_i) − m(t_i))² / (2 v(t_i))]} ,        (8.1)

where θ is the vector of 7 parameters in the model,

  DY(t) + a^(1) Y(t) = b^(1) DW(t) + c^(1),   Y(t) < r ,        (8.2)
  DY(t) + a^(2) Y(t) = b^(2) DW(t) + c^(2),   Y(t) ≥ r ,        (8.3)

and

  m(t_j) = E[Y(t_j) | Y(t_{j-1}) = y(t_{j-1})]

and

  v(t_j) = E[(Y(t_j) − m(t_j))² | Y(t_{j-1}) = y(t_{j-1})] .
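Given the one-step conditional means m(t_i) and variances v(t_i) (obtained, e.g., from the Euler scheme of Section 7), the criterion −2 ln(GL) corresponding to (8.1) is a sum of univariate Gaussian terms. A minimal sketch (the function name is ours):

```python
import numpy as np

def neg2_log_gl(y, m, v):
    """-2 ln of the conditional Gaussian likelihood (8.1): y holds the
    observations y(t_2), ..., y(t_n); m and v the corresponding one-step
    conditional means m(t_i) and variances v(t_i)."""
    y, m, v = map(np.asarray, (y, m, v))
    return float(np.sum(np.log(2.0 * np.pi * v) + (y - m)**2 / v))
```

This is the quantity minimized over the parameter vector θ by a numerical optimizer.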

The one-step predictors and their mean squared errors are computed as described in Section 7, and the maximization is carried out with the aid of a numerical non-linear optimization algorithm.

The case p > 1 is more complicated since the observed process, unlike the state process, is not Markovian. However, in the state-space representation (6.2) and (6.3) we can write

  Y(t) = (Y(t), V(t)′)′ ,

where V(t) is the (p − 1) × 1 vector consisting of the last p − 1 components of Y(t). Let f_r denote the joint probability density of the random variables

Y(t_r), V(t_r), Y(t_{r-1}), Y(t_{r-2}), …, Y(t_1). From the Markov property of {Y(t)} it is easy to check that

  f_{r+1}(y_{r+1}, v_{r+1}, y_r, y_{r-1}, …, y_1)
      = ∫ p(y_{r+1}, v_{r+1}, t_{r+1} − t_r | y_r, v_r) f_r(y_r, v_r, y_{r-1}, …, y_1) dv_r ,        (8.4)

where p(y_{r+1}, v_{r+1}, t_{r+1} − t_r | y_r, v_r) is the density of (Y(t_{r+1}), V(t_{r+1})′)′ given Y(t_r) = (y_r, v_r′)′. For a given set of observed values y_1, …, y_N at times t_1, …, t_N, the functions f_2, …, f_N are functions of v_2, …, v_N respectively. These functions can easily be computed recursively from (8.4) in terms of f_1 and the functions p(y_{r+1}, ·, t_{r+1} − t_r | y_r, ·). The likelihood of the observations y_1, …, y_N is then clearly

  L(θ; y_1, …, y_N) = ∫_{ℝ^{p-1}} f_N(v_N) dv_N .        (8.5)

The filtered value of the unobserved vector V(t_r), r = 1, …, N (i.e., the conditional expectation of V(t_r) given Y(t_i) = y_i, i = 1, …, r) is readily obtained from the function f_r as

  v̂_r = ∫ v f_r(v) dv / ∫ f_r(v) dv .        (8.6)
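The recursions (8.4)-(8.6) can be sketched on a grid for v, with the integrals replaced by Riemann sums (the text's own device for the Gaussian approximation). The interface below is an assumption of ours, not from the original: `trans(y_next, v_next, dt, y, v)` stands for the transition density p, and `f1` for the initial density of V(t_1).

```python
import numpy as np

def filter_and_likelihood(y, dts, vgrid, trans, f1):
    """Grid version of (8.4)-(8.6) for p = 2 (scalar unobserved V): propagate
    the joint density f_r of (Y(t_r), V(t_r)) and the past observations,
    returning the likelihood (8.5) and the filtered values (8.6)."""
    dv = vgrid[1] - vgrid[0]
    f = f1(vgrid)
    filtered = [float(np.sum(vgrid * f) / np.sum(f))]
    for r in range(1, len(y)):
        # (8.4): integrate out v_r against the transition density
        f = np.array([np.sum(trans(y[r], vn, dts[r - 1], y[r - 1], vgrid) * f) * dv
                      for vn in vgrid])
        filtered.append(float(np.sum(vgrid * f) / np.sum(f)))
    return float(np.sum(f) * dv), filtered
```

With a transition density that factorizes into known normals, the likelihood and filtered values can be checked in closed form.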

On the other hand, the calculation of the expected value of Y(t_{r+1}) given Y(t_i) = y_i, i = 1, …, r, involves a much more complicated higher-dimensional multiple integration. An alternative natural predictor of Y(t_{r+1}) which is easy to compute can be found from

  Ŷ_{r+1} = m((y_r, v̂_r′)′, t_{r+1} − t_r) ,        (8.7)

where m((y_r, v̂_r′)′, t_{r+1} − t_r) is approximated by m_n((y_r, v̂_r′)′, t_{r+1} − t_r) as defined in Section 7. If we take f_1(y, v) to be the Dirac delta function assigning mass one to (y_1, 0′)′, then the likelihood in (8.5) is the density of Y(t_2), …, Y(t_n), conditional on Y(t_1) = y_1 and V(t_1) = 0. The first and second order moments of the transition density p(y_{r+1}, v_{r+1}, t_{r+1} − t_r | y_r, v_r) are found using the approximating process defined by (7.3) and (7.4). The 'Gaussian likelihood' is then found by replacing the transition densities by Gaussian densities with the same first and second order moments. The integrals in the recursions (8.4) are replaced by approximating Riemann sums.

EXAMPLE 8. Following Tong and Yeung (1991), who fitted a different continuous-time non-linear model, Brockwell and Hyndman (1992) fitted a CTAR(1) process with one threshold to the series of relative daily price changes, X(t) = 100(P(t) − P(t − 1))/P(t − 1), of the IBM closing stock prices, May 18, 1961 to March 30, 1962 (listed in Tong (1990), p. 512). If we fit a CTAR(1) model to


the first 200 observations by using the Euler approximation (7.3) and (7.4) with n = 20 to compute the moments of the transition distribution required in the maximization of the Gaussian likelihood, we obtain the model defined as in (8.2) and (8.3) with r = −0.66, a^(1) = 2.67, b^(1) = 2.79, c^(1) = −0.84, a^(2) = 0.40, b^(2) = 1.23 and c^(2) = −0.53. The value of −2 ln(GL) for this model is 539.4, as compared with 550.59 for the maximum likelihood CAR(1) model,

DY(t) + 1.50Y(t) = 1.70DW(t) + 0.135 ,


a significant reduction (in terms of AIC) for the addition of four extra parameters. In order to assess the performance of the non-linear model we can use it to compute one-step predictors of the next 30 observations in the series. The empirical mean squared error obtained from the linear model is 0.837 as compared with 0.863 for the non-linear model. At first sight this is a disappointing result; however, if we look at the empirical mean squared prediction errors for Y(t) given |Y(t − 1)| < 0.5, we find that the linear model gives 0.582 as compared with 0.524 for the non-linear model. This result is not surprising if we compute the model one-step mean square prediction error for the fitted CTAR(1) process. Simulation of the Euler approximation with n = 20 shows that its one-step mean squared error for predicting Y(t) has a minimum value of 0.832 when Y(t − 1) is approximately −0.25, increasing to 1.044 at Y(t − 1) = 3 and 0.910 at Y(t − 1) = −3. The linear CAR(1) model on the other hand has a one-step mean squared error which is constant and equal to 0.915. Thus, although the model provides no benefit as far as overall one-step mean squared prediction error is concerned, it does capture, via the dependence of the one-step mean squared error on Y(t − 1), the changing volatility of the process. This is the essential reason for the superiority of the fit over that of the linear model.

EXAMPLE 9. As discussed in Example 6, the annual sunspot numbers, 1770-1869, are rather well fitted by the linear CAR(2) model,

  (D² + 0.495D + 0.430)Y(t) = 24.66 DW(t) + 20.81 ,

for which −2 ln(GL) = 812.8, where GL is the Gaussian likelihood of the last 99 observations, conditional on the initial state (101, 0), 101 being the first observation. In order to see what improvement can be obtained by introduction of a threshold, a CTAR(2) model was fitted to the series using the method described in this section.
Moments of the transition distribution for the state-vector were computed using the Euler approximation with n = 10. (Higher values of n were found to give essentially the same result.) The model so obtained is

  (D² + 8.74D − 0.33)Y(t) = 43.3 DW(t) − 31.6,   Y(t) < 10.0 ,
  (D² + 0.55D + 0.46)Y(t) = 28.4 DW(t) + 23.0,   Y(t) ≥ 10.0 ,

with −2 ln(GL) = 797.0. Comparing this value with 812.8, we see that we have achieved a considerable improvement in likelihood at the expense of the five additional parameters. Computing the empirical mean squared error of the one-step predictors for the next 20 annual sunspot numbers as given by Tong (1990), we find that the threshold model gives mean squared error 450.9, as compared with 469.8 for the linear CAR(2) model. Moreover the stationary distribution of the threshold model, computed by simulation of one million observations of the process, gives a much better approximation to the highly non-Gaussian marginal histogram of the sunspot data (see Brockwell and Hyndman, 1992). Figure 1 shows the joint stationary density of the components (X(t), DX(t)) computed in the same way. The stationary density of the state vector of a CTAR(2) process does not, unlike that of the CTAR(1) process (see (7.9)), have a known explicit analytic form.

EXAMPLE 10. The Australian All Ordinaries Index is an average of share prices on the Australian Stock Exchange. Its closing values, {P(t), t = 0, …, 520}, recorded daily for 521 trading days ending July 18th, 1994, were studied by Brockwell and Williams (1997). If continuous-time AR(p) models (with p ≤ 10) are fitted to the series of percentage relative price changes,

U(t) = 100(P(t) − P(t − 1))/P(t − 1) ,

it is found that the model with smallest AIC value is the AR(2) process,

D²U(t) + 2.94DU(t) + 5.06U(t) = 4.48DW(t) + 0.245 .


The one-step mean square prediction error for this model (conditional on infinitely many past observations of the process) is 0.652, a value only marginally smaller than the sample variance, 0.674. The benefit of predicting from the fitted


Fig. 1. Stationary joint density of the components (X(t),DX(t)) of the sunspot numbers and their derivative evaluated at (x,y), based on the threshold model of Example 9.


model over simply predicting the next value to be the sample mean is therefore small (as one would expect from the efficient market hypothesis, which suggests that {U(t)} should be an approximately uncorrelated sequence). In an attempt to reduce the mean squared error of prediction, a CTAR(2) model was fitted to {U(t)}, as in Example 9, by maximizing the conditional Gaussian likelihood. The model found (using the Euler approximation with n = 200 to compute the moments of the transition distribution of the state vector) was

  D²U(t) + 0.78DU(t) + 5.01U(t) = 3.68DW(t) + 0.015,   U(t) < −0.55 ,        (8.8)
  D²U(t) + 3.23DU(t) + 4.06U(t) = 3.88DW(t) − 0.185,   U(t) ≥ −0.55 ,        (8.9)

with −2 ln(GL) = 1231.7, as compared with the (conditional) value 1251.5 for the linear CAR(2) model. The reduction in the value of −2 ln(GL) (at the cost of only 5 additional parameters) suggests a substantial improvement in the goodness of fit of the model to the 520 observed values of U(t). Under this model, the mean squared one-step prediction error of U(t) given U(t − 1) and DU(t − 1) can be computed by simulation, and it suggests (see Brockwell and Williams (1997)) that prediction of U(t) will be most accurate for large positive values of U(t − 1). To check this suggestion, the fitted CTAR(2) model was used to generate one-step forecasts of the observed values of U(521), …, U(550), and the corresponding prediction errors were recorded by subtracting the forecasts from the observed values. The resulting squared errors were then averaged to produce empirical mean squared errors of prediction. The results are shown in Table 1.

Table 1
                           Linear model    Threshold model
  −2 ln(GL)                1251.5          1231.7
  Observed MSE             0.417           0.431
  MSE given U(t) > 0.0     0.378           0.335
  MSE given U(t) > 0.2     0.205           0.177

The first row of the table shows the improvement in −2 ln(GL) (conditional on initial state vector X(1) = (U(1), 0)′ = (0.812, 0)′) achieved by the CTAR(2) model relative to the CAR(2) model for the first 520 observations. In view of this, it is at first sight disappointing to see that the observed mean squared error when the two models are used to produce out-of-sample forecasts for the last 30 data points is actually worse by 3.2% for the non-linear model. However, the value of the non-linear model becomes apparent in the last two rows of Table 1, which show the conditional empirical mean squared errors for prediction of U(t + 1) given U(t) > 0 and U(t) > 0.2 respectively. The non-linear model shows an 11.3%


improvement in mean squared error when attention is restricted to prediction when U(t) > 0, and a 13.6% improvement when attention is restricted to prediction when U(t) > 0.2. These empirical observations highlight, as in Example 8, the way in which the non-linear model, unlike a linear model, captures the dependence of the prediction mean squared errors on the past history of the process.

Acknowledgement

This work was partially supported by NSF Grant DMS 9972015.


References
Barndorff-Nielsen, O. E. and N. Shephard (1999). Non-Gaussian OU based models and some of their uses in financial economics. Working Paper in Economics, Nuffield College, Oxford.
Bartlett, M. S. (1946). On the theoretical specification and sampling properties of autocorrelated time series. J. Roy. Statist. Soc. (Supplement) 7, 27-41.
Bergstrom, A. R. (1990). Continuous Time Econometric Modelling. Oxford University Press, Oxford.
Bollerslev, T. (1986). Generalised autoregressive conditional heteroscedasticity. J. Econom. 51, 307-327.
Brockwell, A. E. and P. J. Brockwell (1998). A class of non-embeddable ARMA processes. J. Time Ser. Anal., to appear.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods, 2nd edition. Springer-Verlag, New York.
Brockwell, P. J. and R. J. Hyndman (1992). On continuous time threshold autoregression. Int. J. Forecast. 8, 157-173.
Brockwell, P. J. (1994). On continuous time threshold ARMA processes. J. Statist. Plann. Inf. 39, 291-304.
Brockwell, P. J. and O. Stramer (1995). On the convergence of a class of continuous time threshold ARMA processes. Ann. Inst. Statist. Math. 47, 1-20.
Brockwell, P. J. (1995). A note on the embedding of discrete-time ARMA processes. J. Time Ser. Anal. 16, 451-460.
Brockwell, P. J. and R. J. Williams (1997). On the existence and application of continuous-time threshold autoregressions of order two. Adv. Appl. Prob. 29, 205-227.
Brockwell, P. J. (2000). Heavy-tailed and non-linear continuous-time ARMA models for financial time series. In Statistics and Finance: An Interface (Eds., W. S. Chan, W. K. Li and H. Tong). Imperial College Press, London.
Brockwell, P. J. (2001). Lévy-driven CARMA processes. Ann. Inst. Stat. Math. 52, to appear.
Chan, K. S. and H. Tong (1987). A note on embedding a discrete parameter ARMA model in a continuous parameter ARMA model. J. Time Ser. Anal. 8, 277-281.
Doob, J. L. (1944). The elementary Gaussian processes. Ann. Math. Statist. 25, 229-282.
Durbin, J. (1961). Efficient fitting of linear models for continuous stationary time series from discrete data. Bull. Int. Statist. Inst. 38, 273-281.
Dzhaparidze, K. O. (1970). On the estimation of the spectral parameters of a stationary Gaussian process with rational spectral density. Th. Prob. Appl. 15, 531-538.
Dzhaparidze, K. O. (1971). On methods for obtaining asymptotically efficient spectral parameter estimates for a stationary Gaussian process with rational spectral density. Th. Prob. Appl. 16, 550-554.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of U.K. inflation. Econometrica 50, 987-1008.
Fowler, R. H. (1936). Statistical Mechanics. Cambridge University Press, Cambridge.
He, S. W. and J. Wang (1989). On embedding a discrete-parameter ARMA model in a continuous-parameter ARMA model. J. Time Ser. Anal. 10, 315-323.
Hull, J. and A. White (1987). The pricing of assets on options with stochastic volatilities. J. of Finance 42, 281-300.
Jones, R. H. (1981). Fitting a continuous time autoregression to discrete data. In Applied Time Series Analysis II (Ed., D. F. Findley), pp. 651-682. Academic Press, New York.
Jones, R. H. (1985). Time series analysis with unequally spaced data. In Time Series in the Time Domain, Handbook of Statistics 5 (Eds., E. J. Hannan, P. R. Krishnaiah and M. M. Rao), pp. 157-178. North Holland, Amsterdam.
Jones, R. H. and L. M. Ackerson (1990). Serial correlation in unequally spaced longitudinal data. Biometrika 77, 721-732.
Karatzas, I. and S. E. Shreve (1991). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
Nelson, D. (1990). ARCH models as diffusion approximations. J. Econom. 45, 7-38.
Nicholls, D. F. and B. G. Quinn (1982). Random Coefficient Autoregressive Models: An Introduction. Springer Lecture Notes in Statistics 11. Springer-Verlag, New York.
Oksendal, B. (1998). Stochastic Differential Equations, 5th ed. Springer-Verlag, Berlin.
Ozaki, T. (1985). Non-linear time series models and dynamical systems. In Time Series in the Time Domain, Handbook of Statistics 5 (Eds., E. J. Hannan, P. R. Krishnaiah and M. M. Rao), pp. 2-84. North Holland, Amsterdam.
Phillips, A. W. (1959). The estimation of parameters in systems of stochastic differential equations. Biometrika 46, 67-76.
Phillips, P. C. B. (1973). The problem of identification in finite parameter continuous time models. J. Econom. 1, 351-362.
Protter, P. (1991). Stochastic Integration and Differential Equations. Springer-Verlag, New York.
Robinson, P. M. (1978). Continuous model fitting from discrete data. In Directions in Time Series (Eds., D. R. Brillinger and G. C. Tiao). Institute of Mathematical Statistics.
Rossi, E. P. (ed.) (1996). Modelling Stock Market Volatility: Bridging the Gap to Continuous Time. Academic Press, San Diego.
Stramer, O., P. J. Brockwell and R. L. Tweedie (1996a). Continuous time threshold AR(1) models. Adv. Appl. Prob. 28, 728-746.
Stramer, O., R. L. Tweedie and P. J. Brockwell (1996b). Existence and stability of continuous-time threshold ARMA processes. Statistica Sinica 6, 715-732.
Stroock, D. W. and S. R. S. Varadhan (1979). Multidimensional Diffusion Processes. Springer-Verlag, Berlin.
Subba Rao, T. and M. M. Gabr (1984). An Introduction to Bispectral Analysis and Bilinear Time Series Models. Springer Lecture Notes in Statistics 24. Springer-Verlag, New York.
Tong, H. (1983). Threshold Models in Non-linear Time Series Analysis. Springer Lecture Notes in Statistics 21. Springer-Verlag, New York.
Tong, H. (1990). Non-linear Time Series: A Dynamical System Approach. Clarendon Press, Oxford.
Tong, H. and I. Yeung (1991). Threshold autoregressive modelling in continuous time. Statist. Sin. 1, 411-430.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 2001 Elsevier Science B.V. All rights reserved.


Record Sequences and their Applications

John Bunge and Charles M. Goldie

1. Introduction
The probabilistic theory of extremes has been growing in impact in recent years and a state-of-the-art account can be found elsewhere in this Handbook. One corner of the field, the area known as outstanding observations or record sequences, or just records, has developed in a distinctive way and is the topic of this article. Suppose there is a sequence of independent identically distributed (i.i.d.) random variables X1, X2, …. These for instance could be the observations of one variable in an experiment that can be repeated as often as needed. Or they could be a stream of data values at reasonably long time intervals from a physical system that fluctuates, does not evolve, and has little dependence on past behavior. (It could be that one postulates that the system is like that in order to test whether it is indeed so.) Knowing extreme-value theory, one might form the sequence of successive maxima M_n := max(X1, X2, …, X_n). After the first few values, M_n does not change very often. There will be long stretches where the graph of M_n against n stays flat. In fact, as we shall see below, the number of times M_n changes value between n = 100 and n = 1000, or, worse, between n = 1000 and n = 10 000, is not likely to differ much from the number of times it changes between n = 10 and n = 100. In the circumstances, it looks like a good idea to throw away all the repeats and just consider the values taken by the M_n sequence. Call these R1, R2, R3, …. So R1 := X1, R2 is the first of X2, X3, … to exceed R1, and so on. These are the record values. Their distribution theory, both here-and-now and asymptotically, is quite different from that of the extreme values M1, M2, …, even though the R_n sequence is just the M_n sequence without repeats. The change in sampling frame, so to speak, alters everything. There is some information left in the M_n sequence aside from the record values, namely the times at which its successive values are first taken.
These are the record times L1, L2, L3, …. So L1 := 1, L2 is the first n such that X_n > X1, L3 is the first n > L2 such that X_n > X_{L2}, and so on. One sees that R_k = X_{L_k} (= M_{L_k}). A beautiful account of record sequences was given in Glick (1978). It is as worth reading now as then, and in what follows we are not going to rewrite that article. Instead, we want to concentrate on clarifications of the basic structure of


records that were found mainly in the ten years following Glick's article, and on developments and applications that have continued right up to now. Ours is far from being the only view of the subject, and there are quite different accounts in the surveys of Nevzorov (1987), Nagaraja (1988) and Ahsanullah (1995). Substantial textbook treatments are in Resnick (1987) and Pfeifer (1989), and there are shorter accounts in Galambos (1987), Arnold et al. (1992), Resnick (1992) and Embrechts et al. (1997). Everything contained in this discussion applies equally well to minima, hence lower records, just as much as to (upper) records. It is enough to apply results about upper records to the observations −X1, −X2, …, whose upper record values are the lower record values of X1, X2, …. The above framework of record values and times arising from i.i.d. observations is what one might call the classical record model, which we develop in Sections 2-6 below. There are several ways to extend the classical model. One is to consider records from various sorts of changing populations. The models of interest are application-oriented and we discuss them in Section 7 together with certain extensions to the classical model and certain connections and applications in other fields, mainly combinatorics. The second main extension, which is the one that has progressed the furthest, is to let the original observations occur at random times. We devote our final five sections to these 'random record models'.
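The classical definitions are easily mechanized; the following small sketch (ours, not from the article) extracts the record values R_k and record times L_k from a finite sequence of observations:

```python
def records(xs):
    """Record values R_1, R_2, ... and record times L_1, L_2, ... :
    R_1 = X_1, and R_{k+1} is the first subsequent observation to exceed
    R_k, occurring at time L_{k+1}; so R_k = X_{L_k} = M_{L_k}."""
    values, times = [], []
    for n, x in enumerate(xs, start=1):
        if not values or x > values[-1]:
            values.append(x)
            times.append(n)
    return values, times
```

Equivalently, the record values are the successive maxima M_n with the repeats thrown away, as described above.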
2. Partial records

As well as considering the set

  ℛ1 = {R1, R2, R3, …}        (2.1)

of observation values that are the largest yet seen when they appear, one can think about the set ℛ2 of observation values that are the second largest on their appearance, and the set ℛ3 of observations that are the third largest on their appearance, and so on. We call ℛ2 the set of 2-records or 2-record values, ℛ3 the set of 3-records, etc., and refer to them collectively as the sets of partial records. With this notation we can state what in our view is the most surprising fact about record values, as follows.

THEOREM 2.1. (Ignatov's Theorem) ℛ1, ℛ2, ℛ3, … are independent and identically distributed random sets.

We will say something about the name and origins of this theorem in the next section, after devoting this section to elucidating its content. First note that it is the sets that are independent and identically distributed; the contents of, say, ℛ1, listed as in (2.1), are a strictly increasing sequence of r.v.s (random variables). The theorem says that the sets ℛ2 and ℛ1 are independent of each other, and the joint distribution of the 2-records is the same as that of the 1-records R1 < R2 < R3 < ⋯; next, ℛ3 is independent of ℛ1 and ℛ2 together, and the joint distribution of the 3-records is the same as that of the 1-records, and so on.

Record sequences and their applications

279

Before going further, we ought to make precise the inequalities that implicitly are in use to define the records, 2-records and so on. We need to specify the inequalities carefully because, for good reasons as it turns out, we must allow for different observations having coincident values. Denote the number of elements of a set A by #A. The initial rank of our nth observation X_n is defined to be

  ρ_n := #{i: i ∈ {1, …, n}, X_i ≥ X_n} .        (2.2)

So ρ_1 = 1, while for general n, ρ_n is a random integer between 1 and n inclusive. If one plots the observations as points (n, X_n) in a plane, ρ_n is the number of points in the 'northwest corner' subtended by (n, X_n), edges included. In Figure 1, for instance, ρ_1 = 1, ρ_2 = 2 and ρ_3 = 1, while because X_1 = X_4 = X_5 we have ρ_4 = 3 and ρ_5 = 4. Notice that if X_m = X_n for some m ≠ n then ρ_m and ρ_n must be unequal. So for any chosen initial rank there can be no two coincident observations. Therefore we may unambiguously define

  ℛ_k := {X_n: ρ_n = k} .

Theorem 2.1 holds in complete generality for the partial records thus defined. It need not be the case, however, that ℛ1, or other ℛ_k, are infinite sets. To see this, let F be the common distribution function (d.f.) of the X_i and let x₊ ∈ (−∞, ∞] be its 'top':

  x₊ := sup{x: F(x) < 1} .

If x₊ is finite and P(X_1 = x₊) = 1 − F(x₊ − 0) > 0 then sooner or later there will be an observation with value x₊, and so ℛ1 is a finite set with greatest element x₊. The same holds for ℛ2, ℛ3, …. In all other cases it can be shown that each ℛ_k is an infinite set that can be listed as a strictly increasing sequence that tends to x₊, as in (2.1). The 'surprise' element in Ignatov's Theorem arises when one realizes that the constraints on the (n + 1)st observation, if it is to be a 2-record, very much depend on the record values as they happen to stand at time n. In a sense the theorem works because time has been wholly written out of its formulation. It looks only at the traces of the partial record values.


Fig. 1. Initial ranks when observation values can coincide.
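The initial rank (2.2) and the sets ℛ_k of partial records can be computed directly. The sketch below is ours; the data X = (2, 1, 3, 2, 2) is an illustrative choice consistent with the coincident-value example of Figure 1, where X_1 = X_4 = X_5.

```python
def initial_ranks(xs):
    """rho_n = #{ i <= n : X_i >= X_n }, the initial rank of Eq. (2.2)."""
    return [sum(x >= xs[n] for x in xs[:n + 1]) for n in range(len(xs))]

def k_records(xs, k):
    """The k-record values: observations whose initial rank equals k."""
    return [x for x, r in zip(xs, initial_ranks(xs)) if r == k]
```

Distinct observations sharing a value necessarily receive distinct initial ranks, so each observation falls in exactly one ℛ_k, as the text observes.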


For illustration we display the 1-, 2-, 3- and 4-records from some actual data which may be assumed i.i.d. We take from Glick (1978), Data Set 1, the rainfall amounts in Vancouver in January, in inches, for the years 1906-1966 inclusive. The partial records are shown in Figure 2. Were later observations included, more values might be added above the marks shown, but the traces of the 1- to 4-records up to level 13.28 are complete. If the original observations are indeed i.i.d., these are realizations of i.i.d. point processes (random sets). The above illustration brings out the fact, known from theory, that even in quite long sequences of observations there are unlikely to be enough actual records to use for worthwhile statistical inference. By considering 1-, 2-, …, m-records, where m is chosen so that the expected number of partial records is as large as required, this problem of lack of data is overcome. The importance of Ignatov's Theorem is that models for the record values, which we will come to below, apply also to the partial record values and may thus be tested by inference based on all of them.

3. Record values

In this section we shall characterize the random set ℛ_1 of record values. Ignatov's Theorem 2.1 carries this characterization over to ℛ_k, the set of k-record values, for each k. Recall that we let F denote the common d.f. of the observations X_i. It is convenient to extend F to set arguments, as in

Fig. 2. The 1-, 2-, 3- and 4-record values for Vancouver rainfall.

Record sequences and their applications

F(x, ∞) = 1 − F(x) = P(X_1 > x) ,
F(a, b] = F(b) − F(a) = P(a < X_1 ≤ b) ,
F{x} = F(x) − F(x − 0) = P(X_1 = x) ,


and so on. Let D be the set of points where F is discontinuous; then F{x} = P(X_1 = x) is positive precisely when x ∈ D. The simplest characterization of ℛ_1 is when F is discrete, so the X_i take values in the finite or countable set D. Suppose that each point x ∈ D is owned by an individual demon, who decides with probability p_x := F{x}/F[x, ∞) that x shall be an element of ℛ_1, and with probability 1 − p_x := F(x, ∞)/F[x, ∞) that it shall not. The demons do not communicate with each other, so their decisions are mutually independent. This describes ℛ_1 completely (as in this case it must obviously be a subset of D). Before leaving the discrete case it is worth convincing oneself that, for any interval (a, b],

P(ℛ_1 ∩ (a, b] = ∅) = F(b, ∞)/F(a, ∞) .   (3.1)
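The demon description makes (3.1) easy to check numerically for a discrete F. A minimal sketch with a made-up pmf (the distribution below is our own, purely illustrative):

```python
# hypothetical pmf on {1,...,6}
pmf = {1: 0.10, 2: 0.20, 3: 0.15, 4: 0.25, 5: 0.20, 6: 0.10}

def tail_open(x):    # F(x, inf) = P(X1 > x)
    return sum(p for v, p in pmf.items() if v > x)

def tail_closed(x):  # F[x, inf) = P(X1 >= x)
    return sum(p for v, p in pmf.items() if v >= x)

a, b = 1, 5
# each demon at x in (a, b] excludes x from R1 with probability F(x,inf)/F[x,inf)
avoid = 1.0
for x in pmf:
    if a < x <= b:
        avoid *= tail_open(x) / tail_closed(x)
print(avoid, tail_open(b) / tail_open(a))  # the two numbers agree, as in (3.1)
```

The product of the demons' exclusion probabilities telescopes to F(b, ∞)/F(a, ∞) exactly as the argument below describes.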

F does not change except at points of D, so for instance if (a, b] contains just the points x_1 < x_2 < x_3 of D then

P(ℛ_1 ∩ (a, b] = ∅) = (F(x_1, ∞)/F[x_1, ∞)) (F(x_2, ∞)/F[x_2, ∞)) (F(x_3, ∞)/F[x_3, ∞))
= (F[x_2, ∞)/F(a, ∞)) (F(x_2, ∞)/F[x_2, ∞)) (F(b, ∞)/F(x_2, ∞))
= F(b, ∞)/F(a, ∞) ,

with a similar telescoping for more or fewer points of D. The argument extends even to cases when D is very thickly spread - it can for instance be the set of rationals. The other case where ℛ_1 is quite simple to characterize is when F is continuous, so D is empty. Here (3.1) is used as definition rather than conclusion. In fact, define a measure η on ℝ by its 'distribution function'

η(−∞, x] := −ln F(x, ∞)   (x ∈ ℝ) ,   (3.2)

and insist that for any (measurable) set A ⊆ ℝ,

P(ℛ_1 ∩ A = ∅) = e^{−η(A)} .   (3.3)

In this, (3.1) is the special case when A = (a, b]. We call η the avoidance measure for ℛ_1, because through (3.3) it specifies the probability that ℛ_1 avoids any particular set. Because η, being a measure, is countably additive, if we fix disjoint sets A_1, A_2, ... then (3.3) implies that the events that ℛ_1 has no elements in common with A_1, A_2, ... are independent of each other (just as for the discrete case because of the independence of the demons). This suggests ℛ_1 might be a Poisson process, and indeed in this case it is. When F is continuous, ℛ_1 is the random set of points of an (inhomogeneous) Poisson process on ℝ of intensity


measure η. So ℛ_1 puts independent numbers of elements in disjoint subsets of ℝ, and the number of elements of ℛ_1 in any set A is Poisson-distributed with parameter η(A). The general case is nothing more than the two special cases put together. Thus define η by (3.2) and observe that the points x where η(−∞, x] is discontinuous, i.e., the locations of the 'atoms' of the measure η, are precisely the points where F is discontinuous, that is, they are the points of the set D. The continuous part of η is the measure η_c given by

η_c(A) := η(A) − Σ_{x ∈ D∩A} η{x}

on sets A with η(A) < ∞, and by

η_c(A) := lim_{n→∞} η_c(A_n)

otherwise, where A_n ↑ A and η(A_n) < ∞ for all n.

THEOREM 3.1. (Shorrock's Theorem). ℛ_1 consists of
1) elements of D provided by independent demons: the demon who owns x ∈ D decides with probability 1 − e^{−η{x}} that it shall be an element of ℛ_1, and with probability e^{−η{x}} that it shall not;
2) the points of a Poisson process, independent of the demons, of intensity η_c.

It is immediate from this theorem that ℛ_1 retains the independent-increments property of each of its two components: ℛ_1 has independent numbers of elements in disjoint subsets of ℝ.

The construction of ℛ_1 given by the above theorem is explicit, but in case it seems rather complicated we will give two concise characterizations which are equivalent to the construction. The first is as follows.

THEOREM 3.2. The distribution of ℛ_1 is the unique distribution of a random set such that (3.3) holds for all finite unions of intervals A, where the avoidance measure η is determined by (3.2).

(By a 'random set' we mean, technically, a simple point process, i.e., one without multiple points; cf. Daley and Vere-Jones (1988).) The above characterization brings out that even in the general case, η as determined by (3.2) still gives the avoidance probabilities for ℛ_1 via (3.3). Indeed, (3.3) holds for all Borel sets A, though the smaller class of sets mentioned in the theorem suffices for the characterization. The second characterization is in terms of the function N_1, where N_1(x) is defined to be the number of elements of ℛ_1 in the set (−∞, x]. By a 'counting function' we mean a right-continuous function f : ℝ → ℕ_0 which is flat except for jumps of height exactly +1, and for which lim_{x→−∞} f(x) = 0.


THEOREM 3.3. Define H : ℝ → ℝ by

H(x) := ∫_{(−∞, x]} dF(y)/F[y, ∞)   (x ∈ ℝ) .   (3.4)

The distribution of N_1 is the unique distribution of a random counting function such that M := N_1 − H is a martingale.

(If x_+ < ∞, read (3.4) as saying that H is constant in the interval [x_+, ∞).) For continuous-parameter martingales see e.g., Rogers and Williams (1994). In martingale terminology N_1 has a deterministic compensator which is given by H. In a terminology borrowed from reliability theory, H defined by (3.4) is the (cumulative) hazard function of F. If F has density f then (3.4) implies that H(x) = ∫_{−∞}^x h(u) du where h(x) := f(x)/(1 − F(x)), that is, h is the hazard rate of F. H(x) is the expected number of elements of ℛ_1 in the set (−∞, x], and this suggests, on letting H denote also the measure determined by the function H, that H(A) = ∫_A dH(x) is the expected number of elements of ℛ_1 in the set A. This is indeed the case, so H is the intensity measure of the point process whose points are the elements of ℛ_1. What is the relation between the intensity measure H and the avoidance measure η of (3.2) and (3.3)? Now H, like η, has atoms sited at the points of the set D, and a remaining continuous part H_c. In terms of these, for any Borel set A ⊆ ℝ,

H(A) = η_c(A) + Σ_{x ∈ D∩A} (1 − e^{−η{x}}) ,   (3.5)

η(A) = H_c(A) + Σ_{x ∈ D∩A} (−ln(1 − H{x})) .   (3.6)

So η_c and H_c are the same as each other, and hence when F is continuous the measures η and H are the same as each other. From (3.3) and (3.6) we get

P(ℛ_1 ∩ A = ∅) = e^{−H_c(A)} ∏_{x ∈ D∩A} (1 − H{x}) .   (3.7)
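When F is continuous, D is empty and H = η, so H(x) = −ln(1 − F(x)) is the expected number of record values in (−∞, x]. This can be illustrated by simulation for F = Uniform(0, 1); the seed and sample sizes below are arbitrary choices of ours:

```python
import math
import random

random.seed(42)

def records_below(n, x):
    # count record values <= x among n i.i.d. Uniform(0,1) observations
    best, count = -1.0, 0
    for _ in range(n):
        u = random.random()
        if u > best:       # a new record
            best = u
            if u <= x:
                count += 1
    return count

trials, x = 4000, 0.5
mean = sum(records_below(200, x) for _ in range(trials)) / trials
print(mean, -math.log(1 - x))  # both close to H(0.5) = ln 2 ~ 0.693
```

With 200 observations the chance of a further record below 0.5 arriving later is negligible, so the empirical mean approximates the intensity H(0.5).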

For each n ∈ ℕ, let (E_j^{(n)})_{j=1,...,n} be a partition of ℝ into n disjoint intervals. Suppose that as n → ∞ the maximum length of interval in the nth partition tends to 0. Then it turns out that for sets A bounded away from x_+,
P(ℛ_1 ∩ A = ∅) = lim_{n→∞} ∏_{j=1}^n (1 − H(E_j^{(n)} ∩ A)) .   (3.8)

In fact, it is not hard to show that the right-hand side of (3.8) reduces to that of (3.7). The limit procedure in (3.8) is rather like defining a Riemann integral, and, sure enough, the right-hand sides of (3.7, 3.8) are alternative definitions of the product integral ∏(1 − dH). Product integrals in this form have only recently

been revived and codified (see Gill and Johansen (1990)). They lead to a nice extension of (3.7, 3.8) for any measurable φ : ℝ → [0, 1),

E ∏_{x ∈ ℛ_1} (1 − φ(x)) = ∏(1 − φ dH) ,   (3.9)
the right-hand side being a product integral while inside the expectation on the left-hand side there is just an infinite product. We can rewrite the product integral as a D-indexed product times the exponential of an ordinary integral, similarly to (3.7), and deduce an inelegant formula for the Laplace functional of ℛ_1 (cf. Goldie and Rogers (1984), (2.3)). Taking A in (3.7) to be (−∞, x], the left-hand side becomes 1 − F(x), so we have a formula inverting (3.4). When F is continuous this becomes simply 1 − F(x) = e^{−H(x)}, inverting H(x) = −ln(1 − F(x)). We close this section with history. The identification of ℛ_1, Theorem 3.1, is due to Shorrock (1972, 1974). The characterization in Theorem 3.2 relies on a standard result that simple point processes are characterized by their avoidance probabilities on a 'dissecting ring' of sets (see Daley and Vere-Jones, 1988, Theorem 7.3.II), and the characterization in Theorem 3.3 is from Goldie and Rogers (1984). In view of the different contributions to ℛ_1 made by the discrete and continuous parts of F it is not surprising that Ignatov's Theorem 2.1 took some while to reach its present formulation. The case with F continuous, when the ℛ_k are i.i.d. Poisson processes, was formulated independently in Deheuvels (1983) and Ignatov (1978). Various different proofs appeared (see Mathematical Reviews 87j:60074 for a list). The general case of the result was given in Goldie and Rogers (1984). Finally, W. Vervaat realized that the case of F discrete (so only the Bernoulli 'demons' are present) has an easier proof than other cases, and this insight together with the (subtle) deduction of the general case from the discrete case appeared in Engelen et al. (1988). The proof of the general case, thus given, with a minor compression by Rogers (1989), remains the shortest known. Subsequently, two further proofs have appeared in Samuels (1992) and Yao (1997), possessing aspects of interest.

4. Sojourns

In Section 1 we defined the record times but time has been rigorously excluded from the subsequent two sections. Here we will bring it back into the picture. The first task, though, is to fix a positive integer m and arrange the elements of the sets ℛ_1, ..., ℛ_m in one list in increasing order, obtaining a sequence which we denote Q^{(m)}, with elements Q_j^{(m)}:

Q^{(m)} := (Q_j^{(m)})_{j=1,2,...} = (Q_1^{(m)}, Q_2^{(m)}, ...) ,

where Q_j^{(m)} ≤ Q_{j+1}^{(m)} for all j. Here we are notating Q^{(m)} as an infinite sequence, which is the case so long as F has no atom at the supremum of its support. In the


contrary case, when F has an atom at the supremum of its support, i.e., x_+ < ∞ and F(x_+ − 0) < 1, Q^{(m)} is with probability 1 a finite sequence and minor notational changes are needed. Note that the sequence Q^{(m)} can have repeated values, for if the same value appears in several of the ℛ_i then it goes in the list as many times as it appears. However if F is continuous Q^{(m)} has no repeats and is the sequence of points of a Poisson process on ℝ of intensity mη, where the avoidance measure η comes from (3.2). In the general case Q^{(m)} is a little more complicated but its distribution is known from Ignatov's and Shorrock's Theorems 2.1 and 3.1, and one can still do calculations with it without real difficulty. Denote the observations X_1, ..., X_n rearranged in increasing order by X_n^{(n)} ≤ ... ≤ X_n^{(2)} ≤ X_n^{(1)}. These are the order statistics and we want to follow the progress of the mth order statistic X_n^{(m)} as the sample size n increases through values m, m + 1, .... The values then taken by X_n^{(m)} are precisely the elements of the sequence Q^{(m)}, in order but with more repeats. In fact, the signal for the mth order statistic to take a step through the sequence Q^{(m)} is when an observation of initial rank at most m appears. These times are, in terms of the initial ranks ρ_n defined in (2.2),

L_1^{(m)} := m,   L_{j+1}^{(m)} := inf{n : n > L_j^{(m)}, ρ_n ≤ m}   (j = 1, 2, ...) ,   (4.1)

where we make the convention inf ∅ := +∞. The mth order statistic X_n^{(m)} has value Q_j^{(m)} for all n satisfying L_j^{(m)} ≤ n < L_{j+1}^{(m)}. The time that the mth order statistic spends at the value Q_j^{(m)}, its 'sojourn' in that value, is thus

Δ_j^{(m)} := L_{j+1}^{(m)} − L_j^{(m)}   (j = 1, 2, ...) .   (4.2)

In Figure 3 we plot the 3rd order statistic (m = 3) for the data used for Figure 2. The observations having initial rank at most 3 are plotted as (n, X_n). Each such X_n eventually becomes a value taken by the third order statistic, with a sojourn that is shown as a horizontal line segment at the same height as X_n. There are no coincident values here, so to illustrate this effect we use in Figure 4 the precipitation data for Vancouver in July in reverse time order (1966 back to 1906) and again display the 3rd order statistic. The times n = 1 through to n = 61 correspond to years 1966 down to 1906. One can see that the two values of 3.36, which occurred at times n = 1 and n = 25, give rise to end-to-end sojourns from n = 25 to n = 35 and from n = 35 to n = 51, at the same height. The point of all the definitions above is to state the following.

THEOREM 4.1. The sojourns Δ_1^{(m)}, Δ_2^{(m)}, ... are conditionally independent, given Q^{(m)}, with geometric distributions

P(Δ_j^{(m)} = l | Q^{(m)}) = (F(Q_j^{(m)}))^{l−1} (1 − F(Q_j^{(m)}))   (l = 1, 2, ...) .   (4.3)
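The geometric law (4.3) is easy to check by simulation. Taking F = Uniform(0, 1) and conditioning on a standing partial-record value q (the numbers below are our own illustrative choices), the sojourn is the waiting time until an observation exceeds q:

```python
import random

random.seed(1)
q, trials = 0.7, 20000   # q equals F(Q_j^{(m)}) since F is Uniform(0,1)
sojourns = []
for _ in range(trials):
    l = 1
    while random.random() <= q:  # the next observation fails to exceed q
        l += 1
    sojourns.append(l)
mean = sum(sojourns) / trials
p2 = sojourns.count(2) / trials
print(mean)  # ~ 1/(1 - q) = 3.33..., the mean of the geometric law (4.3)
print(p2)    # ~ q(1 - q) = 0.21, i.e. (4.3) with l = 2
```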


Fig. 3. The 3rd order statistic for Vancouver rainfall.

Fig. 4. Vancouver in July in reverse time: 3rd order statistic.

What this means is that one can evaluate the probability of any event involving the Δ_j^{(m)}, conditionally on the whole sequence Q^{(m)}. For instance,

P(Δ_j^{(m)} > k, Δ_{j+1}^{(m)} > l | Q^{(m)}) = (F(Q_j^{(m)}))^k (F(Q_{j+1}^{(m)}))^l .   (4.4)

We do not have to condition on the whole sequence Q^{(m)}, for we can throw away from the conditioning anything that is not involved in the right-hand side ('redundant conditioning'). Thus P(Δ_j^{(m)} > k, Δ_{j+1}^{(m)} > l | Q_j^{(m)}, Q_{j+1}^{(m)}) also evaluates as the right-hand side of (4.4). With the information in Theorems 2.1 and 3.1 one thus has the joint distribution, for m fixed, of the Δ_j^{(m)} and the Q_j^{(m)}, for all j = 1, 2, .... Recall from Section 3 that the random set ℛ_1 has independent numbers of elements in any chosen disjoint sets. By Ignatov's Theorem 2.1, the same holds for every ℛ_k, and their independence makes the sequence Q^{(m)} inherit an analogous property. Theorem 4.1 then implies that the sojourns of the mth order statistic in any chosen disjoint sets are independent random variables. So for instance the


number of n for which 2 ≤ X_n^{(m)} < 3 is independent of the number of n for which 3 ≤ X_n^{(m)} < 4. What we are really talking about here is the inverse of the mth order-statistic process. So let us define X^{(m)←}(·) to be a version of the inverse:

X^{(m)←}(x) := inf{n : n ≥ m, X_n^{(m)} ≥ x}   (x ≤ x_+) .

This defines X^{(m)←}(·) to have left-continuous paths, ensuring the convenient relationship

X^{(m)←}(x) ≤ n  iff  x ≤ X_n^{(m)} .

It is natural to consider increments of X^{(m)←}(·) of the form

X^{(m)←}[x, y) := X^{(m)←}(y) − X^{(m)←}(x)   (x < y ≤ x_+) ,

as X^{(m)←}[x, y) is then the amount of time the mth order statistic spends in the interval [x, y):

X^{(m)←}[x, y) = #{n : X_n^{(m)} ∈ [x, y)} .

We may define X^{(m)←}I for other forms of interval I by taking right-hand limits. Thus for instance

X^{(m)←}{x} = X^{(m)←}(x+) − X^{(m)←}(x) = #{n : X_n^{(m)} = x} .

THEOREM 4.2. Fix m ∈ ℕ. The process X^{(m)←} has independent increments: for any disjoint intervals I_1, ..., I_k in (−∞, x_+],

P(X^{(m)←}I_1 = n_1, ..., X^{(m)←}I_k = n_k) = ∏_{j=1}^k P(X^{(m)←}I_j = n_j)   (n_1, ..., n_k ∈ ℕ_0) ,

where ℕ_0 := {0, 1, 2, ...}, and, writing C(m, k) for the binomial coefficient,

P(X^{(m)←}[x, y) = n) = (F[y, ∞)/F[x, ∞))^m   for n = 0 ,

P(X^{(m)←}[x, y) = n) = (F[x, ∞))^{−m} Σ_{k=1}^{m∧n} C(m, k) C(n−1, k−1) (F[y, ∞))^m (F[x, y))^k (F(−∞, y))^{n−k}   for n = 1, 2, ... .   (4.5)

The notation m ∧ n stands for min(m, n). We emphasize that this theorem holds for all F, without continuity assumptions. In view of the complicated final formula it is useful to understand how it can be directly derived.

PROOF of (4.5). For n = 0, notice that X^{(m)←}[x, y) = 0 if and only if all of the first m of the X_i that are at least x are also at least y. Now the observations that are at


least x have conditional d.f. G(u) := P(X_1 ≤ u | X_1 ≥ x), so each has probability G[y, ∞) = F[y, ∞)/F[x, ∞) of being at least y, and (4.5) for n = 0 follows. For n ≥ 1, consider the time when the mth observation of value at least x occurs. Out of these m observations of value at least x, we need k to be less than y, where 1 ≤ k ≤ m. This has probability

C(m, k) (F[x, y)/F[x, ∞))^k (F[y, ∞)/F[x, ∞))^{m−k} ,


and accounts for 1 time unit spent by the mth order statistic in the interval [x, y). For it to have sojourn time n there, (a) of the next n − 1 observations, exactly n − k must be less than y, and (b) the next observation after those n − 1 must be at least y. Event (a) has probability C(n−1, k−1) (F[y, ∞))^{k−1} (F(−∞, y))^{n−k} and necessitates k ≤ n, while event (b) has probability F[y, ∞). Assembling these contributions together we find

Σ_{k=1}^{m∧n} C(m, k) (F[x, y)/F[x, ∞))^k (F[y, ∞)/F[x, ∞))^{m−k} C(n−1, k−1) (F[y, ∞))^{k−1} (F(−∞, y))^{n−k} F[y, ∞) ,

which tidies up to give the claimed formula. □
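As a sanity check on (4.5), the probabilities should sum to 1 over n. A small numeric verification (the parameter values are arbitrary; we abbreviate a = F[x, ∞) and b = F[y, ∞), so that F[x, y) = a − b and F(−∞, y) = 1 − b):

```python
from math import comb

def sojourn_pmf(n, m, a, b):
    # (4.5) with a = F[x, inf), b = F[y, inf)
    if n == 0:
        return (b / a) ** m
    return sum(
        comb(m, k) * comb(n - 1, k - 1) * b ** m * (a - b) ** k * (1 - b) ** (n - k)
        for k in range(1, min(m, n) + 1)
    ) / a ** m

total = sum(sojourn_pmf(n, 3, 0.5, 0.2) for n in range(600))
print(total)  # ~ 1.0
```

The tail beyond n = 600 is geometrically small, so the truncated sum is 1 to high accuracy.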

The mth order statistic spends a sojourn in each element of the sets ℛ_1, ..., ℛ_m, and we may attach these sojourn times as marks to the respective elements of the sets. The resulting marked point processes retain the independence that Ignatov's Theorem 2.1 establishes for ℛ_1, ..., ℛ_m. When F is continuous the probabilistic structure of the marked point processes can be simply described, and this is best done by splitting each of ℛ_1, ..., ℛ_m into separate (unmarked) point processes according to the values of the marks. We are led to the following result, due to Goldie and Rogers (1984) and Stam (1985).

THEOREM 4.3. For i = 1, ..., m let ᵐℛ_i be the marked point process obtained from ℛ_i by marking each element with the sojourn that the mth order statistic spends in that element. Then ᵐℛ_1, ..., ᵐℛ_m are independent and identically distributed. For l = 1, 2, ... let ᵐℛ_i^{(l)} be the point process consisting of all elements of ᵐℛ_i whose marks have value l. If F is continuous then the ᵐℛ_i^{(l)} are independent Poisson processes, for l ∈ ℕ and i = 1, ..., m. The (cumulative) intensity function of ᵐℛ_i^{(l)}, i.e., the expected number of points it has in the interval (−∞, x], is

E(ᵐℛ_i^{(l)}(−∞, x]) = (F(x))^l/l   (x ∈ ℝ) .

To understand where the latter formula comes from, recall that by Shorrock's Theorem 3.1, ℛ_i is identically distributed to ℛ_1, and that by Theorem 3.3 it is, when F is continuous, a Poisson process with intensity dH(x) = dF(x)/F[x, ∞). By Theorem 4.1, therefore, ᵐℛ_i^{(l)} is a Poisson process with intensity


(F(x))^{l−1} (1 − F(x)) dH(x) = (F(x))^{l−1} dF(x) ,

and so

E(ᵐℛ_i^{(l)}(−∞, x]) = ∫_{−∞}^x (F(u))^{l−1} dF(u) = (F(x))^l/l .

5. Record times

The times L_n^{(m)}, defined in (4.1), can be expressed by (4.2) as

L_n^{(m)} = m + Σ_{j=1}^{n−1} Δ_j^{(m)}   (m, n = 1, 2, ...) ,   (5.1)

where we interpret a sum such as Σ_{j=1}^0 as zero. This gives in principle a means to model the L_n^{(m)}, since by Theorem 4.1 we have the distribution of the Δ_j^{(m)} in terms of the partial-record-value sequence Q^{(m)}. When F is continuous we can cut through this complicated structure, since the times become distribution-free, as we shall see. Specifying F continuous still allows the supremum x_+ of its support to be finite, but the continuity means in particular that F can have no jump there, so lim_{x↑x_+} F(x) = 1 in all cases. The basis of all the simplifications in the F-continuous case is the following celebrated insight of Dwass (1960, Theorem 1) and Rényi (1962, p. 56) concerning the initial ranks ρ_n defined in (2.2).

LEMMA 5.1. (Dwass-Rényi Lemma). If F is continuous then ρ_1, ρ_2, ρ_3, ... are independent and for each n, ρ_n is uniformly distributed on {1, ..., n}.

Uniform distribution here is of course discrete uniform, with P(ρ_n = j) = 1/n for

j = 1, ..., n.

When F is continuous the times L_n^{(m)} defined in (4.1) become exactly the times when the mth order statistic first exists and then changes value. Several characterizations/representations/constructions of the whole sequence L^{(m)} := (L_n^{(m)})_{n∈ℕ} then flow from the Dwass-Rényi Lemma 5.1. First, define the mth record counting sequence N^{(m)} = (N_n^{(m)})_{n≥m} by

N_n^{(m)} := Σ_{k=m}^n δ_k^{(m)}   (n = m, m+1, ...)   (5.2)

where δ_k^{(m)} := 1{ρ_k ≤ m}; then we have a representation as an inverse of N^{(m)}:

L_n^{(m)} = inf{j : j ≥ m, N_j^{(m)} ≥ n}   (n = 1, 2, ...) .

The point is that N^{(m)} has a very simple structure that does not depend on F, so long as F is continuous: the summands δ_m^{(m)}, δ_{m+1}^{(m)}, ... in (5.2) are independent Bernoulli r.v.s with P(δ_k^{(m)} = 1) = m/k. The second characterization of L^{(m)} is as a Markov chain.
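The Bernoulli structure of N^{(m)} is easy to simulate: by the Dwass-Rényi Lemma the initial ranks are independent uniforms, so E N_n^{(m)} = Σ_{k=m}^n m/k. A sketch (seed and parameters are our own choices):

```python
import random

random.seed(7)
m, n, trials = 3, 50, 4000
expected = sum(m / k for k in range(m, n + 1))  # E N_n^{(m)} = sum of m/k
total = 0
for _ in range(trials):
    # rho_k uniform on {1,...,k}; delta_k = 1{rho_k <= m}
    total += sum(1 for k in range(m, n + 1) if random.randint(1, k) <= m)
print(total / trials, expected)  # both ~ 9.0
```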


THEOREM 5.2. Fix m ∈ ℕ and assume F is continuous. Then L^{(m)} = (L_n^{(m)})_{n∈ℕ} is a Markov chain on {m, m+1, ...} with initial state L_1^{(m)} = m and transition probabilities p_{jk} = P(L_{n+1}^{(m)} = k | L_n^{(m)} = j) given by

p_{jk} = m ∏_{l=0}^{m−1} (j − l) / ∏_{l=0}^{m} (k − l)   (k = j+1, j+2, ...) .

The case m = 1 merits a separate treatment, since the times L_n^{(1)} are the record times L_n as defined in Section 1.

COROLLARY 5.3. When F is continuous the sequence (L_n)_{n∈ℕ} of record times is a Markov chain on ℕ with initial state L_1 = 1 and transition probabilities p_{jk} = P(L_{n+1} = k | L_n = j) given by
p_{jk} = j/((k − 1)k)   (k = j+1, j+2, ...) .   (5.3)
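The transition probabilities (5.3) telescope, since 1/((k − 1)k) = 1/(k − 1) − 1/k, so summing over k > j gives total mass 1. A quick numeric confirmation (the truncation point is an arbitrary choice of ours):

```python
j = 4
partial = sum(j / ((k - 1) * k) for k in range(j + 1, 100001))
print(partial)  # = j * (1/j - 1/100000) = 0.99996, tending to 1
```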

We shall call the times L_n^{(m)} the mth record times. This is the usual nomenclature, although the label is occasionally assigned by some authors to the times T_n^{(m)} at which observations of initial rank exactly m occur. The latter times are worth some consideration. Thus put

T_1^{(m)} := inf{n : n ≥ 1, ρ_n = m},   T_{j+1}^{(m)} := inf{n : n > T_j^{(m)}, ρ_n = m}   (j = 1, 2, ...) ,

and note that L_j^{(1)} = L_j = T_j^{(1)}. From the Dwass-Rényi Lemma 5.1 it is clear that when m > 1 the sequence (T_n^{(m)})_{n∈ℕ} is a Markov chain with the same transition probabilities as the record-time sequence (L_n)_{n∈ℕ}, but with a random initial state.

PROPOSITION 5.4. Fix m ∈ {2, 3, ...}. When F is continuous the sequence (T_n^{(m)})_{n∈ℕ} is a Markov chain on {m, m+1, ...} with initial distribution

P(T_1^{(m)} = j) = (m − 1)/((j − 1)j)   (j = m, m+1, ...)   (5.4)

and transition probabilities p_{jk} = P(T_{n+1}^{(m)} = k | T_n^{(m)} = j) given by (5.3).


For the next representation, due to Williams (1973), we employ the 'ceiling' function

⌈x⌉ := inf{n : n ∈ ℤ, n ≥ x}   (x ∈ ℝ) .

THEOREM 5.5. Assume F continuous. Let E_2, E_3, ... be independent unit exponential r.v.s, and define successively

L̂_1 := 1,   L̂_{n+1} := ⌈L̂_n e^{E_{n+1}}⌉   (n = 1, 2, ...) .

Then the sequence (L_n)_{n∈ℕ} has the same distribution as the sequence (L̂_n)_{n∈ℕ}.


It is important to understand that this gives much more than just each L_n having the distribution of L̂_n. All joint distributions of the various L_n are those of the corresponding L̂_n. We can easily carry this representation over to the times T_j^{(m)} when observations of initial rank m occur. For put T̂_0^{(m)} := m − 1 and observe that the transition matrix (p_{jk}) of (5.3) then gives the correct distribution (5.4) for T̂_1^{(m)}. Thus we can consider (T̂_n^{(m)})_{n∈ℕ_0} (where ℕ_0 := {0, 1, 2, ...}) as a Markov chain with initial state m − 1 and transition probabilities p_{jk}, and are led to the following.

COROLLARY 5.6. Assume F continuous. Fix m ∈ {2, 3, ...}. Let E_1, E_2, ... be independent unit exponential r.v.s, and define successively

T̂_0^{(m)} := m − 1,   T̂_{n+1}^{(m)} := ⌈T̂_n^{(m)} e^{E_{n+1}}⌉   (n = 0, 1, 2, ...) .

Then the sequence (T_n^{(m)})_{n∈ℕ_0} has the same distribution as the sequence (T̂_n^{(m)})_{n∈ℕ_0}.

A similar representation to the above can be given for the mth record times L_n^{(m)} (Goldie, 1983, Theorem II.6.1) but it is somewhat more complicated. The final representation that we select for inclusion here is one for record values and times together, but the representation that it gives just for record times is still valuable on its own. As remarked in Section 4, when F is continuous the sequence Q^{(m)}, consisting of all record values of orders up to m, is the sequence of points of a Poisson process of intensity mη where η is the avoidance measure of (3.2). We can generate these from the points of a unit Poisson process by a form of 'probability integral transform'. Then given these Poisson points we can generate conditionally independent sojourns as in Theorem 4.1 by secondary randomizations, and finally add up the sojourns to give the mth record times by (5.1). The trick that does the secondary randomization is the following easily checked result.

LEMMA 5.7. Fix 0 < p < 1, let E have a unit exponential distribution and set Y := ⌈E/(−ln(1 − p))⌉.

Then Y has the geometric distribution

P(Y = y) = (1 − p)^{y−1} p   (y = 1, 2, ...) .
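Lemma 5.7 is indeed easily checked: with λ := −ln(1 − p), the event Y = y is (y − 1)λ < E ≤ yλ, whose probability under the unit exponential law is exactly the geometric mass. In code (the value of p is an arbitrary illustrative choice):

```python
from math import exp, log

p = 0.3
lam = -log(1 - p)  # Y := ceil(E / lam) for E unit exponential
for y in range(1, 12):
    prob = exp(-(y - 1) * lam) - exp(-y * lam)  # P((y-1)*lam < E <= y*lam)
    geom = (1 - p) ** (y - 1) * p               # (1-p)^{y-1} p
    assert abs(prob - geom) < 1e-12
print("geometric check passed")
```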

We are led to the following representation, due to Vervaat (1977) and Deheuvels (1984).

THEOREM 5.8. Fix m ∈ ℕ. Assume F continuous. Let ε_1, ε_2, ... and E_1, E_2, ... be independent unit exponential r.v.s, and put S_n := Σ_{j=1}^n ε_j for n = 1, 2, .... Define a function g (an inverse to the cumulative avoidance measure η(−∞, x]) by

g(y) := inf{x : −ln F(x, ∞) ≥ y}   (0 < y < ∞)


both for n = 1, 2, .... Then the double sequence ((Q_n^{(m)}, L_n^{(m)}))_{n∈ℕ} has the same distribution as ((Q̂_n^{(m)}, L̂_n^{(m)}))_{n∈ℕ}.

6. Limit theory and strong approximation


We first summarize the distributional limit theory, discovered by Resnick in 1973, for the sequence (R_n) of record values; see Resnick (1987, Section 4.2) for a full account. By Ignatov's Theorem 2.1, each set ℛ_k is probabilistically identical to the set ℛ_1 of records, whose elements form the sequence (R_n). Thus limit theory for 1-records applies equally to k-records (observations of initial rank k), for each k. Further, if F is continuous then the sequence Q^{(m)} that we met in Section 4, of observations of initial rank at most m, is distributed as the sequence of record values from observations with d.f. 1 − (1 − F)^m. Hence it too has the same distributional limit theory, modulo adjustments to formulae for domains of attraction and norming constants. The idea is to find all limits in distribution of (R_n − b_n)/a_n for suitable constants b_n ∈ ℝ and a_n > 0. First note that if G(x) is such a limit d.f., that is, (R_n − b_n)/a_n ⇒ G where ⇒ denotes weak convergence, then G(ax + b) is another limit d.f., attained by replacing the constants a_n and b_n by a·a_n and b_n + b·a_n respectively, where a > 0 and b ∈ ℝ. Thus the limit distributions are determined only to within type: they are equivalence classes under the operation of positive affine transformations x ↦ ax + b where a > 0. The possible limit types for record values are as follows, expressed in terms of the standard Normal d.f. Φ(x) := ∫_{−∞}^x (2π)^{−1/2} e^{−u²/2} du.

THEOREM 6.1. (Resnick, 1973). Assume F continuous. Then the possible nondegenerate limit d.f.s for (R_n − b_n)/a_n, as n → ∞, are those within the type of one of

𝒩_1(x) := 0 for x ≤ 0,  Φ(α ln x) for x > 0 ;
𝒩_2(x) := Φ(−α ln(−x)) for x < 0,  1 for x ≥ 0 ;
Φ(x) ,

where α > 0 is a constant. The limit distributions are closely associated with the three classical extreme-value distributions, which are the possible limits for normed and translated


maxima of i.i.d. r.v.s; see Resnick (1987) for details. Note that 𝒩_1 with α = 1 is the lognormal distribution. Letting G stand for one of the limit d.f.s 𝒩_1, 𝒩_2 and Φ, we say that F is in the domain of attraction for records of G, notation F ∈ D_R(G), if constants a_n > 0 and b_n ∈ ℝ exist such that (R_n − b_n)/a_n ⇒ G. (As usual, (R_n) is the sequence of record values from i.i.d. observations with d.f. F.) There are several ways to express the domains of attraction and suitable scaling and location constants, but the following are the most explicit. Recall from Section 3 that when F is continuous the hazard function H is given by H(x) = −ln(1 − F(x)). Its inverse H^← may be defined by H^←(u) := inf{x : H(x) ≥ u} where inf ∅ := +∞.

THEOREM 6.2. (Resnick, 1973). Let F be continuous.
1) F ∈ D_R(𝒩_1) if and only if x_+ = ∞ and

√H(x) = c(x) + ∫_1^x d(t) t^{−1} dt ,

where c(x) → c ∈ ℝ and d(x) → α as x → ∞. In that case, suitable scaling and location constants are a_n := H^←(n), b_n := 0.
2) F ∈ D_R(𝒩_2) if and only if x_+ < ∞ and √H(x_+ − u) = c(u) + ∫_u^1 d(t) t^{−1} dt for u > 0, where c(u) → c ∈ ℝ and d(u) → α as u ↓ 0. In that case, suitable constants are a_n := x_+ − H^←(n), b_n := x_+.
3) F ∈ D_R(Φ) if and only if

√H(x) = c(x) + ∫_{−∞}^x dt/f(t)   (x < x_+) ,

where f(·) > 0, f′ exists and, as x → x_+, c(x) → c ∈ ℝ and f′(x) → 0. In that case, suitable constants are a_n := H^←(n + √n) − H^←(n), b_n := H^←(n).

Turning from weak to almost-sure limit theory, the best approach has turned out to be to develop strong approximations in the sense of Komlós, Major and Tusnády (see, e.g., Csörgő and Révész, 1981), from which limit theorems may be readily extracted at will. Pfeifer and Zhang (1989) give a valuable survey. The representations of Theorems 5.5 and 5.8 are starting points for strong approximations. In Theorem 5.5, for large n the L̂_n will be large and the ceiling operator ⌈·⌉ has a very small effect. Therefore L̂_n/L̂_{n+1} is approximately e^{−E_{n+1}}. In Pfeifer (1987) this approximation is made explicit, being represented through a uniform r.v. defined on a further augmentation of the probability space. Thus the author sets up a sequence (W_n) of independent Uniform(0, 1) r.v.s, which is also independent of (L_n), such that

L_n = e^{−E_{n+1}} (L_{n+1} − W_{n+1}) = (1 − W_{n+1} L_{n+1}^{−1}) L_{n+1} e^{−E_{n+1}}   (n = 1, 2, ...) .

This device, which is a particular case of a useful general technique for Markov chains, leads to the following strong approximation. We write Δ_n for Δ_n^{(1)} = L_{n+1} − L_n, and a.s. for 'almost-surely'.


THEOREM 6.3. (Pfeifer, 1987). Assume F continuous. On a suitably augmented probability space there exist independent unit exponential r.v.s E_n, and a nonnegative r.v. Z, such that with S_n := Σ_{j=2}^n E_j,
1) ln L_n = Z + S_n + o(1) a.s. and ln Δ_n = Z + S_{n+1} + ln(1 − e^{−E_{n+1}}) + o(1) a.s., as n → ∞;
2) Z and S_n are asymptotically independent;
3) EZ = 1 − γ where γ ≈ 0.577216 is Euler's constant;
4) E(Z^k) < ∞ for all k ∈ ℕ.

Note that the r.v.s 1 − e^{−E_{n+1}} appearing in 1) are independent Uniform(0, 1). One may further approximate the sequence (S_n) in this theorem by a Wiener process. By that means or by application of standard limit theory one can immediately deduce the following classical results for a sequence of r.v.s Z_n which can be either ln L_n or ln Δ_n (or even H(R_n)):
Z_n/n → 1 a.s.,   (Z_n − n)/√n ⇒ N(0, 1)   (6.1)

as n → ∞, and the lim sup and lim inf of (Z_n − n)/√(2n ln ln n) are respectively +1 and −1 a.s. By applying the above to the sequence (T_n^{(m)}), which is governed by the same Markov chain as (L_n) (see Corollary 5.6), it is clear that the above theorem's results also hold for (T_n^{(m)}). This and further strong approximations of the above type, and their consequences, may be found in Deheuvels (1988). We briefly discuss one further strong-approximation technique, due to Resnick, which uses the important construct of an F-extremal process, and also provides insight into the r.v. Z that appears in Pfeifer's Theorem 6.3. Consider a half-plane (0, ∞) × ℝ whose points (t, x) represent time t > 0 and space x ∈ ℝ. Assume F continuous, and let N be a planar Poisson process on this space, with mean measure defined by

EN((0, t] × (x, ∞)) = t(−ln F(x)) .


The idea is that in any Borel set, the number of points of N is Poisson-distributed with mean the value of the mean measure for that set, and the numbers of points of N in disjoint sets are independent r.v.s. Notice that the value of the mean measure for any semi-infinite rectangle (0, t] × (x, ∞) is finite. So the number of points of N in the rectangle is finite a.s. However the mean measure of the infinite strip (0, t] × ℝ is infinite and so there will certainly be points of N in the strip. Therefore, with probability one, there will exist a point of N of maximal height. This is the value of the F-extremal process Y at time t. Formally,

Y_t := sup{x : N((0, t] × (x, ∞)) > 0} .

This defines the stochastic process (Y_t)_{t>0} to have paths that change only by upward jumps, and have lim_{t↓0} Y_t = x_− and lim_{t↑∞} Y_t = x_+, a.s. Properties of the


process may be found in Resnick (1987, 4.3), but for present purposes two are important: 1) the jump times of $(Y_t)$ form a logarithmic Poisson process on $(0,\infty)$, i.e., the mean number of jumps in any time interval $(s,t]$ is $\ln(t/s)$; 2) with $(X_n)$ i.i.d. with d.f. F, the F-extremal process 'interpolates' the sequence of partial maxima of the $X_i$, in that
$$(\max(X_1, \ldots, X_n))_{n \in \mathbb{N}} \stackrel{d}{=} (Y_n)_{n \in \mathbb{N}} ,$$

where $\stackrel{d}{=}$ means equality in distribution. The right-hand side is just the sequence of values of $(Y_t)$ at integer times, and the equality in distribution here refers to the whole sequences on either side. In Figure 5 we plot a simulated F-extremal process, with F the standard Normal distribution. For purposes of strong approximation, property 2) means that the jump times of the sequence of maxima, i.e., the record times of the $X_n$, are shadowed, more and more closely as time increases, by the jump times of $(Y_t)$, while 1) means that the jump times of $(Y_t)$ form a sequence with a very simple probability distribution. The F-extremal process has an excess of jumps from time 1 onwards, compared to the number of record values, and the total excess S in the limit as $t \to \infty$ is a.s. finite and has mean $E(Z)$. The strong approximation is expressed, in terms of the jump times $\tau_1, \tau_2, \ldots$ of the F-extremal process from time 1 onwards, by
$$\ln L_n = \ln \tau_{n+S} + o(1) = \ln \tau_n + O(\ln n) \quad \text{a.s. as } n \to \infty .$$
The sequence $(\ln \tau_n)$ forms a unit (homogeneous) Poisson process on $(0,\infty)$, so its large-n behavior can be read off, and conclusions such as (6.1), for $Z_n := \ln L_n$, derived anew.
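Both ingredients of this picture are easy to probe numerically. The Python sketch below (ours, not from the text) first checks the law-of-large-numbers conclusion of (6.1), $\ln L_n / n \to 1$, by sampling record times through the standard Markov-chain transition $P(L_{n+1} = j \mid L_n = i) = i/((j-1)j)$, which is equivalent to drawing $L_{n+1} = \lceil L_n / U \rceil$ with U uniform; it then checks the logarithmic-Poisson property 1), namely that the mean number of records with index in $(s,t]$ is approximately $\ln(t/s)$.

```python
import math
import random

rng = random.Random(42)

def sample_ln_L(n):
    """ln L_n via the record-time chain L_1 = 1, L_{k+1} = ceil(L_k / U)."""
    L = 1
    for _ in range(n - 1):
        u = 1.0 - rng.random()   # uniform on (0, 1]
        L = math.ceil(L / u)
    return math.log(L)

n, reps = 20, 2000
ratio = sum(sample_ln_L(n) for _ in range(reps)) / reps / n  # close to 1 by (6.1)

def records_in(s, t):
    """Number of records with index in (s, t], via delta_k ~ Bernoulli(1/k)."""
    return sum(1 for k in range(s + 1, t + 1) if rng.random() < 1.0 / k)

mean_jumps = sum(records_in(100, 1000) for _ in range(2000)) / 2000  # near ln 10
```

The second check uses the Dwass-Rényi independent record indicators rather than the extremal process itself, which is justified by the interpolation property 2).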

7. Extensions, connections and applications


A number of variations on the classical i.i.d. record model have been proposed, involving various combinations of non-stationarity and dependence among the observations. The oldest appears to be the so-called $F^\alpha$ model (Yang, 1975). In

Fig. 5. A simulated N(0,1)-extremal process, for 0 < t < 30.

296

J. Bunge and C. M. Goldie

this setup $(X_n)_{n\in\mathbb{N}}$ is a sequence of independent but non-identically distributed r.v.s with $X_n \sim F^{\alpha_n}$ for some fixed d.f. F and sequence $(\alpha_n)_{n\in\mathbb{N}}$ with $\alpha_n > 0$. Yang's original idea was that the $\alpha_n$ represented the population size and the observations $X_n$ were themselves the maxima from increasingly large populations; he used this to model Olympic record-breaking. This model has subsequently been studied by several authors, especially Nevzorov (see Nevzorov, 1995, and references therein). Alternatively, we may suppose that the distribution of the $X_n$ changes only after a record event, as proposed by Pfeifer (see Pfeifer, 1982, and references therein); this may be useful if the system producing the observations is altered or 'repaired' after a record occurs. A still different non-stationary setup was studied by Ballerini and Resnick (1985), who supposed that $X_n = Y_n + cn$ for some stationary sequence $(Y_n)$ and constant $c$. This model gave a good fit to a certain sports dataset (see also Feuerverger and Hall, 1996, and Borovkov, 1999). On the other hand, it seems to be more difficult to deal with stochastic dependence among the observations: here we note that Ballerini (1994) considered a dependent $F^\alpha$ setup, and that Haiman and Puri (1990) studied m-records over stationary sequences with a certain dependence structure. Interest in Olympic records has naturally led to statistical investigations of two questions: model stationarity and prediction of future records. There has long been a thread of significant papers on these topics: see Robinson and Tawn (1995) and references therein, and Smith (1997). The distribution-free nature of the Dwass-Rényi Lemma 5.1 suggests a connection between records and combinatorics, and indeed there is a fruitful link with, in particular, the theory of random permutations. A permutation of the integers $1, \ldots, n$ is a bijective mapping from the set $\{1, \ldots, n\}$ to itself. There are $n!$ such mappings, and a random permutation generally means that all $n!$ are equally likely, so that each has probability $1/n!$ of being chosen. Much is known about random permutations and some of it translates into valuable results about records. We describe how one classical formula carries over in two distinct ways into the records context. A consequence of the Dwass-Rényi Lemma is that if $\delta_k$ is the 'indicator' r.v. that is 1 if $X_k$ is a record value, and 0 otherwise, then provided F is continuous the $\delta_k$ are independent Bernoulli r.v.s with $P(\delta_k = 1) = 1/k$. Knowledge of the $\delta_k$ is equivalent to being given the record times $(L_n)$. The number of records among $X_1, \ldots, X_n$ is
$$N_n = \sum_{k=1}^{n} \delta_k \qquad (7.1)$$

(the m = 1 case of (5.2)). Each record among $X_1, \ldots, X_n$ gives rise to a time interval starting with its time of occurrence $L_n$ and ending just before the next record time $L_{n+1}$, or at time n, whichever comes first. Thus a set of 'inter-record time gaps' $L_2 - L_1,\ L_3 - L_2,\ \ldots,\ L_{N_n} - L_{N_n - 1},\ n + 1 - L_{N_n}$ is defined, where if $N_n = 1$ there is just one time gap $n + 1 - L_1 = n + 1 - 1 = n$. In all cases the time gaps add up to n. Now a celebrated formula about random permutations,


translated over to records, says that the probability that there are $a_j$ time gaps of length j, for $j = 1, \ldots, n$, is
$$p_n(a_1, \ldots, a_n) = \prod_{j=1}^{n} \frac{1}{j^{a_j}\, a_j!} \quad \text{if } \sum_{j=1}^{n} j a_j = n , \qquad (7.2)$$
and 0 otherwise.

This formula, due to Goncharov, is in fact a special case of the famous Ewens sampling formula of population genetics. In the general case, permutations are not equally likely but have probabilities proportional to $\theta^k$ where k is the number of cycles in the permutation and $\theta$ is a positive parameter. Our special case, $\theta = 1$, is the only one that has been found to have meaning in records terms. However, even for that case, many limit theorems and large-sample approximations flow from (7.2): see, e.g., Arratia et al. (1992) and Arratia and Tavaré (1992) for details, as well as for the general case of (7.2). The basis of the above link between records and permutations is the sequence of independent Bernoulli r.v.s appearing in (7.1), which may be used to build a random permutation through its cycle representation. There are several ways to construct the link (Goldie, 1989), and we describe one other realization of (7.2) which arises from one of them. Recall our notation $X_n^{(1)} > \cdots > X_n^{(n)}$ for the order statistics of $X_1, \ldots, X_n$. The indicator r.v.s $I\{X_n^{(k)} \in \mathscr{R}_1\}$ are independent, for $k = 1, \ldots, n$, with $P(X_n^{(k)} \in \mathscr{R}_1) = 1/k$ (Goldie, 1989), and this fact suggests defining on them something analogous to the inter-record time gaps above. Thus let $M_1 < M_2 < \cdots < M_{N_n}$ be the list of indices k for which $X_n^{(k)} \in \mathscr{R}_1$, or formally
$$M_1 := 1, \qquad M_j := \min\{k : k > M_{j-1},\ X_n^{(k)} \in \mathscr{R}_1\} \quad (j = 2, \ldots, N_n) .$$

Define gaps $M_2 - M_1,\ M_3 - M_2,\ \ldots,\ M_{N_n} - M_{N_n - 1},\ n + 1 - M_{N_n}$. Then the probability that there are $a_j$ of these gaps of length j, for $j = 1, \ldots, n$, is given by (7.2). The probability distribution of the number of records $N_n$ up to time n, known as the Karamata-Stirling law, is best defined through the representation (7.1). An explicit formula for the probabilities involves Stirling numbers (David and Barton, 1962, pp. 181-2), but as Feller (1968, XI.2(c)) observes, the probability generating function is simpler:
$$E(s^{N_n}) = \prod_{k=1}^{n} \frac{s + k - 1}{k} = \frac{s(s+1)\cdots(s+n-1)}{n!} .$$
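Read back in permutation terms, (7.2) says that a uniform random permutation of $\{1,\ldots,n\}$ has $a_j$ cycles of length j with probability $\prod_j 1/(j^{a_j} a_j!)$ (the constraint $\sum_j j a_j = n$ holds automatically for cycle types). This can be confirmed by exhaustive enumeration; the short Python check below (ours, not from the text) does so exactly, in rational arithmetic, for n = 5.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_type(perm):
    """Sorted multiset of cycle lengths of a permutation in one-line form."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = perm[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def goncharov(lengths):
    """prod_j 1/(j^{a_j} a_j!) for the type with a_j cycles of length j."""
    a = Counter(lengths)
    prob = Fraction(1)
    for j, aj in a.items():
        prob /= Fraction(j) ** aj * factorial(aj)
    return prob

n = 5
counts = Counter(cycle_type(p) for p in permutations(range(n)))
exact = {t: Fraction(c, factorial(n)) for t, c in counts.items()}
```

Every empirical cycle-type frequency agrees exactly with the Goncharov formula; for instance the single n-cycle type gets probability $(n-1)!/n! = 1/n$.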

The Karamata-Stirling law occurs also in summability theory (Bingham, 1988). While most of the theory of records depends critically on the order properties of the real line, some aspects permit generalization to partially ordered sets (Goldie and Resnick, 1989). For instance, with several of the possible extensions of the notion of record, criteria can be given for there being finitely or infinitely many records in chosen regions, including the region that is the whole space. When the space is the Euclidean plane it is typical that there are only finitely many bivariate records (records in both components simultaneously), that being


the case for example when the observations are drawn from any bivariate Normal distribution with correlation not equal to +1 (Gnedin, 1998). However one can estimate (Goldie and Resnick, 1995) the small probability that there will be a large number of bivariate records in chosen regions in $\mathbb{R}^2$, and find where they will fall. A complementary result is in Deuschel and Zeitouni (1995), and it is interesting that the authors of this article establish a new connection with random permutations, different from those discussed above, namely an extension of a celebrated theorem of Vershik and Kerov on longest increasing subsequences.

8. Random record processes

Imagine now that the observations $X_1, X_2, \ldots$ occur at random times $T_1, T_2, \ldots$. In this case the process of successive maxima of the $X_n$ is called a random record process, 'random' referring here to the arrival times of the $X_n$, not their values. The sequence $T_1, T_2, \ldots$ is sometimes called the 'pacing process' and we speak, for example, of 'Poisson paced records.' Much less work has been done on this process than on the classical record process discussed above; the first paper appeared only in 1971 (Pickands, 1971). Nevertheless, the random record process has been shown to have deep and sometimes surprising connections to other parts of probability theory. This is the perspective we continue to take in this survey: we discuss known results for three classes of pacing processes (Poisson, pure birth, and renewal), where possible with a view toward more general theory. More formally, then, let $(T_n, X_n)_{n\in\mathbb{N}}$ denote a bivariate random sequence. We suppose that $0 < T_1 < T_2 < \cdots$ so that marginally $(T_n)_{n\in\mathbb{N}}$ forms a point process on $[0,\infty)$, but for the moment we make no further assumptions. Then the random record process is the bivariate sequence of record times and values, that is $(T_{L_n}, X_{L_n})_{n\in\mathbb{N}} = (T_{L_n}, R_n)_{n\in\mathbb{N}}$. This is the most general formulation under which the random record process retains its particular identity: from this perspective it can be regarded as a thinning of a marked point process where the points $T_n$ are marked by the observations $X_n$. There is now a well-developed martingale-based theory of thinned marked point processes (Last and Brandt, 1995), so it is reasonable to ask what this theory can tell us about random record processes. However, the resources of that theory have not yet been fully brought to bear on the random record problem, for two main reasons. The first is historical: martingale-based point processes and random record processes come from different streams in probability and the two have not yet completely converged. This is exacerbated by the second reason: although the mark sequence $(X_n)$ is independent of the time sequence $(T_n)$ in the models we will consider, and this is the canonical situation for marked point processes, extracting the sequence of record times $(T_{L_n})$ from $(T_n, X_n)$ introduces a further marked point process in which the marks depend on the points in a rather non-standard way. To see this, let $\delta_n$ denote the indicator of the event $\{X_n \text{ is a record}\} = \{\text{a record occurs at } T_n\}$. Then the Dwass-Rényi Lemma says that $(\delta_n)_{n\in\mathbb{N}}$ is a sequence of independent Bernoulli random variables


with $P(\delta_n = 1) = 1/n$, and $(\delta_n)$ is independent of $(T_n)$ since $(X_n)$ is. The random record times can then be regarded either as those points of the marked point process $(T_n, \delta_n)$ for which $\delta_n = 1$, or simply as the process $(T_n \delta_n)$. In other words, the record arrival process is indeed a thinning of $(T_n)$ - but the thinning probability depends on the state of $(T_n)$, that is, on n. The standard theory of thinned marked point processes does allow for variable thinning probabilities, but the probability typically depends on the location of the point or on some other process that is independent of $(T_n)$. The random record process should fall within the purview of the martingale-based theory of thinned marked point processes, but the matter has apparently not yet been explored in depth. Therefore, we turn now to results obtained by (more or less) classical methods in specific cases. We will take the following as our basic setup: $(T_n)_{n\in\mathbb{N}}$ is a simple, nonexplosive point process on $[0,\infty)$ (no multiple points and no finite accumulation points), and the observations or marks $(X_n)_{n\in\mathbb{N}}$ form an i.i.d. sequence which is independent of $(T_n)$. Here we will take the d.f. F of the $X_n$ to be continuous and in some cases even absolutely continuous; furthermore, our random record process $(T_{L_n}, R_n)$ only looks at first records. Variations on this theme have occasionally been considered and we will mention these as appropriate. We see immediately that, since $(X_n)$ is independent of $(T_n)$, marginally the record sequence $(R_n)$ behaves according to the classical model. Thus, there are two main areas of interest here: the joint behavior of $(T_{L_n}, R_n)$ and the marginal behavior of $(T_{L_n})$, whether in finite time or asymptotically as time goes to infinity. Record times have attracted the lion's share of attention, as we will see. Note: here and throughout we define $N(t) := \#\{T_n : T_n \le t,\ n \ge 1\}$ and $L(t) := \#\{T_{L_n} : T_{L_n} \le t,\ n \ge 1\}$ for $t > 0$.
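In code, this thinning view of the random record process is simply: scan the marked pairs $(T_n, X_n)$ and keep those whose mark is a new maximum. A minimal Python sketch (ours; the arrival times and observations are arbitrary illustrative data):

```python
def random_record_process(times, values):
    """Thin the marked point process (T_n, X_n) down to (T_{L_n}, R_n).

    Keeps the pairs whose mark exceeds all previous marks, i.e. those
    with record indicator delta_n = 1.
    """
    records, current_max = [], float("-inf")
    for t, x in zip(times, values):
        if x > current_max:        # delta_n = 1: X_n is a record
            current_max = x
            records.append((t, x))
    return records

# illustrative data: observations X_n arriving at times T_n
T = [0.4, 1.1, 2.5, 3.0, 4.7]
X = [3.0, 1.0, 4.0, 1.5, 5.0]
recs = random_record_process(T, X)   # records occur at T_1, T_3, T_5
```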
9. Poisson paced records
We begin with the Poisson pacing process, since it is the most familiar conceptually, and was the first to be studied in this context (Pickands, 1971). This model appears to have originally been motivated by an insurance problem in which claims of random size arrive at random times, and we are concerned about the arrival times and values of claims of record size. Let $(T_n)_{n\in\mathbb{N}}$ be the points of an inhomogeneous Poisson process with rate $\lambda(t)$ for $t \ge 0$, so that $E(N(t)) = \Lambda(t) := \int_{[0,t]} \lambda(u)\,du$ for $t > 0$. We suppose that the distribution F of the $X_n$ is absolutely continuous with density f. Then it is straightforward to write the joint density of $(T_{L_n}, R_n)_{n\in\mathbb{N}}$:
$$f_{T_{L_1},\ldots,T_{L_n},R_1,\ldots,R_n}(t_1,\ldots,t_n,r_1,\ldots,r_n) = e^{-\Lambda(t_1)}\lambda(t_1)f(r_1) \prod_{i=2}^{n} e^{-(\Lambda(t_i)-\Lambda(t_{i-1}))(1-F(r_{i-1}))}\lambda(t_i)f(r_i) \qquad (9.1)$$
for $0 < t_1 < \cdots < t_n$ and $r_1 < \cdots < r_n$. This simply says that at time $t_i$ the record value $r_i$ occurs, and then from time $t_i$ to $t_{i+1}$ no observation X exceeds $r_i$. One can


readily integrate (9.1) with respect to $t_1, \ldots, t_n$ to verify that marginally the record values are distributed as in the classical model. It is less easy to derive the marginal distribution of the record times, but this can be achieved by other methods. The result, found via extremal process methods by Deheuvels (1982) and via characteristic functions by Bunge and Nagaraja (1992b), is the following.

THEOREM 9.1. Let $\varepsilon_1, \varepsilon_2, \ldots$ and $E_1, E_2, \ldots$ be i.i.d. unit exponential r.v.s, and put $S_0 := 0$ and $S_n := \sum_{j=1}^{n} \varepsilon_j$ for $n = 1, 2, \ldots$. In the Poisson-paced record process as defined above,
$$(T_{L_1}, T_{L_2}, \ldots, T_{L_n}) \stackrel{d}{=} \Big(\Lambda^{\leftarrow}(E_1 e^{S_0}),\ \Lambda^{\leftarrow}(E_1 e^{S_0} + E_2 e^{S_1}),\ \ldots,\ \Lambda^{\leftarrow}\Big(\sum_{i=1}^{n} E_i e^{S_{i-1}}\Big)\Big) \qquad (9.2)$$
for all $n \in \mathbb{N}$.

In particular, if $(T_n)$ is a homogeneous Poisson process with rate $\lambda$ then the action of the function $\Lambda^{\leftarrow}$ may be replaced in (9.2) by multiplication by $1/\lambda$. It is worth pausing here to compare this result with the representation of record times in the classical model given by Theorem 5.8. Taking m = 1 in Theorem 5.8 we have

$$L_n \stackrel{d}{=} \Big\lceil \sum_{i=1}^{n} \frac{E_i}{-\ln(1 - e^{-S_{i-1}})} \Big\rceil . \qquad (9.3)$$
But $e^x \sim -1/\ln(1 - e^{-x})$ as $x \to \infty$, so since $\lim_{n\to\infty} S_n = \infty$ almost surely we are justified in regarding (9.2) with $\lambda(t) \equiv 1$ as a continuous-time version of (9.3) in the homogeneous case. One reason for studying specifically inhomogeneous Poisson paced records is as follows. In real life we frequently get the impression that records are being broken 'too often' in financial data, weather, sports, and so forth. According to the theory of the classical model, this means that there must be some inhomogeneity or non-stationarity in the process. In the classical model non-stationarity can enter only via changing distributions of the $X_n$, but in the random record model it may be that observations are occurring at an increasing rate: in the Poisson case this corresponds to an increasing rate function $\lambda(t)$. Thus we are led to study the asymptotic ($t \to \infty$) behavior of the inhomogeneous Poisson-paced record process. The basic result is as follows (due to Gaver and Jacobs (1978) for $\lambda(t)$ of exactly exponential form, and Bunge and Nagaraja (1992a) for the generalization below). Denote the inter-record times by $U_{L_n} := T_{L_n} - T_{L_{n-1}}$ for $n \ge 1$, with $T_{L_0} := 0$.

THEOREM 9.2. In the Poisson-paced record model, suppose that
$$\lim_{t\to\infty} \frac{\lambda(t)}{\Lambda(t)} = \lambda_0 \in (0, \infty) .$$

Then
$$(U_{L_{N+1}}, \ldots, U_{L_{N+n}}) \xrightarrow{d} \frac{1}{\lambda_0}(E_1, \ldots, E_n) \qquad (N \to \infty)$$
and
$$(U_{L_1}(T), \ldots, U_{L_n}(T)) \xrightarrow{d} \frac{1}{\lambda_0}(E_1, \ldots, E_n) \qquad (T \to \infty) ,$$
for $n = 1, 2, \ldots$, where $U_{L_{N+j}}$ is the jth inter-record time after the Nth record occurrence and $U_{L_j}(T)$ is the jth inter-record time after time T. In other words, when $\lambda(t)/\Lambda(t) \to \lambda_0$, so that $\lambda(t)$ has essentially exponential growth, the record arrival process converges to a homogeneous Poisson process with rate $\lambda_0$, whether we consider the record arrivals after a large number of records N or after a long time T. Thus, one way to get regularly occurring records in the time-limit is to have the underlying observations 'speeding up exponentially'. But a homogeneous Poisson process of record arrivals can also be constructed directly, for all $t \in [0,\infty)$, via the $F^\alpha$ model. Define the interarrival times $U_n := T_n - T_{n-1}$, where $T_0 := 0$, and let $(U_n)_{n\in\mathbb{N}}$ be a delayed renewal process with $U_1 \sim$ exponential($\lambda$) and $U_n \sim$ exponential($\lambda/p$) for $n \ge 2$, for some fixed $p \in (0,1)$ and $\lambda > 0$. Now let $\alpha_1 := 1$ and $\alpha_n := p/(1-p)^{n-1}$ for $n \ge 2$. Then it is readily shown that $P(X_n \text{ is a record}) = p$ for all $n \ge 2$, and from this it follows that the record arrival process is homogeneous Poisson with rate $\lambda$ (Bunge and Nagaraja, 1992b). Thus, regular record breaking may be caused by exponentially increasing arrival rates of the observations or by geometrically increasing sizes of the underlying populations, and there are other possible causes as we will see. We conclude this section with two further comments on known results. First, a slightly modified version of (9.1) can be used to compute the limiting joint density of inter-record times and normalized record values when $\lambda(t)/\Lambda(t) \to \lambda_0$, for the first n records either after the Nth record time ($N \to \infty$) or after a long time T ($T \to \infty$) (Bunge and Nagaraja, 1992a). But while the inter-record times converge marginally to i.i.d. exponential in either case, it is interesting to note that the limiting distributions of the normalized record values differ depending on whether the normalization is based on N or on T.
Second, we note that Hofmann and Nagaraja (1997) have recently studied $F^\alpha$ random record models in some detail, obtaining (among other results) generalizations of Theorems 9.1 and 9.2 for the $F^\alpha$ case.
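The claim that $\alpha_n = p/(1-p)^{n-1}$ yields $P(X_n \text{ is a record}) = p$ for $n \ge 2$ follows from the standard $F^\alpha$-model fact that, for continuous F and independent $X_i \sim F^{\alpha_i}$, $P(X_n = \max(X_1,\ldots,X_n)) = \alpha_n/(\alpha_1 + \cdots + \alpha_n)$. An exact rational-arithmetic check in Python (ours, not from the text; the value of p is an arbitrary illustrative choice):

```python
from fractions import Fraction

def record_prob(alphas):
    """P(X_n is a record) = alpha_n / (alpha_1 + ... + alpha_n) in the F^alpha model."""
    return alphas[-1] / sum(alphas)

p = Fraction(1, 3)                           # illustrative record probability
alphas = [Fraction(1)]                       # alpha_1 := 1
for n in range(2, 12):
    alphas.append(p / (1 - p) ** (n - 1))    # alpha_n := p/(1-p)^(n-1)

probs = [record_prob(alphas[:n]) for n in range(2, 12)]  # each equals p exactly
```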

10. Birth-process paced records


We now consider records over observations occurring at the points of a pure birth process. This makes sense as a model for real phenomena if we imagine, for example, a population of micro-organisms reproducing in an environment with


unlimited food, and we are interested in the successive maximum level of some measurement on each new organism. Let $(T_n)_{n\in\mathbb{N}}$ be the jump points of a pure birth process with m + 1 'progenitors' for some fixed non-negative integer m, so that $P(N(t+h) - N(t) = 1 \mid N(t) = n) = (1 + m + n)h + o(h)$ as $h \downarrow 0$, for each $t > 0$. More specifically, $(T_n)_{n\in\mathbb{N}}$ is a continuous-time Markov process with q-matrix given by
$$q_{n,n+1} = -q_{n,n} = 1 + m + n \quad (n = 0, 1, \ldots) ,$$

and $q_{i,j} = 0$ otherwise; in this case $U_1, U_2, \ldots$ are independent r.v.s with $U_n \sim$ exponential($n + m$) for $n \ge 1$. We then have the following result (see Bruss and Rogers (1991) for m = 0, Bunge and Nagaraja (1992b) for m = 0, 1, and Browne and Bunge (1995) for arbitrary m).

THEOREM 10.1. Suppose that $(T_n)$ is a birth process as described above. Then the record arrival process $(T_{L_n})_{n\in\mathbb{N}}$ is equal in distribution to the arrival time sequence $(\tau_n)_{n\in\mathbb{N}}$ of a Markov renewal process $(\tau_n, \theta_n)_{n\in\mathbb{N}}$. Here $\theta_n \in \{1, 2, \ldots, 1+m\}$ and
$$P(\tau_1 \le t) = 1 - e^{-(1+m)t}, \qquad P(\tau_{n+1} - \tau_n \le t,\ \theta_{n+1} = j \mid \theta_n = i) = \pi_{ij}(1 - e^{-jt}) \quad (n \ge 1) ,$$
for $t \ge 0$ and $1 \le j \le i \le 1+m$, where $\theta_1 = 1+m$ with probability 1 and $(\pi_{ij})$ is a lower-triangular stochastic matrix on $\{1, \ldots, 1+m\}$ with absorbing state 1, whose entries are given explicitly in Browne and Bunge (1995).
Theorem 10.1 says that the inter-record times are conditionally exponentially distributed: they start out with mean $1/(1+m)$ and the mean changes at each record occurrence according to the Markov chain $(\theta_n)$, eventually settling down to mean 1 (the absorbing state of $(\theta_n)$), that is, a homogeneous Poisson record arrival process. Here, then, is another model that produces regular record arrivals; in fact for m = 0 the record arrival process is homogeneous Poisson for all $t \in [0,\infty)$, and even if m > 0 it attains this behavior after some random time that may be large but is finite with probability 1. Bruss and Rogers (1991) studied the case m = 0 (possibly with a deterministic time-change) for k-records, finding that the k-record arrival processes are i.i.d. Poisson (after some delay if k > 1). We can compare Theorem 10.1 to Theorem 9.2 a little more formally, as follows. Note that in the birth process discussed above we have $E(N(t)) = (1+m)(e^t - 1)$, so that
$$\frac{\frac{d}{dt} E(N(t))}{E(N(t))} \to 1 \qquad (t \to \infty) , \qquad (10.1)$$


which is exactly the condition of Theorem 9.2 if we take the (unimportant) constant $\lambda_0$ to be 1. It seems reasonable to envision a 'meta-theorem' stating that in general (10.1) implies a homogeneous Poisson weak limit for the record arrival process. But the proofs of Theorems 10.1 and 9.2 seem to be quite case-specific, so a unified formulation and proof remain to be found. Below we will see another version of this result, for renewal-paced records, derived by a still different approach.
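For m = 0 (a single progenitor) the record arrival process is stated above to be homogeneous Poisson with unit rate for all t, so in particular the gap between the first and second record arrivals is exponential(1). The Python sketch below (ours, not from the text) simulates the birth process via $U_n \sim$ exponential(n) and the Dwass-Rényi record indicators, and checks the mean of that gap:

```python
import random

rng = random.Random(11)

def second_record_gap(max_draws=10**5):
    """T_{L_2} - T_{L_1} for a pure birth process with m = 0.

    Arrival n comes after U_n ~ exponential(n); arrival n is a record
    with probability 1/n, independently (arrival 1 is always a record).
    """
    gap = 0.0
    for n in range(2, max_draws):
        gap += rng.expovariate(n)      # U_n ~ exponential(n)
        if rng.random() < 1.0 / n:     # delta_n = 1: record at arrival n
            return gap
    return gap  # cap reached only with probability about 1/max_draws

reps = 4000
mean_gap = sum(second_record_gap() for _ in range(reps)) / reps
# should be near E[exponential(1)] = 1
```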

11. Renewal paced records


Finally we turn to the richest - and most difficult - random record model: the case where $(T_n)_{n\in\mathbb{N}}$ is a renewal process. By this we mean an ordinary or 'undelayed' renewal process, so that $(U_n)_{n\in\mathbb{N}}$ is an i.i.d. sequence of positive r.v.s; the assumption that $(T_n)$ is simple implies that $P(U_n = 0) = 0$. Only one exact result is known in this case (Browne and Bunge, 1995).

THEOREM 11.1. Suppose that $(T_n)$ is a renewal process with
$$U_n \stackrel{d}{=} E_n^{1/\alpha} Z_n ,$$
where $(E_n)_{n\in\mathbb{N}}$ is an i.i.d. sequence of unit exponential r.v.s and $(Z_n)_{n\in\mathbb{N}}$ is an i.i.d. sequence of r.v.s, independent of $(E_n)$, with Laplace-Stieltjes transform (LST) $e^{-s^\alpha}$ for some $\alpha \in (0,1]$, so that $Z_n$ is positive $\alpha$-stable. Then
$$(U_{L_1}, \ldots, U_{L_n}) \stackrel{d}{=} \big(U_1 e^{S_0/\alpha}, \ldots, U_n e^{S_{n-1}/\alpha}\big) \qquad (11.1)$$
for all $n \in \mathbb{N}$, where $S_n := \sum_{i=1}^{n} E_i$ and $S_0 := 0$.

Theorem 11.1 directly generalizes Theorem 9.1, because when $\alpha = 1$, $(T_n)$ is a renewal process with exponential interarrival distribution, i.e., it is a homogeneous Poisson process. The presence of the homogeneous Poisson process $(S_n)$ in the exponent on the right-hand side of (11.1) is noteworthy, and we may ask whether any other random record processes admit such a 'logarithmic Poisson' representation. We return to this question at the end of this section. Lacking an exact distributional representation, what can we say about a general renewal-paced record process? One way to answer this question is to look at the tails of the inter-record time distributions: from this perspective the theory of regular variation holds the promise of fairly general results. This approach originated with Westcott (1977, 1979) and was followed up by Embrechts and Omey (1983) and Yakymiv (1986). Here we give the main result of Embrechts and Omey (1983), who studied regular variation of the tails of general subordinated distributions, that is, distributions of random sums of random variables, about which more below. Let $RV_\alpha^\infty$ denote the set of regularly varying functions (at $\infty$) with index $\alpha$, i.e., $f \in RV_\alpha^\infty$ iff $f(x) = x^\alpha H(x)$ for some slowly varying H (see Bingham et al., 1989, for a full account).


THEOREM 11.2. Let $(T_n)_{n\in\mathbb{N}}$ be a renewal process with interarrival times $(U_n)_{n\in\mathbb{N}}$, and suppose that $U_n$ has d.f. G. Then for any fixed $\alpha \in [0,1)$, the following are equivalent:
$$1 - G(x) \in RV_{-\alpha}^\infty , \qquad P(U_{L_2} > x) \in RV_{-\alpha}^\infty , \qquad P(T_{L_2} > x) \in RV_{-\alpha}^\infty .$$
Moreover, as $x \to \infty$,
$$P(U_{L_2} > x) \sim P(T_{L_2} > x) \sim (1 - G(x)) \ln\frac{1}{1 - G(x)} .$$

One remarkable aspect of this result is its assertion of the same tail behavior for record and inter-record times, which reaffirms the fact that the last inter-record time dominates them all. A completely different approach to the approximation of renewal-paced record times was taken by Ennadifi (1995a, 1995b). She considered strong approximation of the record arrival process via Wiener processes defined along with the random record process on a suitably enlarged probability space. We present two examples of Ennadifi's results here, while noting that their implications have not yet been fully explored. Assume that $U_n$ has its moment generating function existing in some nondegenerate neighborhood of zero, so that in particular it has finite mean and variance.

THEOREM 11.3. It is possible to define $L(t)$ and a Wiener process $W(t)$ on the same probability space such that
$$\sup_{e \le t \le T} |L(t) - \ln t - W(\ln t)| = O(\ln \ln T) ,$$

almost surely as $T \to \infty$. Based on similar considerations she obtained the following.

THEOREM 11.4. The sequence of record times of the process $(L(e^t))_{t \ge 0}$ is almost surely asymptotically close to the arrival times of a homogeneous Poisson process of unit rate.

Theorem 11.4 serves as yet another version of Theorem 9.2 and the asymptotic part of Theorem 10.1. Finally we consider questions related to random sums of random variables, or random sums for short. Here we will consider only the second inter-record time in a renewal-paced record process. Observe that this is
$$U_{L_2} = T_{L_2} - T_{L_1} \stackrel{d}{=} \sum_{i=1}^{N} U_i ,$$
where N is a random variable independent of $(U_n)_{n\in\mathbb{N}}$. The distribution of N is easy to calculate, because if $X_1 = R_1$ were fixed at, say, r, then N would be the first


time n such that $X_n > r$, so that $P(N = n) = F^{n-1}(r)(1 - F(r))$ for $n = 1, 2, \ldots$. But we can integrate this geometric probability with respect to r to find that unconditionally
$$P(N = n) = \frac{1}{n(n+1)} \qquad (n = 1, 2, \ldots) , \qquad (11.2)$$

retrieving the distribution of $L_2$ in Corollary 5.3. There is a vast literature on random sums of various kinds and in various applications (cf., e.g., Gnedenko and Korolev, 1996). Just as with the martingale-based theory of marked point processes, though, the application of the existing theory to random record processes, and in particular to sums of the type (11.2), has not yet been clarified. Note in particular that $E(N) = \infty$ in (11.2), which puts the random record problem outside the purview of a good part of random sum theory. Nonetheless, we can envision applying this theory (or a suitable extension of it) to random record processes along the following lines. The equation
$$\sum_{i=1}^{N} U_i \stackrel{d}{=} \frac{1}{p}\, U_1 \qquad (11.3)$$

has been well-studied in the case where $N \sim$ geometric with success probability $p \in (0,1)$; (11.3) is known as geometric stability (Kalashnikov, 1997). If we now take p to be a uniform (0,1) r.v., say $\xi$, we obtain
$$U_{L_2} \stackrel{d}{=} \sum_{i=1}^{N} U_i \stackrel{d}{=} \frac{1}{\xi}\, U_1 . \qquad (11.4)$$

Recalling that $e^E \stackrel{d}{=} 1/\xi$ when $E \sim$ exponential(1), we see that (11.4) applies exactly to the homogeneous Poisson-paced record process with $\lambda = 1$ (compare Theorem 9.1), and in fact (11.4) can be generalized to the renewal process of Theorem 11.1. As mentioned above, then, the question is: to what extent does the 'log-Poisson' representation of Theorem 11.1 characterize the specific renewal process used there? That is, to what extent does (11.4) characterize $U_1$? The following partial result is based on analysis of functional equations derived from the 'random stability' equation (11.4) (Browne and Bunge, 1995).

THEOREM 11.5. Suppose that $(T_n)_{n\in\mathbb{N}}$ is a renewal process with interarrival times $(U_n)_{n\in\mathbb{N}}$, and (11.1) holds with $S_n$ as defined in Theorem 11.1 and $\alpha > 0$. Then
$$U_n \stackrel{d}{=} E_n^{1/\alpha} Z_n ,$$
where $E_n$ and $Z_n$ are as defined in Theorem 11.1, and $\alpha \in (0,1]$. Efforts to obtain more explicit characterizations of renewal processes from their random record process representations have so far foundered on analytical difficulties, especially the fact that here $E(N) = \infty$.
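Equation (11.4) is easy to probe by simulation in the unit-rate Poisson case: the left side $U_{L_2}$ can be simulated directly from the record definition, the right side as $U_1/\xi$ with $\xi$ uniform. Both are heavy-tailed ($E(N) = \infty$ translates into an infinite mean), so the Python sketch below (ours, not from the text) compares medians rather than means:

```python
import random

rng = random.Random(3)

def second_inter_record_time(max_draws=10**5):
    """U_{L_2} for unit-rate Poisson pacing: time from the first record
    (at T_1) until the first later observation exceeding X_1."""
    x1 = rng.random()                    # X_1 = R_1 (any continuous d.f. works)
    total = 0.0
    for _ in range(max_draws):
        total += rng.expovariate(1.0)    # next interarrival U_i ~ exponential(1)
        if rng.random() > x1:            # X_i > X_1: second record occurs
            return total
    return total

reps = 4000
lhs = sorted(second_inter_record_time() for _ in range(reps))
rhs = sorted(rng.expovariate(1.0) / (1.0 - rng.random()) for _ in range(reps))
med_lhs, med_rhs = lhs[reps // 2], rhs[reps // 2]   # the two medians agree
```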


12. Extensions and applications


Not many variations on the basic random record model have yet been studied in depth. Westcott (1977) looked at some specific models incorporating dependence or non-stationarity but found that they gave rise to intractable calculations. As noted above, Hofmann and Nagaraja (1997) were able to extend some known results for random record models to the $F^\alpha$ setting. In terms of applications, there has been considerable research on the use of random record models in best-choice or 'secretary' problems, notably due to Bruss (see Bruss, 1998, and references therein). In this application one supposes that 'options' or 'choices' appear at random times, and the objective is to choose the 'best' according to some optimality criterion; the random record models that admit closed-form representations often also admit elegant solutions to such problems.

References
Ahsanullah, M. (1995). Record values. In The Exponential Distribution: Theory, Methods and Applications (Eds., N. Balakrishnan and A. P. Basu), pp. 279-296. Gordon & Breach, Amsterdam.
Arnold, B. C., N. Balakrishnan and H. N. Nagaraja (1992). A First Course in Order Statistics. Wiley, New York.
Arratia, R., A. D. Barbour and S. Tavaré (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2, 519-535.
Arratia, R. and S. Tavaré (1992). The cycle structure of random permutations. Ann. Probab. 20, 1567-1591.
Ballerini, R. (1994). A dependent F^α-scheme. Statist. Probab. Lett. 21, 21-25.
Ballerini, R. and S. I. Resnick (1985). Records from improving populations. J. Appl. Probab. 22, 487-502.
Bingham, N. H. (1988). Tauberian theorems for Jakimovski and Karamata-Stirling methods. Mathematika 35, 216-224.
Bingham, N. H., C. M. Goldie and J. L. Teugels (1989). Regular Variation, corrected pbk. ed. Cambridge Univ. Press, Cambridge, UK.
Borovkov, K. (1999). On records and related processes for sequences with trends. J. Appl. Probab. 36, 668-681.
Browne, S. and J. Bunge (1995). Random record processes and state dependent thinning. Stoch. Process. Appl. 55, 131-142.
Bruss, F. T. (1998). Quick solutions for general best choice problems in continuous time. Comm. Statist. Stoch. Models 14, 241-264.
Bruss, F. T. and L. C. G. Rogers (1991). Pascal processes and their characterization. Stoch. Process. Appl. 37, 331-338.
Bunge, J. and H. N. Nagaraja (1992a). Dependence structure of Poisson-paced records. J. Appl. Probab. 29, 587-596.
Bunge, J. and H. N. Nagaraja (1992b). Exact distribution theory for some point process record models. Adv. in Appl. Probab. 24, 24-44.
Csörgő, M. and P. Révész (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
Daley, D. J. and D. Vere-Jones (1988). An Introduction to the Theory of Point Processes. Springer, New York.
David, F. N. and D. Barton (1962). Combinatorial Chance. Hafner, New York.


Deheuvels, P. (1982). Spacings, record times and extremal processes. In Exchangeability in Probability and Statistics (Eds., G. Koch and F. Spizzichino), pp. 233-243. North-Holland, Amsterdam.
Deheuvels, P. (1983). The strong approximation of extremal processes, II. Z. Wahrscheinlichkeitstheorie verw. Geb. 62, 7-15.
Deheuvels, P. (1984). On record times associated with kth extremes. In Proceedings of the 3rd Pannonian Symposium on Mathematical Statistics (Visegrád, Hungary, 1982), pp. 43-51. Akadémiai Kiadó, Budapest.
Deheuvels, P. (1988). Strong approximations of kth records and kth record times by Wiener processes. Probab. Theory Related Fields 77, 195-209.
Deuschel, J.-D. and O. Zeitouni (1995). Limiting curves for i.i.d. records. Ann. Probab. 23, 852-878.
Dwass, M. (1960). Some k-sample rank order tests. In Contributions to Probability and Statistics: Essays in Honor of H. Hotelling (Eds., I. Olkin et al.), pp. 198-202. Stanford Studies in Math. & Stat. 2, Stanford Univ. Press, Stanford.
Embrechts, P., C. Klüppelberg and T. Mikosch (1997). Modelling Extremal Events: For Insurance and Finance. Springer, Berlin.
Embrechts, P. and E. Omey (1983). On subordinated distributions and random record processes. Math. Proc. Camb. Phil. Soc. 93, 339-353.
Engelen, R., P. Tommassen and W. Vervaat (1988). Ignatov's theorem: A new and short proof. J. Appl. Probab. 25A, 229-236.
Ennadifi, G. (1995a). Strong approximation of the number of renewal paced record times. J. Statist. Plann. Infer. 45, 113-132.
Ennadifi, G. (1995b). Asymptotic behaviour of record times for renewal paced records. Statist. Decisions 13, 379-397.
Feller, W. (1968). An Introduction to Probability Theory and its Applications, vol. I, 3rd edn. Wiley, New York.
Feuerverger, A. and P. Hall (1996). On distribution-free inference for record-value data with trend. Ann. Statist. 24, 2655-2678.
Galambos, J. (1987). Asymptotic Theory of Extreme Order Statistics, 2nd edn. Krieger, Malabar, Florida.
Gaver, D. and P. A. Jacobs (1978). Non-homogeneously paced random records and associated extremal processes. J. Appl. Probab. 15, 552-559.
Gill, R. D. and S. Johansen (1990). A survey of product-integration with a view toward application in survival analysis. Ann. Statist. 18, 1501-1555.
Glick, N. (1978). Breaking records and breaking boards. Amer. Math. Monthly 85, 2-26.
Gnedenko, B. V. and V. Yu. Korolev (1996). Random Summation. CRC Press, Boca Raton, Florida.
Gnedin, A. V. (1998). Records from a multivariate normal sample. Statist. Probab. Lett. 39, 11-15.
Goldie, C. M. (1983). On Records and Related Topics in Probability Theory. Report, School of Mathematical Sciences, University of Sussex.
Goldie, C. M. and S. I. Resnick (1989). Records in a partially ordered set. Ann. Probab. 17, 678-699.
Goldie, C. M. and S. I. Resnick (1995). Many multivariate records. Stoch. Process. Appl. 59, 185-216.
Goldie, C. M. and L. C. G. Rogers (1984). The k-record processes are i.i.d. Z. Wahrscheinlichkeitstheorie verw. Geb. 67, 197-211.
Haiman, G. and M. L. Puri (1990). A strong invariance principle concerning the J-upper order statistics for stationary m-dependent sequences. J. Statist. Plann. Infer. 25, 43-51.
Hofmann, G. and H. N. Nagaraja (1997). Random and point process record models in the F^α setup. Preprint, Dept. Statistics, Ohio State University. J. Appl. Probab. (to appear).
Ignatov, Z. (1978). Point processes generated by order statistics and their applications. In Point Processes and Queueing Problems, Keszthely (Hungary) (Eds., P. Bártfai and J. Tomkó), pp. 109-116. Coll. Math. Soc. János Bolyai 24, North-Holland, Amsterdam.
Kalashnikov, V. (1997). Geometric Sums: Bounds for Rare Events with Applications. Kluwer, Dordrecht.
Last, G. and A. Brandt (1995). Marked Point Processes on the Real Line. Springer-Verlag, New York.

308

J. Bunge and C. M. Goldie

Nagaraja, H. N. (1988). Record values and related statistics - a review. Comm. Statist. Theory Meth. 17, 2223-2238. Nevzorov, V. B. (1995). Asymptotic distributions of records in nonstationary schemes. J. Statist. Plann. Infer. 45, 261-273, Nevzorov, V. B. (1987). Records. Theory Prob. Appl. 32, 201~28. (Transl. of Russian original in Teor. Veroyatnost. i Primenen. 32 (1987), 219-251). Pfeifer, D. (1982). Characterizations of exponential distributions by independent non-stationary record increments. J. Appl. Probab. 19, 127 135. Pfeifer, D. (1987). On a joint strong approximation theorem for record and inter-record times. Probab. Theory Related Fields 75, 212-221. Pfeifer, D. (1989). Einf~'hrung in die Extremwertstatistik. Teubner, Stuttgart. Pfeifer, D. and Y.-S. Zhang (1989). A survey on strong approximation techniques in connection with records. In Extreme Value Theory: Proceedings, Oberwolfach, 1987 (Eds., J. Hfisler and R.-D. Reiss), pp. 50-58. Lecture Notes in Statistics 51, Springer, New York. Pickands, J. (1971). The two-dimensional Poisson process and extremal processes. J. Appl. Probab. 8, 745-756. R6nyi, A. (1962). On the extreme elements of observations. MTA IlL Oszt. KO2l. 12, 105 121. Reprinted in Selected Papers, vol. 3, ed. P. Turfin. Akad6miai Kiad6, Budapest (1976), 50 65. Resnick, S. L (1973). Limit laws for record values. Stoch. Process. Appl. 1, 67-82. Resnick, S. I. (1987). Extreme Values, Regular Variation and Point Processes. Springer-Verlag, New York. Resnick, S. I. (1992). Adventures in Stochastic Processes. Birkh~iuser, Boston. Robinson, M. E. and J. A. Tawn (1995). Statistics for exceptional athletics records. J. Roy. Statist. Soc. C, Appl. Statist. 44, 499-511. Rogers, L. C. G. (1989). Ignatov's theorem: an abbreviation of the proof of Engelen, Tommassen and Vervaat. Adv. in Appl. Probab. 21, 933-934. Rogers, L. C. G. and D. Williams (1994). Diffusions, Markov Processes and Martingales, I: Foundations, 2nd edn. Wiley, Chichester. 
Samuels, S. M. (1992). An all-at-once proof of Ignatov's theorem. In Strategies for Sequential Search and Selection in Real Time (Eds., F. T. Bruss, T. S. Ferguson and S. M. Samuels), pp. 231-237. Amer. Math. Soc., Providence. Shorrock, R. W. (1972). On record values and record times. J. Appl. Probab. 9, 316-326, Shorrock, R. W. (1974). On discrete-time extremal processes. Adv. in Appl. Probab. 6, 580-592. Smith, R. L. (1997). Letter to the Editors, on 'Statistics for exceptional athletics records' by Robinson and Tawn, with authors' response. J. Roy. Statist. Soc. C, Appl. Statist. 46, 123-128. Stam, A. J. (1985). Independent Poisson processes generated by record values and inter-record times. Stoeh. Process. Appl. 19, 315-325. Vervaat, W. (1977). On records, maxima and a stochastic difference equation. Report 7702, Mathematisch Instituut, Katholieke Universiteit Nijmegen. Westcott, M. (1977). The random record model. Proc. Roy. Soc. Lond. A 356, 529-547. Westcott, M. (1979). On the tail behavior of record-time distributions in a random record process. Ann. Probab. 7, 868-873. Williams, D. (1973). On R6nyi's 'record' problem and Engel's series. Bull. London Math. Soc. 5, 235-237. Yakymiv, A. L. (1986). Asymptotic properties of the times the states change in a random record process. Theory Probab. Appl. 31, 508-512. Yang, M. (1975). On the distribution of the inter-record times in an increasing population. J. Appl. Probab. 12, 148 154. Yao, Y.-C. (1997). On independence of k-record processes: Ignatov's theorem revisited. Ann. Appl. Probab. 7, 815-821.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Stochastic Networks with Product Form Equilibrium

Hans Daduna

1. Introduction

This survey describes that part of queueing network theory which to a great extent popularized stochastic networks in mathematics and in a variety of applications. The applications now are, e.g., telecommunications, computer systems and computer networks, production systems, especially Flexible Manufacturing Systems (FMS) and inventory systems, biological networks, population dynamical systems, social migration systems, etc. All these systems are driven by random inputs, so their evolution over time is described by stochastic processes. A main problem is then to construct the adequate state space description, i.e., from a system theoretical point of view: to find a Markovian description of the system. A large class of Markovian models will be described in the survey. For applications, the most important question in system theory is to find conditions which guarantee that asymptotically in time the system stabilizes, i.e., approaches an equilibrium. Almost all systems dealt with in the survey admit an explicitly given steady state distribution. The equilibrium distribution is, in the case of an ergodic process, the limiting distribution as well. It is an experience in stochastic network theory that in almost all cases where an explicit solution of the steady state problem is obtained, the steady state distribution shows a so-called product form.

Section 2 describes the early product form results of Jackson and Gordon and Newell, which were the starting points of more than forty years of network theory. In Section 3 these results are embedded into a theory of general vector valued processes with discrete state space, where movements of the customers may be of a rather general form. We describe some recently developed models with general transition mechanisms which admit, under some natural restrictions, an explicit solution of the steady state problem by providing us with the equilibrium. Concurrent movements of customers pose additional problems; a unified treatment of these problems is not available. Taking into consideration the general principles of locally balanced systems, we describe in Section 4 an approach to computing steady states of structured systems.

These general processes are Markovian vector processes with interacting components. Considering the time evolution of the process we observe a complicated space-time structure which up to now is not well understood. The spatial interaction of the network's nodes and the time-dependent behaviour of the system result in strong correlations in space and time. For monotone Markov processes there exists a general correlation theory. We introduce stochastic ordering of network processes in Section 5 and develop in Section 6 the correlation theory for the fundamental network processes. It turns out that the internal structure of the network is not well suited for applying directly the general correlation theory of monotone processes: we have to look for a balance in the requirements for successfully proving monotonicity and correlation results. In Section 7 we present further models which exhibit product form steady states: with customers of different types and type dependent movements in the network. We consider in such networks individual customers' behaviour. These traveling customers experience on their way through the network the total space-time structure of the system. It is therefore not surprising that explicit results for passage time distributions are hard to prove. The available results on sojourn and passage time distributions are summarized in Section 8. We end this section with further results on the correlation structure of the networks. In Section 9 we reconsider single station systems, now with a rich internal structure, which opens the possibility of modeling complicated service systems. The development of these models was driven by the need for modeling complex computer systems. Measurements in some of these real systems revealed an insensitivity phenomenon: the steady state distribution for the number of jobs (customers) in system did not depend on the exact shape of the service time distribution but only on the mean service time.
This observation together with similar experience from some early queueing models gave rise to the development of an insensitivity theory for a large class of service systems, which encompasses very general exponential systems and nonexponential symmetric servers. We describe these results and show that they occur in connection with having explicitly given steady state distributions for the system processes. The connection is clarified by several theorems on the structure of these nodes. In Section 10 we compute the steady state distribution for networks of general exponential systems and nonexponential symmetric servers using routing mechanisms as described in Section 7. We find product form structure for the equilibrium and show that insensitivity holds for the global network process. The survey ends with complements and miscellanea opening a large field of network theory and applications for further reading. We shall use the following notations and definitions: $\mathbb{N} := \{0,1,2,\ldots\}$, $\mathbb{N}^+ := \{1,2,3,\ldots\}$, $\mathbb{R}$ are the real numbers, $\mathscr{B}$ are the Borel sets of $\mathbb{R}$, $\mathbb{R}_+ := [0,\infty)$. For a set $E$ we denote by $\mathfrak{P}(E)$ the set of all subsets of $E$. Empty sums are 0, empty products are 1.



2. Birth-death processes and exponential networks of queues


In this section we describe classical queueing network theory. The main theorems which pushed research in the field and laid the ground for the rich class of stochastic network models available today were developed in two stages: first for open networks (1957) and then their closed network companions (1967). These networks were built of classical exponential multiserver queues, which we describe first; their describing processes are birth-death processes.

DEFINITION 2.1. (State dependent single server queue) We consider a single server where indistinguishable customers arrive one by one. Customers finding the server free enter service immediately, while customers finding the server busy enter the waiting room, which has an infinite number of waiting places. Unless otherwise specified, the waiting room is organized according to the First-Come-First-Served regime (FCFS): if a service expires, the served customer immediately leaves the system, the customer at the head of the queue, if any, enters service, and the other customers in line are shifted one step ahead. We always assume that these shifts take zero time. If there are $n$ customers in system, then time until the next arrival passes with intensity $\lambda(n)$, and (if $n > 0$) the customer in service is served with intensity $\mu(n)$. Given the number of customers in system, which we henceforth call the queue length, the actual service and interarrival times are independent of the past and independent of one another. We denote by $X = (X(t) : t \ge 0)$ the random queue length process of the system. From the description it follows immediately that, for describing the evolution of the state dependent single server queue by a Markov process, it suffices to record the queue length, i.e., $X$ defined on a suitable probability space $(\Omega, \mathfrak{F}, P)$ with state space $\mathbb{N}$ is a Markov process.

DEFINITION 2.2.
(Birth-death processes) Let $X = ((X_t : (\Omega, \mathfrak{F}, P) \to (\mathbb{N}, \mathfrak{P}(\mathbb{N}))) : t \in \mathbb{R}_+)$ denote a Markov process with right continuous paths having lefthand limits (cadlag paths) and Q-matrix $Q = (q(m,n) : m,n \in \mathbb{N})$ given by

$$
q(m,n) = \begin{cases}
\lambda(m) & \text{if } 0 \le m = n-1, \\
\mu(m) & \text{if } 0 \le n = m-1, \\
-\lambda(0) & \text{if } m = n = 0, \\
-(\lambda(m) + \mu(m)) & \text{if } m = n > 0, \\
0 & \text{otherwise.}
\end{cases}
$$

Then $X$ is a (one dimensional) birth-death process with birth rates $\lambda(\cdot)$ and death rates $\mu(\cdot)$.

THEOREM 2.3. The queue length process $X$ of the state dependent single server queue is a birth-death process. If $X$ is ergodic then its unique steady state and limiting distribution is $\pi = (\pi(n) : n \in \mathbb{N})$ with

$$\pi(n) = G^{-1} \prod_{i=0}^{n-1} \frac{\lambda(i)}{\mu(i+1)}, \quad n \in \mathbb{N} , \qquad (1)$$
where $G < \infty$ is the norming constant. Conditions for the ergodicity of birth-death processes can be found in Asmussen (1987), Chapter III.3. The classical exponential queueing systems fit into Definitions 2.1 and 2.2.

DEFINITION 2.4. (Exponential multiserver queues) Consider a service station with $s \ge 1$ service channels and a linear waiting room of length $N$ under FCFS regime, $0 \le N \le \infty$. Indistinguishable customers arrive according to a Poisson-$\lambda$ process at the service station and request an amount of service time which is exponentially distributed with mean $\mu^{-1}$. All service times are independent and independent from the arrival stream. If an arriving customer finds free service channels he selects one of them randomly and his service commences immediately. If he finds all channels busy and there is free waiting room he joins the tail of the waiting line; otherwise this customer is lost. If a service expires, the customer at the head of the line immediately enters the free service channel and the other customers in the waiting line move one step ahead. These movements take zero time. Let $X(t)$ denote the number of customers in system at time $t$, $t \ge 0$. We call $X(t)$ the queue length, including customers waiting and in service. Then $X = ((X_t : (\Omega, \mathfrak{F}, P) \to (\mathbb{N}, \mathfrak{P}(\mathbb{N}))) : t \in \mathbb{R}_+)$ is a Markov process with Q-matrix $Q = (q(m,n) : m,n \in \mathbb{N})$ given by

$$
q(m,n) = \begin{cases}
\lambda & \text{if } 0 \le m = n-1 < N+s, \\
\mu \min(m,s) & \text{if } 0 \le n = m-1 < N+s, \\
-\mu \min(m,s) & \text{if } m = n = N+s, \\
-\lambda & \text{if } m = n = 0, \\
-(\lambda + \mu \min(m,s)) & \text{if } 0 < m = n < N+s, \\
0 & \text{otherwise.}
\end{cases}
$$

In case $s = \infty$ (with $N = 0$) $X$ describes the evolution of the infinite server queue; for $s < \infty$, $N = \infty$ we have the $M/M/s/\infty$ - FCFS system, which was investigated already by Erlang around 1910 - see Brockmeyer et al. (1948). Ergodicity of the queue length process always holds for the cases $s = \infty$ or $N < \infty$; otherwise $\lambda < \mu s$ is necessary and sufficient for ergodicity. Unless otherwise specified our systems will always have infinite waiting room. Erlang was a teletraffic engineer and developed his queueing formulas for predicting congestion of telephone networks. His formulas were applied in planning switching centers in teletraffic networks, where the centers were modeled by $M/M/s/N$ systems in isolation. Modeling networks and successfully computing performance measures directly for the network model was not possible at that time. Instead, decomposition techniques were applied. The principles of these


techniques when applied to networks are studied in depth in Agraval (1985). This application is common practice in cases of networks where the steady state behaviour is not known or which do not have explicit, simple to evaluate performance characteristics. The first breakthroughs with respect to successfully investigating queueing networks with general topology were the works of Jackson (1957) and Gordon and Newell (1967).

DEFINITION 2.5. (Jackson network) A Jackson network is a network of service stations (nodes) numbered $\{1,2,\ldots,J\} =: J$. Station $j$ is a multiserver with $s_j \ge 1$ service channels and an infinite waiting room under FCFS regime (see Definition 2.4 for details). Customers in the network are indistinguishable. At node $j$ there is an external Poisson-$\lambda_j$ arrival stream, $\lambda_j \ge 0$, and customers arriving at node $j$ from the outside or from inside of the network request an amount of service time there which is exponentially distributed with mean $\mu_j^{-1}$, $\mu_j > 0$. All service times and interarrival times constitute an independent family of random variables. The movements of the customers in the network are governed by a Markovian routing mechanism: a customer on leaving node $i$ selects with probability $r(i,j) \ge 0$ to visit node $j$ next, and then enters node $j$ immediately, commencing service if he finds an idle channel there, otherwise joining the tail of the queue at node $j$; with probability $r(i,0) \ge 0$ this customer decides to leave the network immediately ($\sum_{j=0}^{J} r(i,j) = 1$ for all $i \in J$). Given the departure node $i$, the customer's routing decision is made independently of the network's history. To exclude trivialities, we assume that with $\lambda := \sum_{j=1}^{J} \lambda_j$ and $r(0,j) := \lambda_j/\lambda$ the matrix $R := (r(i,j) : i,j = 0,1,\ldots,J)$ is irreducible.
For describing the network's evolution over time we use the following processes: Let $X_j(t)$ denote the number of customers present at node $j$ at time $t \ge 0$, either waiting or in service (local queue length at node $j$); then $X(t) := (X_j(t) : j = 1,\ldots,J)$ is the joint queue length vector of the network at time $t$. We denote by $X = (X(t) : t \ge 0)$ the joint queue length process of the Jackson network.

THEOREM 2.6. (Jackson, 1957) Let $E = \mathbb{N}^J$ denote the state space of the joint queue length process $X = ((X_t : (\Omega, \mathfrak{F}, P) \to (E, \mathfrak{P}(E))) : t \in \mathbb{R}_+)$ of the Jackson network defined above. Then $X$ is a Markov process with Q-matrix $Q = (q(x,y) : y,x \in \mathbb{N}^J)$ given by: For $i,j \in J$, and $x = (n_1,\ldots,n_J) \in E$,

$$q(n_1,\ldots,n_i,\ldots,n_J;\; n_1,\ldots,n_i+1,\ldots,n_J) = \lambda_i ,$$

$$q(n_1,\ldots,n_i,\ldots,n_J;\; n_1,\ldots,n_i-1,\ldots,n_J) = \mu_i \min(n_i,s_i)\, r(i,0) \quad \text{if } n_i > 0 ,$$

$$q(n_1,\ldots,n_i,\ldots,n_j,\ldots,n_J;\; n_1,\ldots,n_i-1,\ldots,n_j+1,\ldots,n_J) = \mu_i \min(n_i,s_i)\, r(i,j) \quad \text{if } n_i > 0 ,$$

and

$$q(x,x) = -\sum_{y \in E - \{x\}} q(x,y), \qquad q(x,y) = 0 \quad \text{otherwise.}$$

$X$ is irreducible, conservative, and nonexplosive. The traffic equation of the network

$$\eta_j = \lambda_j + \sum_{i=1}^{J} \eta_i\, r(i,j), \quad j = 1,\ldots,J , \qquad (2)$$

has a unique solution, which we denote by $\eta = (\eta_1,\ldots,\eta_J)$. $X$ is ergodic if and only if $\eta_j < \mu_j s_j$, $j = 1,\ldots,J$, holds. The unique stationary and limiting distribution $\pi$ on $E$ of $X$ is given as follows. Let $\pi_j$ denote the local probability for node $j$,

$$
\pi_j(n_j) = \begin{cases}
K_j^{-1} \dfrac{1}{n_j!} \left(\dfrac{\eta_j}{\mu_j}\right)^{n_j} & \text{if } n_j \le s_j, \\[2mm]
K_j^{-1} \dfrac{1}{s_j!\, s_j^{\,n_j - s_j}} \left(\dfrac{\eta_j}{\mu_j}\right)^{n_j} & \text{if } n_j \ge s_j ,
\end{cases}
$$

where $K_j$ is the norming constant. Then $\pi$ is of product form

$$\pi(n_1,\ldots,n_J) = \prod_{j=1}^{J} \pi_j(n_j), \quad (n_1,\ldots,n_J) \in E . \qquad (3)$$
REMARK 2.7. (Product form equilibrium) The local queue length process $X_j = (X_j(t) : t \ge 0)$ at node $j$ has equilibrium $\pi_j$, the $j$th marginal of $\pi$, and $\pi$ is the product of its marginals. Therefore in equilibrium the local queue lengths at a fixed time point behave as if they were independent. Highlighting this independence, $\pi$ is said to be of product form (external product form over the nodes). (For further notions of (internal) product forms see the discussion preceding Definition 9.10.) This finding was the starting point for the successful investigation of queueing networks. It should be stressed that the processes $X_j$, $j = 1,\ldots,J$, are by no means independent. They strongly interact; the interaction forces are carried by the moving customers. What Jackson proved is: the one dimensional marginals in time of the vector valued stationary Jackson network process have independent coordinates in space. Despite all the results on product form networks which are known today, knowledge of the space-time structure of even the simplest networks in equilibrium is rare. The form of the local equilibria $\pi_j$, $j = 1,\ldots,J$, suggests that in steady state node $j$ develops like an $M/M/s_j/\infty$ - FCFS system according to Definition 2.4 and Theorem 2.3 in isolation, with customers arriving in a Poisson-$\eta_j$ stream. But it can be proved that in general the customer flows in the network are not Poissonian (see Melamed, 1979). A survey on customer flow processes is Disney and König (1985); further details are given in Disney and Kiessler (1987).
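In practice Theorem 2.6 reduces the equilibrium analysis of a Jackson network to one linear solve: in matrix form the traffic equation (2) reads $(I - R^{\mathsf{T}})\eta = \lambda$ for the node-to-node routing matrix $R$. A small sketch; the two-node tandem data are purely illustrative, not from the text:

```python
import numpy as np

def traffic_solution(lam, R):
    """Solve the traffic equation (2), eta_j = lam_j + sum_i eta_i r(i,j),
    i.e. (I - R^T) eta = lam, for the routing matrix R among the nodes."""
    lam = np.asarray(lam, dtype=float)
    R = np.asarray(R, dtype=float)
    return np.linalg.solve(np.eye(len(lam)) - R.T, lam)

# Illustrative two-node tandem: external Poisson-1 arrivals at node 1,
# every customer then visits node 2 and leaves (r(1,2) = 1, r(2,0) = 1).
lam = [1.0, 0.0]
R = [[0.0, 1.0],
     [0.0, 0.0]]
eta = traffic_solution(lam, R)
mu = np.array([3.0, 2.0]); s = np.array([1, 1])
ergodic = bool(np.all(eta < mu * s))   # ergodicity criterion of Theorem 2.6
```

With the throughputs $\eta_j$ in hand, the product form (3) gives each marginal $\pi_j$ directly via the formulas of Theorem 2.6.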


Almost nothing is known with respect to non-stationary network processes. It is well known that even for the single server FCFS $M/M/1/\infty$ queue the time dependent behaviour has a complicated structure: the transition probabilities are expressed using Bessel functions of the first kind; see Cohen (1982) and Parthasarathy (1987). The space-time correlation structure of network processes will be even more involved and needs much further research effort before we shall be able to provide qualitative or even quantitative results. A remarkable exception to this is the following theorem of Goodman and Massey. They investigated non-stationary network processes successfully, yielding easy to use rules for classifying the behaviour of the local node processes in non-ergodic network processes. The theorem is interesting because it shows that even in non-ergodic networks some nodes may stabilize over long times: for network processes we therefore may observe some nodes with asymptotically diverging local queue lengths while other nodes asymptotically approach a local limiting distribution. The technical problems that arise are due to the local queue length processes being non-Markovian.

THEOREM 2.8. (Goodman and Massey, 1984) Consider a Jackson network according to Definition 2.5 with only single server nodes, i.e., we have $s_j = 1$ for all $j = 1,\ldots,J$. Then the traffic equation
$$\eta_j = \lambda_j + \sum_{i=1}^{J} (\eta_i \wedge \mu_i)\, r(i,j), \quad j = 1,\ldots,J$$

has a unique solution $\eta = (\eta_1,\ldots,\eta_J)$. $\eta$ can be determined algorithmically in at most $J$ steps. If $U = \{j \in J : \eta_j < \mu_j\}$ then

$$\lim_{t\to\infty} P(X_j(t) = n_j : j \in U) = \prod_{j \in U} \left(1 - \frac{\eta_j}{\mu_j}\right) \left(\frac{\eta_j}{\mu_j}\right)^{n_j}, \quad n_j \in \mathbb{N} .$$

If $j \notin U$ then

$$\lim_{t\to\infty} P(X_j(t) = n_j) = 0, \quad \text{for all } n_j \in \mathbb{N} .$$
If we consider closed queueing networks where a fixed number of customers is cycling, then independence of the marginal local queue length processes is not possible, because obviously some negative correlation must hold between these processes. It took ten years until the analogue of Jackson's theorem was proved by Gordon and Newell.

DEFINITION 2.9. (Gordon-Newell network) Consider the set of multiserver nodes $J = \{1,\ldots,J\}$ as described in Definition 2.5 of the Jackson network, without the external arrivals. There are $I > 0$ customers cycling according to an irreducible Markov matrix $R = (r(i,j) : i,j = 1,\ldots,J)$. The service times are similar to the


open network case, and the independence assumptions on service times and routing decisions are assumed to hold as well. Let $X_j(t)$ denote the number of customers present at node $j$ at time $t \ge 0$, either waiting or in service (local queue length at node $j$). Then $X(t) := (X_j(t) : j = 1,\ldots,J)$ is the joint queue length vector of the network at time $t$. We denote by $X = (X(t) : t \ge 0)$ the joint queue length process of the Gordon-Newell network.

THEOREM 2.10. (Gordon and Newell, 1967) Let $S(I,J) = \{(n_1,\ldots,n_J) \in \mathbb{N}^J : n_1 + \cdots + n_J = I\}$ denote the state space of the joint queue length process $X = ((X_t : (\Omega, \mathfrak{F}, P) \to (S(I,J), \mathfrak{P}(S(I,J)))) : t \in \mathbb{R}_+)$ of the Gordon-Newell network defined above. Then $X$ is a Markov process with Q-matrix $Q = (q(x,y) : y,x \in S(I,J))$ given by: For $i,j \in J$, and $x = (n_1,\ldots,n_J) \in S(I,J)$,

$$q(n_1,\ldots,n_i,\ldots,n_j,\ldots,n_J;\; n_1,\ldots,n_i-1,\ldots,n_j+1,\ldots,n_J) = \mu_i \min(n_i,s_i)\, r(i,j) \quad \text{if } n_i > 0 ,$$

and

$$q(x,x) = -\sum_{y \in S(I,J) - \{x\}} q(x,y), \qquad q(x,y) = 0 \quad \text{otherwise.}$$

$X$ is irreducible, conservative, nonexplosive, and ergodic. Let $\eta = (\eta_1,\ldots,\eta_J)$ denote the unique probability solution of the traffic equation

$$\eta = \eta R . \qquad (4)$$

The unique stationary and limiting distribution $\pi = \pi(I,J)$ of $X$ on $S(I,J)$ is

$$\pi(I,J)(n_1,\ldots,n_J) = G(I,J)^{-1} \prod_{j=1}^{J} \frac{\eta_j^{\,n_j}}{\prod_{k=1}^{n_j} \mu_j \phi_j(k)}, \quad (n_1,\ldots,n_J) \in S(I,J) , \qquad (5)$$

where $\phi_j(n) = n$ if $n \le s_j$ and $= s_j$ if $n \ge s_j$, and $G(I,J)$ is the norming constant.

REMARK 2.11. (Product form equilibrium) The steady state distribution $\pi(I,J)$ of the Gordon-Newell network process is said to be of product form as well, characterising the internal structure of the probabilities, which have non-normalized factors for the different nodes appearing in the Jackson network equilibrium. The result (5) is surprising: the Gordon-Newell network equilibrium looks like being obtained from an equilibrium in a Jackson network with the same nodes and suitably redefined routing, by conditioning on the fixed number of customers present. This has structural implications for the probability $\pi(I,J)$: it is negatively associated - see Daduna and Szekli (1996) and


Theorem 6.8 below. The local queue length processes $X_j = (X_j(t) : t \ge 0)$, $j = 1,\ldots,J$, in equilibrium are strongly negatively correlated at a fixed time point. The remarks on the space-time correlations in the Jacksonian network apply here as well.
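The norming constant $G(I,J)$ in (5) need not be computed by brute-force summation over $S(I,J)$; for single-server nodes ($s_j = 1$) the classical convolution recursion usually attributed to Buzen applies. A sketch (the recursion is standard, the numbers illustrative):

```python
def norming_constant(rho, I):
    """Convolution (Buzen-type) algorithm for G(I, J) of a closed network
    with single server nodes:
    G(I, J) = sum over (n_1..n_J) in S(I, J) of prod_j rho_j^{n_j},
    with rho_j = eta_j / mu_j, via the in-place recursion
    G(n, m) = G(n, m-1) + rho_m * G(n-1, m)."""
    g = [1.0] + [0.0] * I          # g[n] holds G(n, m) after m nodes processed
    for r in rho:
        for n in range(1, I + 1):
            g[n] += r * g[n - 1]   # g[n-1] is already the m-node value
    return g[I]

# With rho = (1, 1) every state of S(I, 2) has weight 1 and |S(I, 2)| = I + 1.
G = norming_constant([1.0, 1.0], I=4)    # expect 5.0
```

The cost is $O(I \cdot J)$ instead of the size of $S(I,J)$, which grows combinatorially in $I$ and $J$.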

3. Vector-valued birth-death processes and generalized migration processes


We consider a continuous time Markov process $X = (X_t : t \ge 0)$ defined on a probability space $(\Omega, \mathfrak{F}, P)$ with state space $E \subseteq \mathbb{N}^J$ where $J$ is fixed. We denote by $Q = (q(x,y) : x,y \in E)$ the Q-matrix (intensity matrix) of $X$. For $x \ne y$, $q(x,y)$ is called the transition rate from $x$ to $y$.

ASSUMPTION 3.1. For $X$ we assume throughout that its paths are right continuous with lefthand limits (cadlag paths), that $Q$ is conservative ($\sum_{y \in E - \{x\}} q(x,y) = -q(x,x)$, $x \in E$) and non-explosive ($X$ has only a finite number of jumps in any finite time interval). Unless otherwise specified we assume that $X$ is irreducible on $E$.

DEFINITION 3.2. (Net migration processes) For $x,y \in E$ with $q(x,y) > 0$ let $(y_j - x_j)^+ = a_j$ and $(x_j - y_j)^+ = d_j$ denote the net increment, resp. decrement, of this transition in the $j$th coordinate. We call $a = (a_1,\ldots,a_J)$ the net increment vector and $d = (d_1,\ldots,d_J)$ the net decrement vector. Let $F \subseteq \mathbb{N}^J \times \mathbb{N}^J$ denote the set of feasible pairs of joint increment and decrement vectors, i.e., the net transition pairs for $X$. Then the transition rates of $X$ are of the form

$$q(x,y) = A(x,a,d), \quad \text{for } y = x - d + a, \quad x,y \in E \text{ with } q(x,y) > 0 , \qquad (6)$$
and we have $F = \{(a,d) : A(x,a,d) > 0 \text{ for some } x \in E\}$ for the set of feasible net movements of $X$. Note that defining $Q$ from a prescribed function $A$ on the set of feasible transition pairs $(x,y) \in E^2$ is not a restriction, because $x$ and $y$ determine the net increment and net decrement vector uniquely. Introducing the description (6) for the transition rates is often useful for clarifying the movements of customers in networks or of individual items between compartments. Much research effort has been dedicated to finding specific functions $A$ that are well suited for describing real problems and which admit computing performance measures of the system. The most important single performance quantity is, in almost all cases, the steady state distribution. The existence of a steady state for the describing process guarantees that the system stabilizes over time. Furthermore, it turned out that in many situations various other performance measures can be derived immediately from the steady state distribution. See Lavenberg (1983) or Kant (1993) for algorithms to evaluate performance measures for product form networks of queues.
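On a finite (or truncated) state space, the description (6) translates directly into a generator matrix once $E$, $F$ and $A$ are specified. A minimal sketch; the helper name and the toy single-node data are our own illustration, not from the text:

```python
import numpy as np

def generator_from_rates(states, feasible, A):
    """Build the Q-matrix from the net-transition description (6):
    q(x, y) = A(x, a, d) for y = x - d + a, with (a, d) ranging over the
    feasible net movements F; the diagonal makes Q conservative."""
    idx = {x: k for k, x in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for x in states:
        for a, d in feasible:
            y = tuple(xi - di + ai for xi, ai, di in zip(x, a, d))
            if y != x and y in idx and A(x, a, d) > 0:
                Q[idx[x], idx[y]] += A(x, a, d)
    np.fill_diagonal(Q, -Q.sum(axis=1))   # conservative: rows sum to zero
    return Q

# Toy check: one node (J = 1) truncated at 3 customers, with arrivals
# (a, d) = ((1,), (0,)) at rate 0.5 and departures ((0,), (1,)) at rate 1.
states = [(n,) for n in range(4)]
F = [((1,), (0,)), ((0,), (1,))]
A = lambda x, a, d: 0.5 if a == (1,) else (1.0 if x[0] > 0 else 0.0)
Q = generator_from_rates(states, F, A)
```

Transitions leaving the truncated set are simply skipped, so the sketch only approximates an infinite state space model near the truncation boundary.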


For a description of these and similar systems and various examples see, e.g., Serfozo (1992), Equation (13) and the examples thereafter, Serfozo (1993), Boucherie and van Dijk (1990), Remark 2.1, Boucherie and van Dijk (1991), Henderson and Taylor (1990, 1991b), Miyazawa (1997), and the references therein.

EXAMPLE 3.3. (Exponential single movement networks) Gordon-Newell and Jackson network processes can be described using the formalism of net transition rates. Let for the multiserver node $j \in \{1,\ldots,J\}$ with $s_j \ge 1$ service channels

$$\phi_j(n_j) = \begin{cases} n_j & \text{if } n_j < s_j, \\ s_j & \text{if } n_j \ge s_j . \end{cases}$$

Let $e_j$ denote the $j$th unit vector of dimension $J$, having $j$th coordinate equal to 1 and zero elsewhere, and let $0$ be the vector with all coordinates zero.

(1) For the Jackson network we have for $(n_1,\ldots,n_J) \in \mathbb{N}^J$

$$A((n_1,\ldots,n_J), e_i, e_j) = \mu_i \phi_i(n_i)\, r(i,j), \quad \text{if } n_i > 0, \; i,j \in J , \qquad (7)$$
$$A((n_1,\ldots,n_J), e_i, 0) = \mu_i \phi_i(n_i)\, r(i,0), \quad \text{if } n_i > 0, \; i \in J ,$$
$$A((n_1,\ldots,n_J), 0, e_j) = \lambda_j, \quad j \in J .$$

(2) For the Gordon-Newell network we have for $(n_1,\ldots,n_J) \in S(I,J)$

$$A((n_1,\ldots,n_J), e_i, e_j) = \mu_i \phi_i(n_i)\, r(i,j), \quad \text{if } n_i > 0, \; i,j \in J . \qquad (8)$$

(3) The form of $A(\cdot)$ in (7) and (8) suggests the definition of open and closed networks of single server queues with state dependent service rates similar to Definition 2.1: There is always at most one customer in service, who is served at node $j$ with rate $\mu_j(n_j) = \mu_j \phi_j(n_j)$; $\phi_j(\cdot)$ is interpreted as a capacity function, $\phi_j : \{1,\ldots,I\} \to \mathbb{R}_+ - \{0\}$ for Gordon-Newell networks, and $\phi_j : \mathbb{N}^+ \to \mathbb{R}_+ - \{0\}$ for Jackson networks, $\phi_j(0) = 0$. Depending on the local queue length it speeds up or slows down the rate at which the residual requested service time of the customer in service is diminished. This yields for the Gordon-Newell network

$$A((n_1,\ldots,n_J), e_i, e_j) = \mu_i(n_i)\, r(i,j), \quad \text{if } n_i > 0, \; i,j \in J , \qquad (9)$$

and similar general rates for the open network case. Replacing in Definition 2.9 the multiserver nodes by single server nodes with state dependent service rates (9), the joint queue length process of the (generalized) Gordon-Newell network is ergodic with unique steady state

$$\pi(I,J)(n_1,\ldots,n_J) = G(I,J)^{-1} \prod_{j=1}^{J} \frac{\eta_j^{\,n_j}}{\prod_{k=1}^{n_j} \mu_j(k)}, \quad (n_1,\ldots,n_J) \in S(I,J) , \qquad (10)$$

where $\eta$ is the stochastic solution of the traffic equation (4).
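For small state spaces, the product form (10) can be checked numerically against the stationary distribution obtained directly from the generator built from the rates (9). A sketch on a tiny hypothetical closed network (all parameters, including the rate functions, are our choice):

```python
import itertools
import numpy as np

# Illustrative closed network: J = 2 nodes, I = 3 customers, cyclic routing
# (so eta = (1, 1) solves eta = eta R up to scaling, which cancels below),
# and state dependent rates mu_j(n) = mu_j * phi_j(n).
I = 3
r = [[0.0, 1.0], [1.0, 0.0]]
mu = [lambda n: 1.0 * min(n, 2), lambda n: 2.0 * (n > 0)]

states = [x for x in itertools.product(range(I + 1), repeat=2) if sum(x) == I]
idx = {x: k for k, x in enumerate(states)}

# Generator from the single-movement rates (9): q(x, x - e_i + e_j) = mu_i(n_i) r(i,j).
Q = np.zeros((len(states), len(states)))
for x in states:
    for i in range(2):
        for j in range(2):
            if i != j and x[i] > 0 and r[i][j] > 0:
                y = list(x); y[i] -= 1; y[j] += 1
                Q[idx[x], idx[tuple(y)]] += mu[i](x[i]) * r[i][j]
np.fill_diagonal(Q, -Q.sum(axis=1))

# Stationary distribution directly: solve pi Q = 0 with sum(pi) = 1.
A = np.vstack([Q.T, np.ones(len(states))])
b = np.append(np.zeros(len(states)), 1.0)
pi = np.linalg.lstsq(A, b, rcond=None)[0]

# Product form (10) with eta = (1, 1); empty products are 1.
w = [np.prod([1.0 / mu[j](k) for j in range(2) for k in range(1, x[j] + 1)])
     for x in states]
pf = np.array(w) / sum(w)
```

Both computations agree to numerical precision, as (10) predicts; the direct solve scales badly with $I$ and $J$, which is exactly why the explicit product form matters.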


For Jackson networks with this general capacity function incorporated into Definition 2.5, and with $\eta$ being the solution of the traffic equation (2), we obtain for an ergodic joint queue length process the steady state distribution

$$\pi(n_1,\ldots,n_J) = K(J)^{-1} \prod_{j=1}^{J} \prod_{k=1}^{n_j} \frac{\eta_j}{\mu_j(k)}, \quad (n_1,\ldots,n_J) \in \mathbb{N}^J .$$
In Jackson and Gordon-Newell networks the interactions of the nodes in space and time are carried by the moving customers. The spatial interaction is thus driven mainly by the random routing matrix $R = (r(i,j) : i,j = 1,\ldots,J)$. A much more complicated behaviour of the system emerges if the interaction of the nodes is driven by the capacity function as well. It turns out that in general for this case the steady state distributions are not known. A detailed investigation of such systems can be found in Serfozo (1993). We describe here the closed network case.

EXAMPLE 3.4. (Serfozo, 1993) Consider the Gordon-Newell network of Definition 2.9 with single server nodes which serve customers at rates which depend on the global state of the network. The Q-matrix $Q = (q(x,y) : y,x \in S(I,J))$ is, for $i,j \in J$ and $x = (n_1,\ldots,n_J) \in S(I,J)$:

q(x, x − e_i + e_j) = μ_i φ_i(x) r(i,j),  if n_i > 0,

and

q(x, x) = − ∑_{y ∈ S(I,J) − {x}} q(x, y),

q(x, y) = 0 otherwise.
Here r(i,j) are routing probabilities which are assumed to depend on the departure and destination node only, μ_i is the locally determined service rate, and

φ_i : S(I,J) → ℝ₊

is the globally determined capacity function for node i. We assume φ_i(n_1, …, n_i, …, n_J) > 0 if and only if n_i > 0. The routing matrix is assumed to be irreducible, with η = (η_1, …, η_J) the probability solution of the traffic equation (4). The aim is to find conditions on the family {φ_j, j = 1, …, J} such that a unique stationary and limiting distribution π = π^{(I,J)} of X on S(I,J) is of the form
π^{(I,J)}(n_1, …, n_J) = G(I,J)^{-1} Φ(n_1, …, n_J) ∏_{j=1}^{J} (η_j / μ_j)^{n_j},  (n_1, …, n_J) ∈ S(I,J),   (11)
where G(I,J) is the norming constant and

Φ : S(I,J) → ℝ₊


H. Daduna

a function which is constructed from the φ_i. Φ(·) has to be chosen such that in case of the capacity functions (8), (9), resp., we get

Φ(n_1, …, n_J) = ∏_{j=1}^{J} ∏_{k=1}^{n_j} φ_j(k)^{-1} .

Structural conditions that lead to an equilibrium (11) are typically as follows: THEOREM 3.5. (Serfozo, 1993; pp. 149, 155) Consider the generalized Gordon-Newell network of Example 3.4 with globally dependent service capacities φ_i : S(I,J) → ℝ₊. For f ∈ {1, …, J} and m ∈ {0, 1, …, I} let

S_f(I, m) = {(n_1, …, n_J) ∈ S(I,J) : n_f = I − m}

be the subset of states where exactly m customers reside outside of node f. (In particular we have x ∈ S_f(I, 0) if and only if x = I · e_f.) Set φ_i(x) = 0 if x ∉ S(I,J). Then the following statements are equivalent: (a) There exists some f ∈ {1, …, J} such that
φ_j(x) · φ_k(x − e_j + e_f) · φ_f(x − e_k + e_f) = φ_k(x) · φ_j(x − e_k + e_f) · φ_f(x − e_j + e_f)   (12)

for all x ∈ S(I,J), j, k ∈ {1, …, J}, holds. (b) (a) with (12) holds for any f ∈ {1, …, J}. (c) There exists an f ∈ {1, …, J} such that for all x ∈ S(I,J) the following holds: If x ∈ S_f(I, m) and x(0), x(1), …, x(m) ∈ S(I,J) is a sequence such that x(h) ∈ S_f(I, h), h = 0, 1, …, m, x = x(m), and for h = 1, …, m: x(h) = x(h−1) − e_f + e_{j(h)} for some j(h) ∈ {1, …, J}, then the product

Φ(x) = ∏_{h=1}^{m} φ_f(x(h−1)) · φ_{j(h)}(x(h))^{-1}   (13)

is independent of the specific sequence x(0), x(1), …, x(m). (Note that an empty product is 1.) (d) The statement of (c) holds for any f ∈ {1, …, J}. (e) The function Φ : S(I,J) → ℝ₊ defined recursively by Φ(x) = 1, x ∈ S_f(I, 0), and

Φ(x) = Φ(x − e_j + e_f) · φ_f(x − e_j + e_f) · φ_j(x)^{-1},  x ∈ S(I,J) − S_f(I, 0),   (14)

is well defined, i.e., independent of f and the coordinates j chosen in (14). If any of these conditions is fulfilled, then the equilibrium distribution of the network process X on S(I,J) has the form (11) with Φ(·) given by (13).
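Condition (12) can be tested mechanically on a small state space. The sketch below uses the locally dependent capacities φ_i(x) = min(n_i, 2) (made-up data, not from the text), for which (12) is expected to hold; φ_i is set to 0 outside S(I,J) as required in Theorem 3.5:

```python
import itertools

J, I, f = 3, 3, 0                    # nodes, population, reference node f
states = [x for x in itertools.product(range(I + 1), repeat=J) if sum(x) == I]
state_set = set(states)

def phi(i, x):
    # locally dependent capacity phi_i(x) = min(n_i, 2); zero outside S(I, J)
    if x not in state_set:
        return 0.0
    return float(min(x[i], 2))

def move(x, i, j):                   # x - e_i + e_j
    y = list(x); y[i] -= 1; y[j] += 1
    return tuple(y)

# condition (12), checked for every x in S(I, J) and all pairs j, k
ok = all(
    abs(phi(j, x) * phi(k, move(x, j, f)) * phi(f, move(x, k, f))
        - phi(k, x) * phi(j, move(x, k, f)) * phi(f, move(x, j, f))) < 1e-12
    for x in states for j in range(J) for k in range(J)
)
print(ok)   # True
```

Setting φ to zero outside the state space makes the boundary cases of (12) come out automatically, which mirrors the convention of the theorem.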

Stochastic networks with product form equilibrium

321

If φ_j(x) = φ_j(n_j) for all j, i.e., the capacity functions are only locally dependent, then the equilibrium is (10) with μ_j(n_j) = μ_j φ_j(n_j) according to (9) and (8). The restrictions imposed by (12) on the capacity functions seem to be strong. But they generalize the locally dependent capacity functions in the right direction. Serfozo (1989a, 1993) and Kook and Serfozo (1993) gave examples of practical relevance which meet (12) and which do not fit into the family of network processes showing the classical product form equilibrium (10). Note that the celebrated BCMP formalism (Baskett et al., 1975) leads in the closed network case to equilibrium (10). The networks of Section 2 and in the preceding examples all share the property that with probability 1 only one customer jumps at a time instant. This is a consequence of the exponential service time requests of the customers. During about the last ten years several models of networks with concurrent movements of customers have been developed. The associated network processes are constructed using the standard procedures for discrete state Markov processes in continuous time: Given a set of state dependent exponential holding times and a one-step transition kernel we define pathwise the network's evolution (see e.g., Miyazawa (1997), Section 3), or we compute the transition probability function directly from the Q-matrix as the minimal solution of the backward equation. Due to only one customer moving at a time in the previous examples, the transitions which occur are defined directly as jumps in net migration processes. In connection with concurrent movements, batch services, and group arrivals, we shall observe increment and decrement vectors which are different from the net increment and net decrement vectors, respectively. DEFINITION 3.6. (Generalized migration and generalized birth-death process) Let X have transition rates (6). Then X is called a generalized migration process. Let X have transition rates

q(x, y) = Λ(x, a, 0) = β(x, a)  if x, y = x + a ∈ E for some a ∈ A,
q(x, y) = Λ(x, 0, d) = δ(x, d)  if x, y = x − d ∈ E for some d ∈ D,
q(x, y) = 0  otherwise,   (15)
where A, D ⊆ ℕ^J are the sets of birth and death vectors, resp., for individuals, and Γ = (A × {0}) ∪ ({0} × D). β(·) are the birth rates, δ(·) the death rates. In such processes births do not occur simultaneously with deaths and vice versa, but births in several coordinates may occur simultaneously, and deaths as well. A Markov process X with transition rates (15) is called a generalized birth-death process. REMARK 3.7. Serfozo (1992) reserves the terms compound migration and compound birth-death process for reversible Markov processes with transition rates (6), (15), resp. We do not use this convention here because migration processes usually are not reversible in time - see Kelly, 1979, Chapters 2 and 6. It follows


that every Markov process which fulfills Assumption 3.1 is a generalized migration process.
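A generalized birth-death process with rates (15) is easy to set up explicitly. The following sketch builds the rate function q for a hypothetical two-coordinate example (all numerical rates are invented); note that births and deaths never mix in one transition:

```python
import itertools

# Hypothetical two-queue system on E = {0, ..., 3}^2 with batch births.
E = set(itertools.product(range(4), repeat=2))
A = [(1, 0), (0, 1), (1, 1)]   # birth vectors; (1, 1) is a simultaneous birth
D = [(1, 0), (0, 1)]           # death vectors

def beta(x, a):                # birth rate beta(x, a), made-up
    return 1.0

def delta(x, d):               # death rate delta(x, d), made-up
    return 2.0 * sum(di * xi for di, xi in zip(d, x))

def q(x, y):
    # transition rates (15): births and deaths never occur together
    if x == y or y not in E:
        return 0.0
    diff = tuple(yi - xi for xi, yi in zip(x, y))
    if diff in A:              # y = x + a for some a in A
        return beta(x, diff)
    neg = tuple(-c for c in diff)
    if neg in D:               # y = x - d for some d in D
        return delta(x, neg)
    return 0.0

print(q((1, 1), (2, 2)), q((2, 1), (1, 1)))   # 1.0 4.0
```

A mixed increment such as (−1, −1) gets rate 0 here because (1, 1) is not a death vector, matching the restriction stated after (15).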

4. Migration processes with concurrent movements and an application of local balance principles
Using net transition rates for describing the processes' evolution is the easiest description on the abstract level. This concept, and considering indistinguishable customers, possibly hides in the process description specific activities in the real system. E.g., the service discipline of the nodes is of no importance for the evolution of the joint queue length vector - if n > 0 customers are present, we may either serve only one of the customers at rate nμ, or we may serve all n customers concurrently, each with rate μ; both disciplines yield the same distribution of the time until the next service expires. Similar problems occur in case of concurrent movements, group arrivals and batch services. The theory of these systems is still at an early stage; a unified description of all the existing suggestions of how to deal with concurrency does not yet seem possible. In the following I shall describe in some detail a specific, but rather general, example of how to construct successfully stochastic processes for modeling these features. EXAMPLE 4.1. (Migration processes with state dependent multiple transitions, Boucherie and van Dijk (1991)) We consider a set J = {1, …, J} of nodes (with an unspecified service regime) and indistinguishable units cycling in the network. The network may be open or closed with state space E ⊆ ℕ^J, which is composed of the set of possible population vectors in the system. Such states are called for short the joint queue length vectors of the network. The units' migration and behaviour in the nodes is specified by general transition mechanisms. We assume that the time evolution of the network can be described by a Markov process X = ((X_t : (Ω, 𝔉, P) → (E, 𝒮)) : t ∈ ℝ₊), which is irreducible on E and ergodic with stationary distribution π on (E, 𝒮). The Q-matrix Q = (q(x,y) : x, y ∈ E) of X is assumed to have a bounded diagonal, hence X is uniformizable.
Transitions between different states x, y are triggered by a departure vector d = (d_1, …, d_J) and an arrival vector a = (a_1, …, a_J) as follows: d_j ≤ x_j units depart from node j; these and possibly some further new arrivals from the outside are to be distributed over the network or leave the network; according to these routing decisions immediately thereafter a_j ≤ y_j units arrive at node j, and all these movements take zero time. Note that a and d are not necessarily net transition vectors. Therefore several different pairs of departure and arrival vectors may trigger the same net transition between x and y. For a specific quadruple x, y, d, a, we define w ∈ ℕ^J as the vector of customers that stay on at their nodes. We therefore have for this transition a unique decomposition x = w + d, y = w + a. If the network is closed with population size I we have ∑_{j=1}^{J} d_j = ∑_{j=1}^{J} a_j ≤ I; if the network is open, customers may depart from or newly arrive at the network.


∑_{j=1}^{J} d_j > ∑_{j=1}^{J} a_j indicates a net departure excess, and ∑_{j=1}^{J} d_j < ∑_{j=1}^{J} a_j indicates a net arrival excess. Because different triples (d, a; w) may trigger the same transition pair (which is then uniquely defined) we obtain

q(x, y) = ∑_{d,a,w : w+d=x, w+a=y} q(d, a; w),  x, y ∈ E, x ≠ y,   (16)

where q(d, a; w) is the rate at which these vectors (d, a; w) occur.
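The aggregation (16) of detailed rates into net rates can be sketched as follows; the detailed rate function is made up purely for illustration:

```python
import itertools

def detailed_rate(d, a, w):
    # made-up rate q(d, a; w); zero when nothing moves at all
    if sum(d) == 0 and sum(a) == 0:
        return 0.0
    return 0.5 ** (sum(d) + sum(a) + sum(w))

def net_rate(x, y, max_batch=2):
    # q(x, y) = sum over all (d, a; w) with w + d = x, w + a = y, cf. (16)
    total = 0.0
    ranges = (range(min(xi, yi) + 1) for xi, yi in zip(x, y))
    for w in itertools.product(*ranges):
        d = tuple(xi - wi for xi, wi in zip(x, w))
        a = tuple(yi - wi for yi, wi in zip(y, w))
        if max(sum(d), sum(a)) <= max_batch:
            total += detailed_rate(d, a, w)
    return total

# two decompositions contribute here:
# (d, a; w) = ((1,1), (1,0); (0,0)) and ((0,1), (0,0); (1,0))
print(net_rate((1, 1), (1, 0)))   # 0.375
```

The stay-on vector w runs over the componentwise minimum of x and y, which is exactly the set of decompositions x = w + d, y = w + a with nonnegative d and a.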

The main problem is to find conditions on the family {q(d, a; w)} such that the stationary distribution π of X on E is explicitly computable. It is then often of a form resembling (11), as can be seen from the examples e.g. in Boucherie and van Dijk (1991), Henderson and Taylor (1990), Miyazawa (1997). The key property obtained in Boucherie and van Dijk (1991) for this to hold is Group-Local-Balance (GLB). DEFINITION 4.2. (Group-Local-Balance) Consider the migration process of Example 4.1. A distribution p on an irreducible state space E satisfies GLB for the Q-matrix Q if for all w and d such that w + d ∈ E

p(w + d) ∑_{a ≠ d} q(d, a; w) = ∑_{a ≠ d} p(w + a) q(a, d; w)   (17)

holds. The process X is said to satisfy GLB, or to have the GLB-property, if some distribution p satisfies GLB for the transition rates of X. COROLLARY 4.3. Suppose that for the migration process of Example 4.1 there exists a distribution p on the irreducible state space E which satisfies GLB. Then p is the steady state distribution of X. PROOF. Because the proof of the corollary motivates the definition of GLB and clarifies the relation between local and global behaviour of the process which is exploited by GLB, it is sketched here: π is the stationary distribution of X if and only if it is the (unique) probability solution of the global balance equations

π(x) ∑_{w,d : w+d=x} ∑_{a ≠ d} q(d, a; w) = ∑_{w,d : w+d=x} ∑_{a ≠ d} π(w + a) q(a, d; w),  x ∈ E.   (18)
Summing (17) over all pairs (w, d) with w + d = x shows that p with GLB fulfills (18) as well. From the uniqueness of the stationary distribution we obtain π = p. GLB is a generalization of notions like local balance or job local balance; see the remarks and references in Boucherie and van Dijk (1991). All these notions have a common interpretation of balancing detailed or local probability fluxes in the system. GLB especially imposes rules on the local behaviour of X and Q with respect to the fixed stay-on vectors w, thus introducing state dependent control of local probability fluxes. Although such locality is in general rare, experience of stochastic network theory indicates that explicit accessibility of π in most cases


parallels the emergence of similar local structures. A theoretical confirmation of this experience is described in Theorem 9.13. From Corollary 4.3 it follows that looking for such local behaviour is a valuable task. The difficulty with GLB and its relatives is that for checking the definition we already need (up to a norming constant) a guess of the steady state probabilities. It is therefore useful to study for fixed stay-on vector w the local structure of q(·, ·; w). For each fixed w such that there exist d, a with q(d, a; w) > 0, the function q(·, ·; w) defines in a natural way transition rates for a Markov process on some state space E(w) ⊆ E, E(w) ≠ ∅. E(w) is for q(·, ·; w) not necessarily irreducible, and for different w the sets E(w) are usually not disjoint. But under GLB it can be shown that a partition

E(w) = ⋃_{i=1}^{k(w)} E_i(w)   (19)

exists such that the E_i(w) are irreducible for q(·, ·; w). Therefore for each w and i ∈ {1, …, k(w)} we may view q(·, ·; w) as defining a Q-matrix for a Markov process on state space loc_i(w) = {d ∈ ℕ^J : w + d ∈ E_i(w)}. These local processes have, for all w and i = 1, …, k(w), under GLB strictly positive solutions x_i(w) = (x_i(d; w) : d ∈ loc_i(w)) of the equations

x_i(d; w) ∑_{a ∈ loc_i(w) − {d}} q(d, a; w) = ∑_{a ∈ loc_i(w) − {d}} x_i(a; w) q(a, d; w).   (20)

With p from (17) and π from (18) we then want to show

x_i(d; w) = p(w + d) · c^{-1} = π(w + d) · b^{-1}   (21)

with suitable positive constants c, b. This would free the x_i from the dependence on i. Solving (20) is a local computation in the neighbourhood of w, which is often easy to perform. An approach to solving the steady state problem for the migration process or network process of Example 4.1 is to consider the local behaviour of the system via (20) in detail. First we solve the different sets of equilibrium equations for the Markov processes evolving locally around w on the subspaces loc_i(w). This usually can be done without much effort because of the detailed service mechanism and routing behaviour which is described by q(·, ·; w). Especially, identifying the local state spaces loc_i(w) is direct from Q. If some x_i(d; w) = 0 occurs, the GLB property cannot hold. We therefore assume henceforth (see Boucherie and van Dijk, 1991): ASSUMPTION 4.4. For any w ∈ ℕ^J for which E(w) ≠ ∅, partition (19) of E(w) holds and for i ∈ {1, …, k(w)} the local systems (20) have a solution x_i(w) = (x_i(d; w) : d ∈ loc_i(w)), which is uniquely determined up to a constant.
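Solving the local systems (20) for one fixed stay-on vector w is a small linear algebra problem: they are the stationary equations of a finite Markov chain on loc_i(w). A sketch with an invented local state space and invented rates:

```python
import numpy as np

# Local equations (20) for one fixed stay-on vector w: the stationary
# equations of a small chain on loc_i(w). State space and rates are made up.
loc = [(1, 0), (0, 1), (1, 1)]                       # loc_i(w), hypothetical
rate = {((1, 0), (0, 1)): 2.0, ((0, 1), (1, 0)): 1.0,
        ((1, 0), (1, 1)): 1.0, ((1, 1), (1, 0)): 3.0,
        ((0, 1), (1, 1)): 0.5, ((1, 1), (0, 1)): 1.5}

n = len(loc)
Q = np.zeros((n, n))
for (d, a), r in rate.items():
    Q[loc.index(d), loc.index(a)] = r
np.fill_diagonal(Q, -Q.sum(axis=1))

# solve x Q = 0 with sum(x) = 1: replace the last equation by normalization
Amat = np.vstack([Q.T[:-1], np.ones(n)])
b = np.zeros(n)
b[-1] = 1.0
x = np.linalg.solve(Amat, b)
print(np.allclose(x @ Q, 0.0))   # True
```

Since the local chain here is irreducible, the solution is strictly positive and unique up to normalization, as required in Assumption 4.4.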


If this is satisfied, it remains to check whether x_i(w) satisfies (21), especially being, up to a constant, independent of i. This can be done by checking a criterion similar to the Kolmogorov criterion for reversibility of Markov processes, see Kelly (1979), Theorem 1.8. THEOREM 4.5. (Boucherie and van Dijk (1991), Corollary 2.18) Consider the migration process of Example 4.1 which fulfills Assumption 4.4. A distribution π on E satisfies GLB if and only if there exists a reference state x_0 ∈ E such that for all x ∈ E, for all p ≥ 0, and for all finite paths

x_0 = w_0 + d_0 → w_0 + a_0 = w_1 + d_1 → w_1 + a_1 = w_2 + d_2 → ⋯ = w_p + d_p → w_p + a_p = x,

and all i_k ∈ ℕ, k = 1, …, p, the products

κ(x) = ∏_{k=1}^{p} x_{i_k}(a_k; w_k) / x_{i_k}(d_k; w_k)   (22)

are independent of the selected path, i.e., κ is a well defined function. Then κ(x) = c^{-1} π(x), x ∈ E. If (22) holds for some x_0, it holds when using any x_0 ∈ E as reference state. Note that for a given transition w_k + d_k → w_k + a_k the set loc_i(w_k) with d_k, a_k ∈ loc_i(w_k), such that w_k + d_k, w_k + a_k ∈ E_i(w_k) holds, is uniquely defined. But it is possible that for w ≠ w′ we find w_k + a_k ∈ E(w) and w_{k+1} + d_{k+1} ∈ E(w′). Note further that, due to Assumption 4.4, the quotients in (22) do not depend on i; the different constants cancel. Boucherie and van Dijk (1991) introduced an α-process associated with a process satisfying Assumption 4.4. The transition rates of the α-process are defined by quotients which are just the factors on the RHS of (22). Then they prove that GLB for a process is equivalent to the α-process fulfilling a locally defined strong reversibility criterion (Theorem 2.16 in Boucherie and van Dijk, 1991). As Boucherie and van Dijk (1991) point out, the procedure for computing the equilibrium of X can often be shortened, because solving the local equations (20) in many cases yields a guess for the solution p of the GLB-equations. This directly yields the steady state of X according to Corollary 4.3. Even if this is not the case, checking the Kolmogorov criterion (22) in many systems has to be done only for some specific paths which define, e.g., cycles in the state space. In many migration systems or queueing networks the detailed transition rates q(·, ·; w) follow the specific form
q(d, a; w) = f(d, w + d) p(d, a; w).   (23)

f(d, w + d) is the intensity with which, in state w + d ∈ E, the batch d of individuals decides to migrate, or with which the batch d of customers is released from their nodes because their service expired. p(d, a; w) is the state dependent routing characteristic, which in many systems is a state dependent transition probability. In the


literature many transition rules similar to (23) are discussed. A main point is the possibility of obtaining a steady state counting density which can be decomposed, similarly to (23), into a term concerning the intensity function f(·, ·) only and a term concerning only the routing characteristics p(·, ·; ·). Classical examples of this decomposition are (3), (5), and more recently (11). For the latter probabilities the decomposition reads, for (n_1, …, n_J) ∈ S(I,J):

π^{(I,J)}(n_1, …, n_J) = G(I,J)^{-1} Φ(n_1, …, n_J) ∏_{j=1}^{J} (η_j / μ_j)^{n_j} .

These and similar forms are well known from the literature; see, e.g., Boucherie and van Dijk (1990a), Henderson and Taylor (1990) or Miyazawa (1994, 1995), and the references there. Boucherie and van Dijk (1991), Section 3, discuss especially the case

f(d, x) = ψ(x − d) / φ(x),   (24)

where the real valued function φ fulfills φ(x) > 0 for all x ∈ E, while ψ(x − d) is non-negative and may be 0 for some values. In case of rates (23), we may substitute in the balance equations (20) for the local processes

y_i(d; w) := x_i(d; w) f(d, w + d),  d ∈ loc_i(w),

and reduce the investigation of the migration process to solving local equations for the routing process only. This is often much easier due to simple routing policies. THEOREM 4.6. Consider the migration process of Example 4.1 with detailed rates (23) and f satisfying (24). Let y_i(w) = (y_i(d; w) : d ∈ loc_i(w)) be the solution of the routing equations

y_i(d; w) ∑_{a ∈ loc_i(w) − {d}} p(d, a; w) = ∑_{a ∈ loc_i(w) − {d}} y_i(a; w) p(a, d; w),   (25)

which are assumed to be strictly positive and unique up to a constant, which may depend on i. Assume further that there exists a Markov process Y with state space E and detailed transition rates q_Y (similar to (16)) which fulfill: For any w, i = 1, …, k(w) and w + d, w + a ∈ E_i(w)

q_Y(d, a; w) / q_Y(a, d; w) = y_i(a; w) / y_i(d; w),


and q_Y(d, a; w) = 0 otherwise. (Note that the quotients do not depend on i.) Then the migration process X satisfies GLB with stationary distribution π if and only if the stationary distribution π_Y of Y satisfies, for all E(w) ≠ ∅ and all i ∈ {1, …, k(w)},

π_Y(d + w) q_Y(d, a; w) = π_Y(a + w) q_Y(a, d; w),  w + d, w + a ∈ E_i(w).

(Y is then called strongly reversible at E.) It follows with φ from (24) that

π(x) = C^{-1} φ(x) π_Y(x),  x ∈ E.   (26)

Elaborating further on (26) is usually possible, especially with respect to π_Y. Due to the strong reversibility of Y, a product decomposition of π_Y via a Kolmogorov criterion similar to (22) in Theorem 4.5 can be constructed. But in many cases the solution π_Y of the steady state equation for Y can be found directly. This is the center of almost all the proofs in the references mentioned above. Let us consider the case of state independent routing

p(d, a; w) =: p(d, a),  a, d ∈ loc_i(w) for some w.

Then the solution of (25) reads y_i(d; w) =: y(d) for all w, d such that w + d ∈ E. If there exists a function Γ : E → ℝ₊ − {0} such that

Γ(w + d) / Γ(w + a) = y(d) / y(a)

holds, then π_Y(x) = Γ(x), x ∈ E (Boucherie and van Dijk (1991), Example 3.31). Decomposition (26) then is

π(x) = C^{-1} φ(x) Γ(x),  x ∈ E,

which was already obtained in Henderson and Taylor (1990). A detailed recent study of processes in discrete and continuous time similar to those defined in Example 4.1 is due to Miyazawa (1997). His results for continuous time processes are parallel to those of Boucherie and van Dijk (1991). He introduced a supplementary variable into the state space description of the process by including the departure vector d (see Example 4.1) into the system's state. For this Markov process he invented the notion of structure-reversibility, which is a property obtained by considering the time reversed process of X: the time reversal has to show a structure similar to that of the original process. This property resembles the strong (local) reversibility sketched above and has consequences similar to GLB.
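As a small self-check of the GLB machinery of this section, the equations (17) can be verified mechanically. The sketch below does this for a hypothetical closed two-node cycle with one-at-a-time movements (all rates are made-up), where GLB reduces to classical local balance and a product-form candidate is easy to write down:

```python
import itertools

# Closed two-node cycle (r(1,2) = r(2,1) = 1), I = 3 customers, exponential
# single servers with made-up rates mu; movements are one-at-a-time, so the
# departure and arrival vectors are unit vectors.
mu = (1.0, 3.0)
I = 3
states = {x for x in itertools.product(range(I + 1), repeat=2) if sum(x) == I}
units = [(1, 0), (0, 1)]

def q(d, a, w):
    # q(d, a; w): one customer leaves node i and joins the other node
    for i in (0, 1):
        if d == units[i] and a == units[1 - i]:
            return mu[i]
    return 0.0

# product-form candidate p(n) ~ (1/mu_1)^{n_1} (1/mu_2)^{n_2}
raw = {x: (1.0 / mu[0]) ** x[0] * (1.0 / mu[1]) ** x[1] for x in states}
Z = sum(raw.values())
p = {x: v / Z for x, v in raw.items()}

# check the GLB equations (17) for every (w, d) with w + d in E
glb = True
for w in itertools.product(range(I), repeat=2):
    for d in units:
        x = (w[0] + d[0], w[1] + d[1])
        if x not in p:
            continue
        lhs = p[x] * sum(q(d, a, w) for a in units if a != d)
        rhs = sum(p[(w[0] + a[0], w[1] + a[1])] * q(a, d, w)
                  for a in units if a != d)
        glb = glb and abs(lhs - rhs) < 1e-12
print(glb)   # True
```

By Corollary 4.3, the candidate p is then the steady state distribution of this toy network.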

5. Stochastic ordering and stochastic comparison for network processes


We describe in this section monotonicity properties of network processes and generalized migration processes. As pointed out in Remark 3.7, any Markov process with state space E ⊆ ℕ^J can be considered as a generalized migration process, so this section is on general theory for monotone Markov processes with discrete multidimensional state space. An introduction to fundamental principles of monotone Markov processes is (Liggett, 1985), Chapter II.2, or, with respect to discrete state spaces, (Massey, 1987). For defining monotonicity the state space E of the process has to be equipped with an order structure, which will


be in our models not necessarily the natural product ordering derived from the standard ordering on ℕ. ASSUMPTION 5.1. (Ordered state spaces) The state space (E, 𝒮) of the process is a normally ordered polish space, i.e., E is equipped with the Borel-σ-algebra 𝒮 and with a partial order ≺ on E such that the following holds: for every pair of compact sets K_1, K_2 ⊆ E such that F_1 = {x ∈ E : x ≺ y for some y ∈ K_1} and F_2 = {x ∈ E : y ≺ x for some y ∈ K_2} are disjoint, there exists an increasing continuous function f : E → [0, 1] with f(x) = 0 for all x ∈ F_1 and f(x) = 1 for all x ∈ F_2; see Lindqvist (1988). The partial order ≺ on E is assumed to be closed, i.e., the graph {(x, y) ∈ E² : x ≺ y} is a closed set. If E carries some algebraic structure, the topology and ordering are compatible with the algebraic operations. [For details see Nachbin (1965) and Lindqvist (1988).] For the order ≺ on E an increasing set A ⊆ E is defined by [x ≺ y ∧ x ∈ A] ⟹ y ∈ A, and a decreasing set B ⊆ E is defined by [x ≺ y ∧ y ∈ B] ⟹ x ∈ B. Let I_≺ ⊆ 𝒮 denote the set of increasing measurable sets, D_≺ ⊆ 𝒮 the set of decreasing measurable sets, and I*_≺ the set of continuous bounded real valued functions which are isotone with respect to ≺ and the standard order on ℝ. On ℳ = ℳ(E, 𝒮), the set of probability measures on (E, 𝒮), the strong stochastic ordering ≺_st is defined by

P, Q ∈ ℳ: P ≺_st Q :⟺ ∫ f dP ≤ ∫ f dQ  for all f ∈ I*_≺.   (27)

The state spaces of the network processes equipped with the product ordering fulfill Assumption 5.1. For this generic product order on ℕ^J we write ≤ and ≤_st. In the following we consider a Markov process X = ((X_t : (Ω, 𝔉, P) → (E, 𝒮)) : t ∈ ℝ₊) with state space E ⊆ ℕ^J, 𝒮 the set of all subsets of E, fulfilling Assumption 3.1, with standard transition semigroup P = (P_t : t ≥ 0) and intensity matrix Q. For μ ∈ ℳ and f ∈ I*_≺ we define

μP_t(A) = ∫_E μ(dx) P_t(x; A),  A ∈ 𝒮,  and  P_t f(x) = ∫_E P_t(x; dy) f(y),  x ∈ E.

DEFINITION 5.2. (Stochastically monotone processes) X is ≺-monotone if and only if for μ, ν ∈ ℳ with μ ≺_st ν it follows that μP_t ≺_st νP_t for all t ≥ 0. The standard characterisation of monotonicity is (see e.g., Stoyan (1983), Chapter 4) LEMMA 5.3. X is ≺-monotone if and only if P_t f ∈ I*_≺ for all f ∈ I*_≺ and all t ≥ 0.

For applications to network processes a characterisation of monotonicity via properties of the Q-matrix is more effective. The strongest result is


THEOREM 5.4. (Massey, 1987) X is ≺-monotone if and only if the following holds: For all x, y ∈ E, x ≺ y, and all A ∈ I_≺ such that either x ∈ A or y ∉ A, it follows that

Q(x, A) ≤ Q(y, A),

where Q(x, A) = ∑_{z ∈ A − {x}} q(x, z).

COROLLARY 5.5. X is ≺-monotone if and only if the following holds: For all x, y ∈ E, x ≺ y, we have

A ∈ I_≺, x, y ∉ A ⟹ Q(x, A) ≤ Q(y, A),
B ∈ D_≺, x, y ∉ B ⟹ Q(x, B) ≥ Q(y, B).
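For a finite, totally ordered state space the conditions of Corollary 5.5 can be checked by brute force, since the increasing and decreasing sets are just the up-sets and down-sets. A sketch for a small birth-death chain with made-up rates:

```python
# Corollary 5.5 checked by brute force for a birth-death chain on
# E = {0, 1, 2, 3}; for a totally ordered E the increasing sets are the
# up-sets {k, ..., 3} and the decreasing sets the down-sets {0, ..., k-1}.
# All rates are made-up.
E = [0, 1, 2, 3]
birth = [1.0, 1.0, 1.0, 0.0]      # q(x, x + 1)
death = [0.0, 2.0, 2.0, 2.0]      # q(x, x - 1)

def q(x, y):
    if y == x + 1:
        return birth[x]
    if y == x - 1:
        return death[x]
    return 0.0

def Q(x, A):                      # Q(x, A) = sum of q(x, y) over y in A, y != x
    return sum(q(x, y) for y in A if y != x)

up_sets = [set(range(k, 4)) for k in range(5)]
down_sets = [set(range(k)) for k in range(5)]

monotone = True
for x in E:
    for y in E:
        if not x <= y:
            continue
        for A in up_sets:
            if x not in A and y not in A:
                monotone = monotone and Q(x, A) <= Q(y, A) + 1e-12
        for B in down_sets:
            if x not in B and y not in B:
                monotone = monotone and Q(x, B) + 1e-12 >= Q(y, B)
print(monotone)   # True
```

For multidimensional E the number of increasing sets grows quickly, which is why the criteria are tedious to apply directly, as the text remarks next.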


Applying these criteria is often very tedious. [For an example see Massey (1989).] Therefore the following simple sufficient conditions for monotonicity are of interest. PROPOSITION 5.6. (Daduna and Szekli, 1995) For a generalized migration process X let |Γ| < ∞. Denote for x ∈ E, Γ_x = {(a, d) ∈ Γ : x + a − d ∈ E}. X is ≺-monotone if for all x, y ∈ E, x ≺ y, (a, d) ∈ Γ, we have:

if (a, d) ∈ Γ_x ∩ Γ_y and ¬(x + a − d ≺ y), then Λ(x, a, d) ≤ Λ(y, a, d);
if (a, d) ∈ Γ_x ∩ Γ_y and ¬(x ≺ y + a − d), then Λ(x, a, d) ≥ Λ(y, a, d);
if (a, d) ∉ Γ_x and ¬(x ≺ y + a − d), then Λ(y, a, d) = 0;
if (a, d) ∉ Γ_y and ¬(x + a − d ≺ y), then Λ(x, a, d) = 0.

The conditions in Proposition 5.6 have an obvious meaning: starting two versions of the process deterministically ordered, we bound the intensities of jumps into successor states which would violate the ordering. This opens the possibility for proofs by explicit coupling constructions. An easy application of Proposition 5.6 yields COROLLARY 5.7. (Monotonicity of Jackson and Gordon-Newell networks) State dependent Jackson and Gordon-Newell networks with non decreasing service rates (Example 3.3, (3)) are ≤_st-monotone. The flow of customers in linear open networks can be described using the partial sum ordering, which was used by Whitt (1981) to prove monotonicity results for tandem systems. DEFINITION 5.8. (Partial sum order) On E ⊆ ℝ^J let for x = (x_1, …, x_J), y = (y_1, …, y_J) ∈ E

x ≤* y :⟺ ∑_{i=1}^{j} x_i ≤ ∑_{i=1}^{j} y_i,  j = 1, …, J.


≤* is a closed order on E ⊆ ℝ^J equipped with the standard topology and on E ⊆ ℕ^J with the discrete topology. We denote by ≤*_st the strong stochastic ordering on ℳ(E, 𝒮) generated by ≤* according to (27). Checking the conditions of Proposition 5.6 yields PROPOSITION 5.9. Let X denote the joint queue length process of a Jackson network with state dependent service according to Example 3.3 (3) such that the service rates are non decreasing in the local queue lengths. Suppose the routing matrix R = (r(i,j) : i, j ∈ {1, …, J}) has the following structure:

[r(i,j) > 0 ⟹ (j = i + 1 ∨ j = i − 1)]  and  [r(i,0) > 0 ⟺ i = J],   (28)

then X is ≤*_st-monotone.
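The partial sum order of Definition 5.8 and the topology condition (28) are straightforward to code; the routing data below are invented for illustration:

```python
def psum_leq(x, y):
    # partial sum order (Definition 5.8): every prefix sum of x is <= that of y
    s = 0
    for xi, yi in zip(x, y):
        s += xi - yi
        if s > 0:
            return False
    return True

# hypothetical linear (tandem) topology on J = 3 nodes, cf. condition (28):
# customers move to a neighbouring node only, and only the last node exits
J = 3
routing = {(0, 1): 1.0, (1, 2): 1.0, (2, 'out'): 1.0}
linear = all(
    (j == 'out' and i == J - 1) or j in (i + 1, i - 1)
    for (i, j) in routing
)
print(psum_leq((1, 0, 2), (1, 1, 1)), linear)   # True True
```

Note that (1, 0, 2) and (1, 1, 1) are incomparable in the product order but comparable under ≤*, which is why the partial sum order is the right tool for linear networks.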

The linear networks of Proposition 5.9 are common models of flow shop systems and transmission lines in telecommunication. In telecommunication theory the method of adjusted transfer rates is commonly used to reduce more complicated meshed networks to linear systems (for theoretical foundations of this method see Reiser (1979, 1982) for closed networks, and Daduna and Schassberger (1993) for open networks). It is therefore interesting that a converse of Proposition 5.9 holds: a network which is ≤*_st-monotone for all choices of non decreasing service rates has a topology which satisfies (28); see Daduna and Szekli (1995), Section 4, for details. That is: the ≤*_st-monotonicity characterizes the network topology. We shall reconsider these networks in Section 6 in connection with correlation theory for network processes. A different approach to performance evaluation via stochastic ordering is summarized in Shanthikumar and Yao (1994). Elaborating the likelihood ratio ordering, monotonicity results are proved with respect to service rates in closed exponential networks. Further, these authors investigate second order properties (e.g., convexity) for network processes. See Chang et al. (1994) and Buzacott et al. (1994) for further readings.

6. Correlation theory for network processes

During the last decade there has been active research concerning the correlation structure of stochastic networks. Possibly the most extensive work directed to detecting and systematizing dependency structures is documented in the book of Disney and Kiessler (1987). They investigate the network state processes as well as the customer flow processes, and individual sojourn processes for customers. This section relies on the paper (Daduna and Szekli, 1995), where in addition further dependency results from the literature are collected. Foundations of correlation theory for Markov processes can be found in Liggett (1985), Chapter II.2, or Lindqvist (1988).
It turns out that monotonicity of the network processes is required for applying the general theory. We therefore consider only processes which fulfill Assumption 5.1. The positive or negative


correlation properties we consider are always strengthenings of the usual positive or negative correlation for pairs of random variables. DEFINITION 6.1. (Correlation in space) Let P denote a probability measure on the ordered measurable space (E, 𝒮) with partial ordering ≺ which fulfils Assumption 5.1. P is associated (≺) if and only if

∫_E fg dP ≥ ∫_E f dP ∫_E g dP  for all f, g ∈ I*_≺.   (29)

Let Y = (Y_1, …, Y_J) denote an ℝ^J-valued random vector. Then Y and the distribution P^Y are called positive upper orthant dependent (PUOD) if and only if

P(Y_j > y_j, j = 1, …, J) ≥ ∏_{j=1}^{J} P(Y_j > y_j)  for all (y_1, …, y_J) ∈ ℝ^J.   (30)

Y is negatively associated (NA) if and only if for every set A ⊆ {1, …, J}, A ≠ ∅, A ≠ {1, …, J}, and for all f : ℝ^A → ℝ, g : ℝ^{A^c} → ℝ which are bounded, continuous and increasing with respect to the natural ordering,

Cov(f(Y_i : i ∈ A), g(Y_j : j ∈ A^c)) ≤ 0.   (31)

Note that Y is PUOD if and only if for all sets {f_j : ℝ → ℝ : j = 1, …, J} of functions which are bounded, continuous and increasing with respect to the natural ordering we have

E[∏_{j=1}^{J} f_j(Y_j)] ≥ ∏_{j=1}^{J} E[f_j(Y_j)].
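The PUOD lower bound (30) can be illustrated empirically; the strongly dependent pair Y = (U, U) below is made-up example data, not taken from the text:

```python
import random

# Empirical illustration of the PUOD bound (30) for the comonotone pair
# Y = (U, U), U uniform on [0, 1]; purely invented example data.
random.seed(1)
n = 100_000
samples = [(u, u) for u in (random.random() for _ in range(n))]
level = (0.3, 0.6)
joint = sum(1 for s in samples if s[0] > level[0] and s[1] > level[1]) / n
product = (sum(1 for s in samples if s[0] > level[0]) / n) \
        * (sum(1 for s in samples if s[1] > level[1]) / n)
print(joint >= product)   # True: P(U > 0.6) = 0.4 exceeds 0.7 * 0.4 = 0.28
```

This is exactly the practical use of (30) stressed in the text: the probability of the compound event is bounded below by the product of the marginal probabilities.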

For practical purposes it is important that (30) (and (29) as well) provides explicit lower bounds for the probabilities of compound events by products of probabilities of simpler (marginal) events. Complementary upper bounds are given by (31). DEFINITION 6.2. (Correlation in time) A Markov process X = ((X_t : (Ω, 𝔉, P) → (E, 𝒮)) : t ∈ ℝ₊), the state space of which fulfils Assumption 5.1, is associated (≺) in time if and only if for all k ≥ 1 all k-dimensional marginals (X(t_1), …, X(t_k)), 0 ≤ t_1 < t_2 < ⋯ < t_k, are associated (≺^k) on E^k, where ≺^k is the product ordering. Definition 6.2 does not refer to an internal structure of the state space E. Network processes have a multidimensional state space from their very definition. But the gist of network processes, the space-time correlation, is not really met by all the classical product form results: we know that in equilibrium the one dimensional marginals in time have product form (independent spatial coordinates in the open networks!) - but up to now we do not really understand the interplay between space and time in these networks.


The fundamental results on association in time for Markovian processes are Theorem 2.14 and Corollary 2.21 in Liggett (1985): A ≺-monotone Feller process X = ((X_t : (Ω, 𝔉, P) → (E, 𝒮)) : t ∈ ℝ₊) with compact state space (fulfilling Assumption 5.1) and with infinitesimal generator A having domain 𝒟_A is associated (≺) in time if and only if its initial distribution P^{X(0)} is associated (≺) and one of the following conditions holds:

P_t(fg) ≥ P_t f · P_t g,  for all f, g ∈ I*_≺ and all t ≥ 0,

or

A(fg) ≥ f Ag + g Af,  for all f, g ∈ 𝒟_A ∩ I*_≺.   (32)

For the non compact case this theorem holds as well (Daduna and Szekli (1995), Theorem 3.1). It turns out that in the case of discrete state spaces (32) is equivalent to Harris' Up-Down criterion (Liggett (1985), p. 80), which is well suited for at least some generalized migration processes: THEOREM 6.3. (Harris' Up-Down criterion) A ≺-monotone Markov process with discrete state space X = ((X_t : (Ω, 𝔉, P) → (E, 𝒮)) : t ∈ ℝ₊) (fulfilling Assumption 5.1) is associated (≺) in time if and only if the initial distribution of X is associated (≺) and the Q-matrix satisfies the up-down property:
q(x, y) > 0  ⟹  (x ≺ y or y ≺ x) .   (33)
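The up-down property (33) is purely combinatorial: it only depends on the support of the Q-matrix, so it can be verified mechanically. The following sketch (a hypothetical illustration on a truncated two-node state space, not part of the chapter) demonstrates the dichotomy discussed in the text: a customer transfer between two nodes violates (33) under the product order, while the same one-at-a-time movements are up-down under a partial-sum ordering in the spirit of Definition 5.8.

```python
# Mechanical check of Harris' up-down property (33): every transition in the
# support of the Q-matrix must connect comparable states.  Toy two-node model
# on the truncated state space {0,...,4}^2; only the support matters here.

from itertools import product

CAP = 4
STATES = list(product(range(CAP + 1), repeat=2))

def product_leq(x, y):
    """Coordinatewise (product) order on N^2."""
    return all(a <= b for a, b in zip(x, y))

def partial_sum_leq(x, y):
    """Partial-sum order: compare the tail sums sum_{j>=i} x_j for every i."""
    return all(sum(x[i:]) <= sum(y[i:]) for i in range(len(x)))

def up_down(transitions, leq):
    """Property (33): every possible transition links comparable states."""
    return all(leq(x, y) or leq(y, x) for x, y in transitions)

# Jackson-type transitions: external arrival at node 1, departure from node 1,
# and a transfer of one customer from node 1 to node 2 (rates omitted).
transitions = []
for n1, n2 in STATES:
    if n1 < CAP:
        transitions.append(((n1, n2), (n1 + 1, n2)))      # arrival at node 1
    if n1 > 0:
        transitions.append(((n1, n2), (n1 - 1, n2)))      # departure from node 1
    if n1 > 0 and n2 < CAP:
        transitions.append(((n1, n2), (n1 - 1, n2 + 1)))  # transfer 1 -> 2

print(up_down(transitions, product_leq))      # False: transfers break the product order
print(up_down(transitions, partial_sum_leq))  # True: one-at-a-time moves are up-down
```

The transfer (n_1, n_2) → (n_1 − 1, n_2 + 1) is incomparable in the product order but comparable under the tail-sum comparison, which is exactly the point of Proposition 6.5 below.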

A typical application of the Up-Down criterion is the following:

PROPOSITION 6.4. (Association of generalized birth-death processes) A ≤-monotone generalized birth-death process X = ((X_t : (Ω, 𝔉, P) → (E, 𝔖)) : t ∈ ℝ_+) with associated (≤) initial distribution p^X(0) is associated (≤) in time.

The natural product ordering ≤ on ℕ^J is tailored for birth-death processes, but unfortunately, for non-trivial Jackson and Gordon-Newell network processes the Up-Down criterion cannot be satisfied: any jump of a customer between two nodes creates a transition of the process which is not up-down. Therefore the standard setting of Markovian correlation theory does not cover these important applications. We have to look for a delicate balance between the requirements for monotonicity and association. A sort of counterpart to Corollary 5.7 for association is

PROPOSITION 6.5. (Partial sum Up-Down property) Let the generalized migration process X have state space (E, 𝔖) with the partial sum ordering ≤* (see Definition 5.8). If X has only one-at-a-time movements, it fulfills the Up-Down criterion (33).

It is important to note that this statement does not depend on the numbering of the nodes. Changing the numbering only reverses some directions of the jumps. Combining Propositions 5.9 and 6.5 we have

Stochastic networks with product form equilibrium

333

COROLLARY 6.6. (Association for linear Jackson networks) Consider a Jackson network with state dependent service rates which are non-decreasing in the queue lengths. If the network topology fulfills (28) and if the initial distribution of the network is associated (≤*), then the network process is associated (≤*) in time.

Evaluating (29) according to Corollary 6.6 yields a variety of probability bounds for complex events concerning different time points and nodes (i.e., the space-time behaviour of X) by products of probabilities for simpler events. Proving further results on association in time for open networks will probably require the definition of new orderings tailored to the specific network topologies; some examples are given in Daduna and Szekli (1995). The results presented up to now do not require the processes to be in equilibrium; a consequence are the restrictions on the network topology required for proving association in time. We can prove further results if we assume the network processes to be in equilibrium.

PROPOSITION 6.7. Consider a Jackson network with non-decreasing service rates in equilibrium with stationary distribution π for the joint queue length process X. Then for all i, k ∈ ℕ, 1 ≤ i < k, 0 ≤ t_1 < t_2 < ... < t_i < ... < t_k, and all bounded functions f : ℕ^i → ℝ, g : ℕ^{k−i} → ℝ, which are non-decreasing with respect to the natural orderings,
E_π{f[X(t_1), ..., X(t_i)] · g[X(t_{i+1}), ..., X(t_k)]}
   ≥ E_π{f[X(t_1), ..., X(t_i)]} · E_π{g[X(t_{i+1}), ..., X(t_k)]} .

Especially X is PUOD in time, i.e., for k ≥ 1 all k-dimensional marginals (X(t_1), ..., X(t_k)), 0 ≤ t_1 < t_2 < ... < t_k, are PUOD.

The proof follows from the observation that π has independent coordinates, hence is associated (≤), and that in equilibrium the time reversal of X describes a (different) Jackson network with non-decreasing service rates, hence is ≤-monotone by Corollary 5.7. Therefore Lemma 3.2 in Daduna and Szekli (1995) applies. For monotonicity of the time reversal of the network process X it is essential to have E equipped with the natural order. Using different orderings, e.g., ≤*, would not yield monotonicity of the time reversed networks with respect to ≤* in general (see Proposition 5.9, which indicates when this symmetry is violated). For closed Gordon-Newell networks we cannot expect positive correlation, due to the fixed population in the network (Remark 2.11). But obtaining the steady state distribution of a closed network by conditioning on the total population size in a suitable Jackson network's steady state opens the possibility to apply a theorem of Joag-Dev and Proschan (1983):

THEOREM 6.8. (Negative association in Gordon-Newell networks) Let X denote the joint queue length process of a Gordon-Newell network with non-decreasing service rates according to Definition 2.9 and Example 3.3 (3). If X is in equilibrium then its one-dimensional marginals X(t) are negatively associated.
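In the simplest closed network the negative correlation behind Theorem 6.8 can be computed exactly: with two nodes and fixed population N the queue lengths satisfy X_1 + X_2 = N, so Cov(X_1, X_2) = −Var(X_1) < 0. A minimal sketch under the conditioned product-form steady state, with assumed per-node ratios (not values from the chapter):

```python
# Exact covariance of the two queue lengths in a closed two-node Gordon-Newell
# network with population N.  The steady state is the conditioned product form
# pi(n, N - n) proportional to rho1^n * rho2^(N - n); rho1, rho2 are assumed.

N = 5                      # fixed population (assumption)
rho1, rho2 = 0.8, 1.25     # per-node ratios eta_j / mu_j (assumptions)

weights = [rho1 ** n * rho2 ** (N - n) for n in range(N + 1)]
K = sum(weights)                      # norming constant
pi = [w / K for w in weights]         # marginal distribution of n_1

mean1 = sum(n * p for n, p in zip(range(N + 1), pi))
mean2 = N - mean1                     # since n_2 = N - n_1
ex1x2 = sum(n * (N - n) * p for n, p in zip(range(N + 1), pi))
cov = ex1x2 - mean1 * mean2

var1 = sum((n - mean1) ** 2 * p for n, p in zip(range(N + 1), pi))
print(cov, -var1)                     # the two numbers agree: Cov = -Var(X_1) < 0
```

The identity Cov(X_1, X_2) = −Var(X_1) is forced by the fixed population; the Joag-Dev and Proschan theorem extends this negative dependence to arbitrary node numbers and non-decreasing functions of disjoint coordinate sets.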


7. Customers of different types, the arrival theorem

The state description of generalized migration processes (Section 3) does not distinguish between individual customers with respect to their behaviour inside the nodes and their movements in the network. This is expressed by using net transitions in Definition 3.2. Introducing different customer types in that setting is possible but does not widen the model essentially: at each node there will be a separate area for each customer type, and customers of the same type are not distinguished with respect to their behaviour inside the areas and their movements between areas in the network. Thus the areas replace the nodes in the modeling process, and the new class of models fits into the old one by introducing further nodes which substitute the areas. The situation changes completely if we want to describe individual customers' behaviour inside the nodes and on their itinerary through the network. We have to record in detail where the individual customers stay in service or queue, and we have to decide about service and routing depending on which type of customer is considered. With respect to the customers' behaviour inside the nodes we prescribe the queueing regime (see e.g. Definitions 2.1 and 2.4, or more generally Section 9). To describe the migration of the individual customers there are two different methods, which are equivalent with respect to the versatility of the modeling in applications (see Melamed (1982) for details): random routing as used in the celebrated BCMP networks of Baskett et al. (1975), and deterministic routing as described by Kelly (1979), Chapter 3. Random routing in open BCMP networks is briefly described as follows: customers of type s on leaving some node i change their type to t and enter node j with probability r^{s,t}_{i,j} according to a Markovian type-node transition rule. This formalism results in traffic equations similar to (2) and (4).
In the following we shall consider in detail networks with deterministic type dependent routing. This formalism has advantages with respect to the theoretical evaluation of the network processes.

DEFINITION 7.1. (Mixed network with deterministic routing) We consider a network of nodes J̄ = {1, ..., J} which are exponential single server nodes according to Definition 2.4, node j having s_j identical service channels, servicing according to FCFS regime. Any customer entering node j requests there an amount of service which is exponentially distributed with mean μ_j^{-1}. All service times are drawn independently of one another. For a simple description of the network we assume that all positions where customers may be located are linearly ordered: at node j positions 1, ..., s_j represent the service channels, position s_j + 1 houses the customer at the head of the queue, and so on. The nodes are organized according to the shift-protocol, i.e., if at node j there are n_j customers present they occupy positions 1, ..., n_j. If a customer on position p ∈ {1, ..., s_j} is served, he departs and customers on positions p + 1, ..., n_j are shifted one position ahead, continuing their service in the new channel or starting


service right now. The rearrangement takes zero time. Because all service channels are identical, this protocol, although looking somewhat artificial, yields the same results when evaluating the main performance measures of the network as the random selection of free channels described in Definition 2.4. The network serves a finite population of internal customers numbered 1, ..., M, M ≥ 0. At any time internal customer m is of some type t ∈ T(m). Further, external customers arrive at the network, being of some type t ∈ T(0). The sets T(m), m = 0, 1, ..., M, are countable and pairwise disjoint. With each type t ∈ ⋃_{m=0}^{M} T(m) is associated the finite route r(t) = [r(t, 1), ..., r(t, S(t))], which a customer of type t follows. Multiple visits to nodes on a route are allowed. External customers of type t ∈ T(0) arrive in a Poisson stream with rate λ_t > 0, enter node r(t, 1), and after leaving that node immediately jump to node r(t, 2), ..., and being finally served at node r(t, S(t)), they depart from the network. The arrival streams are independent and independent of the service times. Internal customer m, of type t, travels route r(t), thereafter changes to type t' ∈ T(m) and travels route r(t') with probability p_m(t, t'), where P_m = (p_m(t, t') : t, t' ∈ T(m)) is an irreducible positive recurrent stochastic matrix. Given the previous type t, the type decision is independent of the network's history. Using type-stage sequences the state description for the network is as follows: For node j the local state e_j ∈ E_j describes the empty node j, and
z_j = [t_{j1}, s_{j1}; t_{j2}, s_{j2}; ...; t_{jn_j}, s_{jn_j}] ∈ E_j

indicates that there are n_j customers present, and on position p ∈ {1, ..., n_j} there resides a customer of type t_{jp} being on stage s_{jp} ∈ {1, ..., S(t_{jp})} of his route r(t_{jp}). E_j is called the local state space of node j. The global states of the network are compounded of local states; the global state space E contains the feasible global states
z = [z_1, ..., z_J] ∈ ∏_{j=1}^{J} E_j .

Not all elements from ∏_{j=1}^{J} E_j are feasible in E, due to the restriction on the internal population: any internal customer can and must occur in a global state exactly once. If we have M = 0, no internal customers cycle, and the network is open. If T(0) = ∅, no external arrivals occur, and the network is closed. The general case is therefore termed a mixed network.

THEOREM 7.2. (Steady-state distribution for mixed exponential networks) For the mixed exponential network of Definition 7.1 let Z = (Z(t) : t ≥ 0) denote the type-stage process or generalized queue length process with state space E. From the stochastic assumptions put on the system it follows that Z is a strong Markov process on E. We assume that Z is irreducible on E.

For the internal customers we define the traffic equations and their solutions α_m = (α_m(t) : t ∈ T(m)) to be the stochastic solution of

α_m = α_m P_m ,   m = 1, ..., M .

We assume henceforth that for all m = 1, ..., M

∑_{t ∈ T(m)} α_m(t) < ∞   holds, and   ∑_{t ∈ T(0)} λ_t < ∞ .

For j ∈ J̄, t ∈ ⋃_{m=0}^{M} T(m), 1 ≤ s ≤ S(t), denote

η_j(t, s) = λ_t       if t ∈ T(0) ,
η_j(t, s) = α_m(t)    if t ∈ T(m), m ∈ {1, ..., M} .

If Z is ergodic, the unique stationary and limiting distribution π of Z on E is, for z = [z_1, ..., z_J] ∈ E with generic local states z_j = [t_{j1}, s_{j1}; t_{j2}, s_{j2}; ...; t_{jn_j}, s_{jn_j}], j = 1, ..., J,

π(z) = K^{-1} ∏_{j=1}^{J} ∏_{p=1}^{n_j} η_j(t_{jp}, s_{jp}) / (μ_j C_j(p)) ,
where C_j(n) = min(n, s_j) and K < ∞ is the norming constant. (Note that K < ∞ is the necessary and sufficient condition for ergodicity of Z.)

The proof of Theorem 7.2 can be done by checking the global balance equations for Z, which is rather tedious. A simpler version of the proof follows Kelly (1979): Consider the time reversal of Z, which is a mixed network of the same type with the identical nodes of the original network, but with customer movements in the reversed direction, and with type change probabilities according to the Markov chains obtained from the time reversed transition matrices associated with the P_m. Then a simple local balance criterion connecting Z and its time reversal can be applied. π as given in Theorem 7.2 is said to be of product form; similar to the steady states of Jackson and Gordon-Newell networks, Remarks 2.7 and 2.11 apply in the present context as well. Describing an individual customer's behaviour on a route commences with observing the customer when entering the network, if he is an external customer, or when entering a route, being an internal customer. In the latter case it is obvious that, observing in equilibrium an internal customer m being of type t on starting route r(t), we do not observe the network in steady state. This is because of the information about the system's state due to the specified random time when the observation takes place. Clarifying these situations leads to proving arrival theorems for jumping customers, or departure theorems. They are typically concerned with the difference between the time stationary behaviour of the network as described in Theorem 7.2 and the customer stationary behaviour, which is concerned with observing embedded processes. The embedded sequence of observation times is usually defined by instants when a certain type of event takes place, characterized by jumping customers. There are now several ways of looking at arrival


theorems, which are essentially equivalent for the applications we have in mind. We shall describe the approach via limiting distributions, similar to Sevcik and Mitrani (1981). For a description of the problems and results using the point process formalism see Serfozo (1989b), Bremaud et al. (1992), Kook and Serfozo (1993), and in more detail Baccelli and Bremaud (1994), the latter dealing mostly with the single node system. Denote by A(t, s) = (A_n(t, s) : n ∈ ℕ) the sequence of time instants when internal customer m being of type t enters stage s of his route r(t). Use similar notation for the sequence of specific jump instants for external customers of type t. If the network's state just after such a time instant A_n(t, s) is z ∈ E, then we know that z shows at node r(t, s) = j on position n_{r(t,s)} = n_j the state description (t_{jn_j}, s_{jn_j}) = (t, s). The disposition d(z) of the other customers with respect to the (t, s)-jump is obtained from z by deleting the pair (t_{jn_j}, s_{jn_j}) = (t, s). Note that if t is an external type the disposition is a state in E, while if t is an internal type, d(z) ∉ E. In case t is an internal type we denote the feasible set of dispositions by E_{-t,s}, which turns out to be a subset of the state space of the network when customer m with t ∈ T(m) is deleted.

THEOREM 7.3. (Arrival theorem) Suppose the network process Z = (Z(t) : t ≥ 0) of the mixed network is ergodic. Let t ∈ T(0) and s ∈ {1, ..., S(t)}. Then the embedded process Z^{t,s} = (d(Z(A_n(t, s))) : n ∈ ℕ) is an ergodic Markov chain with unique limiting distribution

π^{t,s}(z) := lim_{n→∞} P(d(Z(A_n(t, s))) = z) = π(z) ,   z ∈ E .   (34)

Let t ∈ T(m), m ∈ {1, ..., M}, and s ∈ {1, ..., S(t)}. Then the embedded process Z^{t,s} = (d(Z(A_n(t, s))) : n ∈ ℕ) is an ergodic Markov chain with unique limiting distribution

π^{t,s}(z) := lim_{n→∞} P(d(Z(A_n(t, s))) = z)
   = π(z') η_{r(t,s)}(t, s)^{-1} μ_{r(t,s)} C_{r(t,s)}(n_{r(t,s)}(z) + 1) C^{-1} ,   z ∈ E_{-t,s} ,   (35)

where n_{r(t,s)}(z) is the queue length of the disposition state z at node r(t, s), and z' ∈ E is obtained from z ∈ E_{-t,s} by inserting the pair (t, s) at node r(t, s) on position n_{r(t,s)}(z) + 1 in state z. C is the new norming constant.

If an external customer arrives from the outside at the network, (34) states that in the limit these Poisson Arrivals See Time Averages, which is the PASTA property for arrival processes driven by a Poisson process (Wolff (1982)). The result for external customers and s > 1 is not the PASTA property, because in general customer flows inside a network are not Poissonian, even for external customers in totally open networks (Melamed, 1979). For internal customers in closed or mixed networks the flows are not Poissonian either, and a PASTA result could not be expected. Note that the arrival theorem for closed networks was proved simultaneously by Sevcik and Mitrani


(1981) and by Lavenberg and Reiser (1980). At that time the result was strongly needed for applications in computer communication networks. It opened the possibility of computing individual transmission times for messages in such networks. A further important consequence was the development of the Mean Value Analysis (MVA) algorithm for computing first order quantities such as mean queue lengths, mean sojourn times, and throughputs. For a survey see the handbook (Lavenberg, 1983).

REMARK 7.4. (Limiting versus stationary behaviour) The arrival Theorem 7.3 is stated as a result for limiting embedded distributions. From Markov chain properties it follows that the limiting distributions are the equilibrium distributions of the embedded Markov chains as well. For open networks the result is often loosely stated as: If the network is in equilibrium, an external customer on his arrival observes the other customers in the network in steady state. This proposition is heuristic, because it can be seen easily that, e.g., in an M/M/1 queue which is started in equilibrium at time 0, the first arriving customer does not observe the queue length in equilibrium. We have the additional information that up to the first arrival no arrival occurred, and therefore the equilibrium is perturbed. For closed networks the result is often stated as: If the network is in equilibrium, a customer at a jump instant observes the other customers in the network distributed according to the steady state of exactly that network obtained by deleting himself from the population. This is correct for the case of indistinguishable customers (the Gordon-Newell network case). But as Walrand noticed, this is in general not correct if we have different customer types and distinguish customers in the state description. This puts restrictions on the sequencing of the other customers with respect to the position of the jumping customers (Walrand, 1988, p. 144/145).
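The single-queue claim in this remark can be made quantitative. If the queue starts in equilibrium with arrival rate λ and service rate μ, the first arrival occurs at an Exp(λ) time during which only departures happen; by memorylessness each departure beats the arrival clock with probability μ/(μ+λ), so the first arrival finds the system empty with probability (1−ρ)Σ_n ρ^n (μ/(μ+λ))^n = (1−ρ)(1+ρ), strictly larger than the stationary value 1−ρ seen by a typical (embedded-stationary) arrival. A sketch with assumed rates, checking the closed form against a seeded Monte Carlo run:

```python
# Probability that the FIRST arrival to an M/M/1 queue started in equilibrium
# finds the system empty.  An embedded-stationary arrival would see an empty
# system with probability 1 - rho; the first arrival sees it with the larger
# probability (1 - rho)(1 + rho).  Rates below are assumed values.

import random

lam, mu = 1.0, 2.0            # arrival and service rates (assumptions)
rho = lam / mu

closed_form = (1 - rho) * (1 + rho)

random.seed(1)
hits, runs = 0, 200_000
for _ in range(runs):
    # stationary initial queue length: P(n = k) = (1 - rho) * rho^k
    n = 0
    while random.random() < rho:
        n += 1
    # before the first arrival, each departure wins the exponential race
    # against the arrival clock with probability mu / (mu + lam)
    while n > 0 and random.random() < mu / (mu + lam):
        n -= 1
    if n == 0:
        hits += 1

estimate = hits / runs
print(closed_form, estimate)  # both near 0.75, strictly above 1 - rho = 0.5
```

With ρ = 0.5 the first arrival sees an empty system with probability 0.75 rather than 0.5, which is exactly the "perturbed equilibrium" effect described above.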
A consequence of this observation is the cumbersome notation of the limiting distribution (35). The point process approach to the problems around the various arrival theorems can be sketched as follows: Consider the generalized queue length process Z = (Z(t) : t ∈ ℝ) in steady state, using the same notation as above. This process is time stationary. Consider for (t, s) the point processes N^{t,s} (as random measures on (ℝ, 𝔅)), where N^{t,s}(B), B ∈ 𝔅, counts the number of jumps of type t customers to stage s of route r(t) in the time set B. These counting processes are stationary under the set of deterministic left shifts. Construct the (t, s)-Palm measure with respect to the point process N^{t,s}. This is stationary under the random shift by inter-(t, s)-jump times. Then the generalized queue length process on a probability space with the (t, s)-Palm measure yields a process which is stationary at (t, s)-jumps with embedded distribution π^{t,s}. For details see Serfozo (1989b), Kook and Serfozo (1993), Baccelli and Bremaud (1994). Most jump instants of the process can be considered as an arrival time and a departure time as well. Therefore there is a similar theorem for departures inside the network. Furthermore we can prove a departure theorem for external cus-


tomers departing from the network. The latter yields a counterpart to the PASTA theorem. (For details see Kook and Serfozo (1993), Daduna (1997).)
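Numerically, setting up the internal customers' traffic equations α_m = α_m P_m of Theorem 7.2 amounts to computing a stationary vector of each type-change matrix P_m. For a finite type set this can be sketched by plain power iteration (the 3-type matrix below is a made-up example, not from the chapter):

```python
# Solve alpha = alpha * P for a small irreducible, aperiodic stochastic matrix
# P by power iteration and normalise.  P models the type changes of one
# internal customer; its entries are assumed illustration values.

P = [
    [0.0, 0.7, 0.3],
    [0.5, 0.0, 0.5],
    [0.4, 0.6, 0.0],
]

def stationary(P, iters=10_000):
    """Power iteration for the left stationary vector of a stochastic matrix."""
    n = len(P)
    alpha = [1.0 / n] * n
    for _ in range(iters):
        alpha = [sum(alpha[i] * P[i][j] for i in range(n)) for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
    return alpha

alpha = stationary(P)
# alpha now satisfies the traffic equation alpha = alpha * P up to numerics
residual = max(abs(alpha[j] - sum(alpha[i] * P[i][j] for i in range(3)))
               for j in range(3))
print(alpha, residual)
```

The normalisation chosen here makes α_m a probability vector; any positive multiple solves the traffic equation equally well, and the ambiguity is absorbed by the norming constant K of Theorem 7.2.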

8. Individual customers' sojourn time distributions and passage time distributions

In the field of product form networks, seemingly the most challenging problem is to determine sojourn time and passage time distributions. The first results were obtained by Reich (1957, 1963) and Burke (1968, 1972), who considered linear tandem systems and proved that in equilibrium the successive sojourn times of a customer at the nodes are stochastically independent. The next stage of the development was the introduction of overtake-free paths in open networks by Walrand and Varaiya (1980) and Melamed (1982): these paths are generalized linear routes in the more complex Jackson networks. For cyclic networks the results parallel to those of Reich and Burke were proved by Chow (1980), Schassberger and Daduna (1983), and Boxma et al. (1984). Sojourn time results on overtake-free paths in Gordon-Newell networks were proved by Daduna (1982, 1984) and Kelly and Pollett (1983). The development of the problem up to 1990 and further relevant topics for computing exact or approximate sojourn time distributions may be found in Boxma and Daduna (1990). The following results are from Schassberger and Daduna (1987). We shall compute sojourn time distributions in explicit closed form expressions which resemble the product form of the steady state. It turns out that this is possible along specific paths whose structure is basically as follows: The path is divided into three successive subpaths. The first and the last subpath consist of infinite server nodes, these subpaths being uncritical. The second subpath is critical; it may begin and end with a multiserver node which is not an infinite server, but in between it fulfills a topological overtake-free condition and overtake-freeness of single server FCFS nodes.

DEFINITION 8.1. (Quasi overtake-free paths) Consider a mixed network as defined in 7.1.
(1) For nodes i, j ∈ {1, ..., J} write i –(t)→ j, if i = r(t, s), j = r(t, s + 1) for some t ∈ ⋃_{m=0}^{M} T(m) and 1 ≤ s < S(t), or if i = r(t, S(t)), j = r(t', 1) for some m ∈ {1, ..., M} with t, t' ∈ T(m) and p_m(t, t') > 0.
(2) For type t a t-relevant path [j_1, ..., j_K] from node j_1 to node j_K is a sequence of nodes such that j_k –(t_k)→ j_{k+1}, 1 ≤ k < K, holds with
t_k ∈ ⋃_{m=0}^{M} T(m)   if t ∈ T(0) ,

t_k ∈ ⋃_{m=0, m≠m̄}^{M} T(m)   if t ∈ T(m̄), 1 ≤ m̄ ≤ M .
(Note that in general a path is not part of a specific route.)

(3) For t ∈ ⋃_{m=0}^{M} T(m) let [r(t, u), ..., r(t, v)] be a section of different successive nodes of route r(t), 1 ≤ u < v ≤ S(t). This section is overtake-free for type t if every t-relevant path from r(t, u') to r(t, v') includes node r(t, u' + 1), u ≤ u' < v' ≤ v.

(4) For a customer of type t ∈ ⋃_{m=0}^{M} T(m) his route r(t) is quasi overtake-free if the following holds: There exist u, v with 1 ≤ u ≤ v ≤ S(t) such that

s_{r(t,1)} = ... = s_{r(t,u−1)} = ∞ ,   s_{r(t,u+1)} = ... = s_{r(t,v−1)} = 1 ,   s_{r(t,v+1)} = ... = s_{r(t,S(t))} = ∞ ,

and the section [r(t, u), ..., r(t, v)] is overtake-free.

REMARK 8.2. (Structure of quasi overtake-free paths) Overtake-freeness in Definition 8.1, Condition (3), is a topological condition on the network, defined via the routes and type change probabilities. A path where this condition is obviously violated is [1, 2, 3] in the network of Figure 1. Condition (4) deals with the internal structure of the nodes; an example of a path fulfilling the topological constraint (3) but obviously not (4) is [1, 2, 3] in Figure 2. The definition of quasi overtake-freeness fixes a balance between necessary restrictions on the network topology and the node structure, such that influences generated by the traveling customer of type t in the first part of route r(t) cannot overtake this customer when traversing the route. The section [r(t, u), ..., r(t, v)] is the critical part of the route r(t), where the first and the last node may be multiserver nodes which are not infinite servers. All the other nodes on the critical part must be single servers (see Burke (1969)).


Fig. 1. The Simon-Foley network with overtaking due to the network's topology.

Fig. 2. Burke's tandem network with overtaking due to the internal node structure.


The next theorem holds for any path; no overtake-free condition is needed. It is a consequence of the strong Markov property of Z and Theorem 7.3.

THEOREM 8.3. (Asymptotic sojourn time distribution) For the nth customer of type t traversing route r(t) denote by (T_1^{(n)}, ..., T_{S(t)}^{(n)}) the vector of his successive sojourn times at the nodes of r(t). Then independently of the initial distribution of the system

lim_{n→∞} E[exp(−θ_1 T_1^{(n)} − ... − θ_{S(t)} T_{S(t)}^{(n)})]
   = E_{π^{t,1}}[exp(−θ_1 T_1^{(1)} − ... − θ_{S(t)} T_{S(t)}^{(1)})] ,   θ_s ≥ 0, s = 1, ..., S(t) .   (36)

Here E_{π^{t,1}}[·] is an expectation under P_{π^{t,1}}[·], the measure which describes the system conditionally on the first (t, 1)-jump occurring at time 0 and seeing the other customers' disposition distributed according to π^{t,1} (see Theorem 7.3). From (36) we obtain, under additional assumptions on the path and node structure, the main result.

THEOREM 8.4. (Sojourn time distribution on quasi overtake-free paths) For a customer of type t let his route r(t) be quasi overtake-free. For the nth customer of type t traversing route r(t) let (T_1^{(n)}, ..., T_{S(t)}^{(n)}) be the vector of his successive sojourn times at the nodes of r(t). Then independently of the initial distribution

lim_{n→∞} E[exp(−θ_1 T_1^{(n)} − ... − θ_{S(t)} T_{S(t)}^{(n)})]
   = ∑_{(n_{r(t,1)}, ..., n_{r(t,S(t))})} π̃^{t,1}(n_{r(t,1)}, ..., n_{r(t,S(t))}) ∏_{p=1}^{S(t)} (μ_{r(t,p)} / (μ_{r(t,p)} + θ_p)) (μ_{r(t,p)} s_{r(t,p)} / (μ_{r(t,p)} s_{r(t,p)} + θ_p))^{(n_{r(t,p)} + 1 − s_{r(t,p)})^+} ,

θ_s ≥ 0, s = 1, ..., S(t) ,   (37)

where (a)^+ = max(a, 0), and π̃^{t,1}(n_{r(t,1)}, ..., n_{r(t,S(t))}) is the probability under P_{π^{t,1}}[·] to see at time 0 the arriving type t customer at the tail of the queue at node r(t, 1), and to have queue lengths vector (n_{r(t,1)}, ..., n_{r(t,S(t))}) for the number of other customers present at the nodes of r(t). The steady-state distribution under P_{π^{t,1}}[·] of the sequence ((T_1^{(n)}, ..., T_{S(t)}^{(n)}) : n = 1, 2, ...) is given by the Laplace-Stieltjes transform (37) as well.

The proof of Theorem 8.4 uses computations under P_{π^{t,1}}[·] according to Theorem 8.3, i.e., with disposition distribution π^{t,1}[·] of the other customers seen by the arriving type t customer. We first compute under P_{π^{t,1}}[·] the joint distribution of t's sojourn time at node r(t, 1) and the state of the network at t's departure instant from that node, i.e., at time T_1^{(1)}. Having obtained this joint distribution, from the strong Markov property of Z a splitting formula is derived, which splits the sojourn times into a part concerning nodes r(t, 1), ..., r(t, u) (given by an explicit expression) and a part concerning the rest of the route in an implicitly given term. While this still holds for general paths, the quasi overtake-free property is then needed to convert the implicitly given term into an explicit formula. This is done by an induction argument similar to that in the proof of the main result in Boxma et al. (1984).
Kook and Serfozo (1993) embedded linear tandems r(t) of exponential single server queues into more general networks where service rates at some nodes outside r(t) may depend not only on the local queue length but on more global state information of the network, as discussed for the generalized migration processes, e.g. in Example 4.1. Incorporating into this setting the more general structure of the servers in Kelly's networks and proving the sojourn time result for quasi overtake-free paths r(t) with nodes r(t,u),r(t,v) (in the numbering of Theorem 8.4) being semi-simple in the sense of Hemker (1990) was done in Daduna (1997). Theorem 8.4 is stated for a customer traversing a complete path. Considering subpaths with the quasi overtake-free property yields similar results. In this case we use P~.~[.] as the underlying measure, where w is the first node of the subpath under consideration. On the other hand, if for internal customer m with t, t ~ E T(m) andpm(t, t') > 0, the itinerary Jr(t), r(/)] pasted together is quasi overtake-free, the theorem may be applied after redefining routes and types suitably. Depending on the network's topology and the migration of the internal customers a vector (T1,...,Ts(t)) distributed according to (37) shows some independence structure:


Without any additional assumption, the random variables T_1, ..., T_{u−1}, T_{v+1}, ..., T_{S(t)} and the vector (T_u, ..., T_v) form an independent set. If additionally route r(t) is external (t ∈ T(0)), and for some u ≤ u_1 < u_2 < ... < u_k ≤ v the nodes r(t, u_1), r(t, u_2), ..., r(t, u_k) are visited by external customers only, then T_1, ..., T_{u−1}, T_{u_1}, T_{u_2}, ..., T_{u_k}, T_{v+1}, ..., T_{S(t)} and the vector (T_w : w ∈ {u, u + 1, ..., v} − {u_1, ..., u_k}) form an independent set. In totally open networks the successive sojourn times of a customer traversing a quasi overtake-free path are independent. The Laplace-Stieltjes transform of the customer's end-to-end delay, or total passage time, for the customer traversing path r(t) is obtained by setting θ_1 = θ_2 = ... = θ_{S(t)} = θ in (37). For totally open networks we obtain convolution formulas for the end-to-end delay. In principle the Laplace-Stieltjes transform (37) can be inverted by hand. We obtain, e.g., for the end-to-end delay distribution mixtures of convolutions of Erlangian distributions. But due to the complexity of the network this might be a tedious task, especially if, due to the internal customers, there arise difficulties in computing norming constants. A survey of references for numerical techniques to effectively invert the transform is given in Boxma and Daduna (1990). Although quasi overtake-free paths are rather specific structures, there are important applications and models covered by Theorem 8.4. Flow shop systems and transmission lines in networks often satisfy the necessary requirements. Even more: the method of adjusted transfer rates, developed for applying MVA (mean value analysis) to closed networks, can be further developed for more general networks and leads to linear structures where the sojourn time theorem applies (Daduna and Schassberger (1993)). Proving further results follows from the observation that any two-station route of successive distinct exponential multiserver queues is quasi overtake-free.

COROLLARY 8.6. (Two-stations walk) Let, in the (general!) setting of Theorem 8.3, for a customer of type t the route r(t) have length S(t) = 2. Consider for the nth customer of type t traversing route r(t) the vector (T_1^{(n)}, T_2^{(n)}) of his successive sojourn times. If r(t, 1) ≠ r(t, 2), then independently of the initial distribution

lim_{n→∞} E[exp(−θ_1 T_1^{(n)} − θ_2 T_2^{(n)})]
   = ∑_{(z_1, ..., z_J) ∈ E} π^{t,1}(z_1, ..., z_J) ∏_{p=1}^{2} (μ_{r(t,p)} / (μ_{r(t,p)} + θ_p)) (μ_{r(t,p)} s_{r(t,p)} / (μ_{r(t,p)} s_{r(t,p)} + θ_p))^{(n_{r(t,p)} + 1 − s_{r(t,p)})^+} ,

θ_s ≥ 0, s = 1, 2 ,

where π^{t,1} is defined as in (37), and n_{r(t,p)} is the queue length at node r(t, p) in state (z_1, ..., z_J) ∈ E. If r(t, 1) = r(t, 2), then independently of the initial distribution

lim_{n→∞} E[exp(−θ_1 T_1^{(n)} − θ_2 T_2^{(n)})]
   = ∑_{(z_1, ..., z_J) ∈ E} π^{t,1}(z_1, ..., z_J) (μ_{r(t,1)} / (μ_{r(t,1)} + θ_1)) (μ_{r(t,1)} / (μ_{r(t,1)} + θ_2)) (μ_{r(t,1)} s_{r(t,1)} / (μ_{r(t,1)} s_{r(t,1)} + θ_1))^{(n_{r(t,1)} + 1 − s_{r(t,1)})^+} (μ_{r(t,1)} s_{r(t,1)} / (μ_{r(t,1)} s_{r(t,1)} + θ_2))^{(n_{r(t,1)} + 1 − s_{r(t,1)})^+} ,

θ_s ≥ 0, s = 1, 2 .

Especially, on any two-station walk through two different nodes where only external customers arrive, the successive sojourn times of a customer are independent. The independence result for two-station walks with different stations in open networks is a rather general result, because no restrictions on the network's topology are required. Unfortunately, there is no generalisation to routes with S(t) > 2. We shall discuss this in the context of single server Jackson networks. A fundamental counterexample was described by Simon and Foley (1979).

EXAMPLE 8.7. (Simon-Foley network and Burke's tandem) The Simon-Foley network is a three-station feed-forward network of exponential single server queues according to Figure 1. We assume the joint queue length process X to be ergodic. Let (S_1, S_3) denote a random vector distributed according to the limiting distribution of the sojourn time vector of a customer who traverses the network via path [1, 3]. From Corollary 8.6 we know that the coordinates of (S_1, S_3) are independent. Let (T_1, T_2, T_3) denote a random vector distributed according to the limiting distribution of the sojourn time vector of a customer who traverses the network via path [1, 2, 3]. Again from Corollary 8.6 we know that the vectors (T_1, T_2) and (T_2, T_3) have independent coordinates. The result of Simon and Foley is that (T_1, T_3) are dependent! Foley and Kiessler (1989) proved that the vector (T_1, T_3) is associated (≤) (see Definition 6.1 (29)), hence positively correlated. It is an open question whether (T_1, T_2, T_3) is associated. Burke's tandem is a linear three-stage tandem, consisting of (in this ordering): a single server, a two-channel server, and a single server, see Figure 2. Let (T_1, T_2, T_3) denote a random vector distributed according to the limiting distribution of the sojourn time vector of a customer who traverses the tandem. Again from Corollary 8.6 we know that the vectors (T_1, T_2) and (T_2, T_3) have independent coordinates. Burke's result (Burke, 1969) is that (T_1, T_3) are dependent! It can be shown that (T_1, T_3) is associated (≤). Kiessler and Disney (1982) computed a set of equations whose solution in principle would provide us with the passage time distribution of a customer in the Simon-Foley network. Fayolle et al. (1983) reduced the passage time problem

Stochastic networks with product form equilibrium


for this network to a boundary value problem. Unfortunately, neither of these results seems to permit conclusions about the qualitative behaviour of passage times. Knessl and Morrison (1991) studied the effect of overtaking in this network using heavy traffic analysis. In general open exponential networks, positive correlation holds for sojourn times on three-station paths (Daduna and Szekli, 2000).
THEOREM 8.8. (Three-station walks) Consider an open network of exponential multiserver nodes according to Definition 7.1 and a customer of type t with route r(t). Consider on this route a subpath of length 3 of different successive nodes which is traversed by t. Let (T1, T2, T3) denote a random vector distributed according to the limiting distribution of t's sojourn time vector when traversing the subpath. Then (T1, T2, T3) is PUOD (see Definition 6.1 (30)).

Theorem 8.4, the discussion in Remark 8.5, and Example 8.7 show that explicit results which provide closed form expressions for sojourn time and passage time distributions are rare. On the other hand, delay time (passage time) quantiles are important performance measures for networks. These are needed, e.g., when for a communication system it has to be guaranteed that at most a fraction ε of the accepted messages experience a delay of more than a critical time t_crit, where ε is usually a very small quantity, interpreted as a probability. Similar problems arise in lead time control of production and inventory.

The standard methods for obtaining delay time quantiles when no analytic result is at hand are simulation (Iglehart and Shedler (1980)) and approximation. Simulation and approximation are needed even if product form modeling is applicable, when the path of interest is not overtake-free. Especially in this case the Independent Flow Time Approximation (IFTA) seems to be valuable and in many cases works well. Hohl and Kühn (1988) discussed this method in detail and investigated numerically and by simulation the power of the IFTA method. In short, the method is: compute the sojourn time distributions at the successive nodes independently and assume that independence holds between them. The delay time distribution is then evaluated as the convolution of the individual sojourn time distributions at the nodes. This procedure is a well-known disaggregation-aggregation method. Applying it to computing end-to-end delay distributions yields results very close to exact values in the case of feed-forward networks. Using the IFTA method to compute end-to-end delay distributions for paths which include cyclic feedback structures is a critical task, as can be seen from specific examples where it works rather badly. IFTA is exact for quasi overtake-free paths, especially for any two-station path of different nodes according to Corollary 8.6.
The results of correlation theory for sojourn times in this section yield probability bounds for the results of the IFTA method in various systems. Theorem 8.8 implies that IFTA yields strict lower bounds for joint quantiles of sojourn times on any three-station path of different nodes.
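The IFTA computation can be sketched numerically. The following is our own illustration, not the implementation of Hohl and Kühn: we assume each node on the path behaves like an M/M/1 queue, so that its stationary sojourn time is exponential with rate μ_i − λ_i, discretize each distribution into cell masses, and convolve.

```python
import math

def exp_cell_masses(rate, dt, n):
    """Probability masses P(X in (i*dt, (i+1)*dt]) of an Exp(rate) variable."""
    return [math.exp(-rate * i * dt) - math.exp(-rate * (i + 1) * dt)
            for i in range(n)]

def ifta_delay_quantile(sojourn_rates, q=0.95, dt=0.05, t_max=60.0):
    """IFTA sketch: treat sojourn times at the successive nodes as
    independent (here exponential), convolve their discretized
    distributions, and return an approximate q-quantile of the
    end-to-end delay."""
    n = int(t_max / dt)
    masses = exp_cell_masses(sojourn_rates[0], dt, n)
    for rate in sojourn_rates[1:]:
        nxt = exp_cell_masses(rate, dt, n)
        # discrete convolution of the two mass vectors, truncated at n cells
        masses = [sum(masses[j] * nxt[i - j] for j in range(i + 1))
                  for i in range(n)]
    acc = 0.0
    for i, p in enumerate(masses):
        acc += p
        if acc >= q:
            return (i + 1) * dt    # upper edge of the quantile cell
    return t_max

# hypothetical two-node path with sojourn rates mu_i - lambda_i = 1.0 and 2.0
q95 = ifta_delay_quantile([1.0, 2.0])
```

For the two exponential stages above the exact 0.95-quantile solves 2e^{-t} − e^{-2t} = 0.05, i.e., t ≈ 3.68; the grid approximation lands within a couple of grid steps of this value.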


H. Daduna

9. General and symmetric servers: Insensitivity theory

When time-sharing usage was introduced in mainframe computer systems, the usual analytical models of these systems were the exponential systems described in Section 2. Measurement and empirical analysis of these systems showed surprisingly good predictions of the models, e.g., with respect to the distribution of the number of jobs concurrently in system, or the mean response time. This was surprising because the measurements further revealed that the assumption of an exponential distribution for the requested CPU time was not realistic. This insensitivity against deviations from the shape of the service time distribution could not be reproduced in the models used at that time, because the equilibrium behaviour of systems with non-exponential requested CPU times was not accessible, see Kleinrock (1964, 1967), Adiri et al. (1969).

The solution of the modeling problem was the introduction of the processor sharing discipline, where, instead of servicing customers for fixed small time quanta one-by-one in a round-robin fashion, all customers share the server, each obtaining an equal portion of the service capacity, Kleinrock (1967). Sakata et al. (1969, 1971) freed this model from the assumption of exponential service times and computed the steady state queue length distribution in that case. Indeed, with this model they reproduced the empirical observation that the queue length distribution and the mean response time of a job did not change when the service time distribution was perturbed, as long as the mean service time remained fixed (Kleinrock, 1967). Similar observations were made earlier in M/G/s loss systems with no waiting room (dating back to Erlang, and resolved only in 1957 by Sevastyanov (1957)), and in infinite server systems. Later on, similar properties were found even in networks of such nodes, Baskett et al. (1975), Kelly (1976).
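The insensitivity observation is easy to reproduce in a small simulation. The sketch below is our own illustration (all names hypothetical): an M/G/1 processor-sharing queue, run once with exponential and once with deterministic service of the same mean; both time-average queue lengths come out near ρ/(1 − ρ).

```python
import random

def mg1_ps_mean_queue(service_sampler, lam=1.0, n_arrivals=50_000, seed=42):
    """Simulate an M/G/1 processor-sharing queue: the n customers present
    share the server equally, so the smallest residual request drains in
    min(work) * n time units.  Returns the time-average queue length."""
    rng = random.Random(seed)
    t, area, arrived = 0.0, 0.0, 0
    next_arrival = rng.expovariate(lam)
    work = []                          # residual service requests in system
    while arrived < n_arrivals or work:
        n = len(work)
        next_departure = t + min(work) * n if work else float("inf")
        is_arrival = arrived < n_arrivals and next_arrival <= next_departure
        t_next = next_arrival if is_arrival else next_departure
        area += n * (t_next - t)       # integrate the queue length over time
        if n:
            dec = (t_next - t) / n     # each customer receives capacity 1/n
            work = [w - dec for w in work]
        t = t_next
        if is_arrival:
            work.append(service_sampler(rng))
            arrived += 1
            next_arrival = t + rng.expovariate(lam)
        else:
            work.pop(min(range(n), key=work.__getitem__))
    return area / t

# same mean service 0.5 (utilisation rho = 0.5), different shapes:
l_exp = mg1_ps_mean_queue(lambda rng: rng.expovariate(2.0))
l_det = mg1_ps_mean_queue(lambda rng: 0.5)
# both estimates should lie near rho / (1 - rho) = 1.0
```

Under FCFS instead of PS the two service distributions would give visibly different mean queue lengths (Pollaczek-Khinchine); under PS they agree, which is the insensitivity discussed in this section.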
These observations gave rise to an insensitivity theory for general stochastic systems: general Bedienungsschemata and generalized semi-Markov schemes, König, 1974; Schassberger, 1977, 1978b. It was shown that the phenomenon of insensitivity is strongly coupled with certain structural properties of the system. Schassberger (1986) showed how to include the queueing models which usually require an infinite state space for describing the nodes' behaviour into the framework of semi-Markov schemes. The application of this abstract theory to queueing networks was described in Schassberger (1978a). See Glasserman (1991) and Glynn (1989) for applications of these schemes to more recent developments in discrete event simulation. In the following I shall describe the theory for single node systems from scratch, without using the formalism of the general system theory.

DEFINITION 9.1. (General single node service system) We consider a single node service system. Customers arrive in a Poisson stream of intensity λ ∈ (0, ∞), and can be distinguished according to different types. The set T ≠ ∅ of possible types is countable. An arriving customer is of type t ∈ T with probability α(t), Σ_{t∈T} α(t) = 1.


Customers of type t request an amount of service time which is distributed according to some cumulative distribution function B_t(s), s ≥ 0. The mean service time request of type t customers is μ_t^{-1} ∈ (0, ∞), and we assume Σ_{t∈T} α(t) μ_t^{-1} < ∞. We assume further that for all t the distribution B_t(·) is of phase-type according to Definition 9.2 below. The sequences of interarrival times, service requests, and type selections constitute an independent family of random variables.

The (general) service discipline to be considered is a triple (a, b, c), defined as follows:
(1) There is an infinite number of linearly ordered positions where customers may reside. The positions are numbered 1, 2, 3, ..., and if there are n > 0 customers in the system (either waiting or in service) they occupy positions 1, 2, ..., n.
(2) If there are n > 0 customers present, the system offers a service capacity φ(n) > 0, φ(0) = 0. This service capacity is allocated to the customers in system according to some function c(·, n): the customer on position i obtains a portion c(i, n) ∈ [0, 1] of the offered total service capacity, Σ_{i=1}^{n} c(i, n) = 1.
(3) If there are n ≥ 0 customers present and an arrival occurs, the new customer is placed on some position i ∈ {1, 2, ..., n+1}. Customers previously on positions i, i+1, ..., n are shifted one step up into positions i+1, i+2, ..., n+1. Thereafter all the old stay-on customers on positions 1, 2, ..., i−1, i+1, i+2, ..., n+1 are permuted according to some permutation σ ∈ T_n, the set of permutations of n-sets. This happens with probability a(i, σ; n) ∈ [0, 1], independent of the history of the system, Σ_{i=1}^{n+1} Σ_{σ∈T_n} a(i, σ; n) = 1.
(4) If there are n > 0 customers present and the service time of the customer in position i ∈ {1, ..., n} expires, this customer immediately departs from the system. Customers previously on positions i+1, i+2, ..., n are shifted one step down into positions i, i+1, ..., n−1. Thereafter all the stay-on customers on positions 1, 2, ..., n−1 are permuted according to some permutation τ ∈ T_{n−1}, the set of permutations of (n−1)-sets. This happens with probability b(τ; n, i) ∈ [0, 1], independent of the history of the system, Σ_{τ∈T_{n−1}} b(τ; n, i) = 1.

DEFINITION 9.2. (Phase-type distributions) For k ∈ ℕ₊ and β > 0 let

F_{β,k}(s) = 1 − e^{−βs} Σ_{i=0}^{k−1} (βs)^i / i!,  s ≥ 0,

denote the cumulative distribution function of the Γ-distribution with parameters β and k. Here k is a positive integer and serves as a phase parameter for the number of independent exponential phases, each with mean β^{-1}, the sum of which constitutes a random variable with distribution F_{β,k}. (F_{β,k} is called a k-stage Erlang distribution with rate parameter β.)

We consider the following class of distributions on (ℝ₊, 𝔅₊), which is dense with respect to the topology of weak convergence of probability measures in the set of all distributions on (ℝ₊, 𝔅₊) (Schassberger (1973), Section 1.6). For β ∈ (0, ∞), K ∈ ℕ₊, and a probability r on {1, ..., K} with r(K) > 0 let the cumulative distribution function

B(s) = Σ_{k=1}^{K} r(k) F_{β,k}(s),  s ≥ 0,

denote a phase-type distribution function. With varying β, K, and r we can approximate any distribution on (ℝ₊, 𝔅₊) arbitrarily closely. For a short introduction to this and various other classes of phase-type distributions see Asmussen (1987), Section III.6. The rationale behind using phase-type distributions is: suppose for a customer of type t ∈ T with service request according to
B_t(s) = Σ_{k=1}^{K_t} r_t(k) F_{β_t,k}(s),  s ≥ 0,
on his arrival the number k of exponential phases is drawn according to the distribution r_t on {1, ..., K_t}. We then know that he has to obtain exactly these k phases of service. Each phase is memoryless, so, if we want to control his residual service time, we only need to record the residual number of phases he has to obtain. This enables us to describe the node's evolution over time by a continuous time Markov process with a discrete state space, by incorporating the customers' residual requested numbers of service phases into the state space.

DEFINITION 9.3. (State space and process description) Consider the general service system of Definition 9.1. The state space E for describing the system's evolution consists of e ∈ E for the empty node, and sequences

z = [t_1, k_1; t_2, k_2; ...; t_n, k_n] ∈ E

of pairs (t_i, k_i) ∈ T × ℕ, such that k_i ∈ {1, ..., K_{t_i}} holds. Here n = n(z) is the queue length (number of customers either in service or waiting) in state z, and the pair (t_i, k_i) indicates that on position i ≤ n a customer of type t_i resides, requesting exactly k_i further independent exponential phases of service, each of them with mean β_{t_i}^{-1}. We denote by Z = (Z(t) : t ≥ 0) the state process of the system with state space E. Z is called the supplemented queue length process. The phase variables k_i are called supplementary variables. The type process Y = (Y(t) : t ≥ 0) is obtained from Z by deleting the supplementary variables: if Z(t) = [t_1, k_1; t_2, k_2; ...; t_n, k_n] ∈ E, then Y(t) = [t_1, t_2, ..., t_n] ∈ T^n. The queue length process X = (X(t) : t ≥ 0) is obtained from Z as well: if Z(t) = [t_1, k_1; t_2, k_2; ...; t_n, k_n] ∈ E, then X(t) = n.
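The phase mechanism just described is easy to simulate. A minimal sketch (with hypothetical parameters r and β of our choosing): draw the number of phases from the mixing distribution, then sum that many independent exponential phases.

```python
import random

def sample_phase_type(r, beta, rng):
    """One draw from B(s) = sum_k r(k) * F_{beta,k}(s): choose the number
    of phases k according to the mixing distribution r (a list with
    r[k-1] = r(k)), then sum k independent exponential phases, each
    with mean 1/beta."""
    u, acc = rng.random(), 0.0
    for k, p in enumerate(r, start=1):
        acc += p
        if u <= acc:
            return sum(rng.expovariate(beta) for _ in range(k))
    # guard against rounding in the cumulative sum
    return sum(rng.expovariate(beta) for _ in range(len(r)))

rng = random.Random(0)
r, beta = [0.2, 0.5, 0.3], 4.0   # hypothetical mixing law and phase rate
draws = [sample_phase_type(r, beta, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # should be near E[K]/beta = 2.1/4 = 0.525
```

The empirical mean matches β^{-1} Σ_k k r(k), the mean service request used throughout this section.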

From the probabilistic assumptions in Definition 9.1 and the construction of the supplemented process the next statement is obvious.


COROLLARY 9.4. For the general service system of Definition 9.1 the supplemented queue length process Z according to Definition 9.3 is a strong Markov process. Z is non-explosive, irreducible on E, and conservative. There exists a version of Z having right continuous paths with left-hand limits, which from now on we always assume to be chosen.

The central notion of this section is

DEFINITION 9.5. (Symmetric server) A general service system according to Definition 9.1 is symmetric if the service discipline is symmetric, i.e., if for all n > 0 and i ≤ n

c(i, n) = Σ_{σ∈T_{n−1}} a(i, σ; n−1).
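The symmetry condition can be checked mechanically for small n. The sketch below is our own illustration: processor sharing (equal capacity shares, uniform arrival position, stay-on customers never permuted) satisfies the condition, while single server FCFS does not.

```python
from itertools import permutations

def is_symmetric(c, a, n_max=5):
    """Check Definition 9.5: c(i, n) == sum over sigma in T_{n-1} of
    a(i, sigma; n-1), for all n <= n_max and all positions i <= n."""
    for n in range(1, n_max + 1):
        for i in range(1, n + 1):
            total = sum(a(i, sigma, n - 1)
                        for sigma in permutations(range(n - 1)))
            if abs(c(i, n) - total) > 1e-12:
                return False
    return True

# processor sharing: equal capacity shares; an arriving customer takes a
# uniform position and only the identity permutation carries probability
c_ps = lambda i, n: 1.0 / n
a_ps = lambda i, sigma, m: 1.0 / (m + 1) if sigma == tuple(range(m)) else 0.0

# single server FCFS: all capacity to position 1, arrivals join the end
c_fcfs = lambda i, n: 1.0 if i == 1 else 0.0
a_fcfs = lambda i, sigma, m: (1.0 if i == m + 1 and sigma == tuple(range(m))
                              else 0.0)
```

Here `is_symmetric(c_ps, a_ps)` returns True and `is_symmetric(c_fcfs, a_fcfs)` returns False, matching the classification in Remark 9.6.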

REMARK 9.6. The definition of general and symmetric servers follows Kelly (1979), Chapter 3 (the symmetric server), and Yashkov (1980) (variable service rates and permutation discipline). The possible realisations of symmetric service disciplines encompass the three service disciplines of the BCMP networks, Baskett et al. (1975), which allow for non-exponential service times: processor sharing (PS), last-come-first-served with preemptive resume (LCFS/pr), and infinite server (IS). The single server with FCFS is not symmetric, and neither are the multiservers with FCFS which are not infinite servers. Note that in the BCMP setting FCFS is only allowed with type-independent exponentially distributed service requests.

It is easy to introduce general and symmetric servers with only a finite number of positions available. This leads to generalized loss systems. A further variant, to which the results presented below carry over directly, is obtained if at any time instant when a service phase expires (not necessarily leading to a departure), a permutation of all the customers present is performed according to some random decision. This is a way to approximate continuous control of the service allocated to the different customers, by using large phase parameters β_t.

THEOREM 9.7. (Steady states for symmetric servers) Consider a symmetric service system according to Definitions 9.1 and 9.5 with supplemented queue length process Z = (Z(t) : t ≥ 0), type process Y = (Y(t) : t ≥ 0), and queue length process X = (X(t) : t ≥ 0).
(1) Z, Y, and X are ergodic if and only if

H := Σ_{n=0}^{∞} (λ Σ_{t∈T} α(t)/μ_t)^n (∏_{h=1}^{n} φ(h))^{-1} < ∞.

(2) If Z is ergodic, the unique stationary and limiting distribution π of Z on E is π(e) = H^{-1}, and

π[t_1,k_1; t_2,k_2; ...; t_n,k_n] = ∏_{i=1}^{n} [ (λ α(t_i) / φ(i)) β_{t_i}^{-1} Σ_{h=k_i}^{K_{t_i}} r_{t_i}(h) ] · H^{-1},  [t_1,k_1; t_2,k_2; ...; t_n,k_n] ∈ E.  (38)

The unique stationary and limiting distribution π of Y on {e} ∪ ⋃_{n=1}^{∞} T^n then is π(e) = H^{-1}, and

π[t_1, t_2, ..., t_n] = ∏_{i=1}^{n} (λ α(t_i) / (μ_{t_i} φ(i))) · H^{-1},  [t_1, t_2, ..., t_n] ∈ ⋃_{n=1}^{∞} T^n.  (39)

The unique stationary and limiting distribution π of X on ℕ then is

π(n) = (λ Σ_{t∈T} α(t)/μ_t)^n (∏_{h=1}^{n} φ(h))^{-1} · H^{-1},  n ∈ ℕ.  (40)

(3) In case the symmetric service system has an ergodic supplemented queue length process, the steady state distributions of the queue length process X and the process of types Y are insensitive against variations of the shape of the service request distributions B_t(·), as long as the mean service requests μ_t^{-1} are not changed.
(4) Consider Z in equilibrium. Given the queue length {X = n} and the sequence of types {Y = [t_1, t_2, ..., t_n]}, the residual service requests [k_1, k_2, ..., k_n] are conditionally independent and distributed like the stationary residual life distributions of renewal processes with life time distributions B_{t_i}(·), i = 1, ..., n.
(5) Consider Z in equilibrium. Given the queue length {X = n}, the conditional probability that on position i ∈ {1, ..., n} a customer of type t_i resides is

(α(t_i)/μ_{t_i}) (Σ_{t∈T} α(t)/μ_t)^{-1},
and the events {on position i resides a customer of type t_i}, i = 1, ..., n, are conditionally independent.

REMARK 9.8. (Insensitivity) The insensitivity property of the stationary distributions of X and Y and its consequences for the system's structure are developed in detail for generalized semi-Markov processes and the underlying generalized semi-Markov schemes. These systems were first developed as Bedienungsschemata, generalizing the Erlangian and Engset queueing systems and incorporating the insensitivity properties proved for these classical processes, König et al. (1974). Generalized semi-Markov processes and schemes constitute a class of systems which are well suited for theoretical investigations. From a practical point of view these processes and schemes are general enough to cover a wide class of applications. See e.g. Schassberger (1978a).
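Formula (40) is straightforward to evaluate numerically. A sketch (our own helper, with the normalisation sum truncated at a finite index):

```python
def queue_length_dist(lam, alpha_over_mu, phi, n_max=200):
    """Evaluate the symmetric-server queue length distribution of
    Theorem 9.7 (40): pi(n) proportional to
    (lam * sum_t alpha(t)/mu_t)^n / prod_{h=1}^{n} phi(h),
    truncating the normalisation constant H after n_max terms."""
    eta = lam * sum(alpha_over_mu)
    weights, prod_phi = [1.0], 1.0
    for n in range(1, n_max + 1):
        prod_phi *= phi(n)
        weights.append(eta ** n / prod_phi)
    H = sum(weights)
    return [w / H for w in weights]

# single server (phi(n) = 1): the queue length is geometric with ratio
# eta = lam * sum_t alpha(t)/mu_t, here eta = 1.0 * (0.3 + 0.2) = 0.5
pi = queue_length_dist(lam=1.0, alpha_over_mu=[0.3, 0.2], phi=lambda n: 1.0)
```

For φ(n) = n one recovers the Poisson queue length of the infinite server. The insensitivity statement (3) means these numbers do not change when the B_t are replaced by any distributions with the same means.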


The classical form of an insensitive stationary distribution is, in the case of symmetric servers, combined with a second insensitivity property: if for a given set of parameters α(·), λ, {B_t, t ∈ T}, φ(·), and some symmetric service discipline (a, b, c) the supplemented queue length process is ergodic, then ergodicity holds for all symmetric service disciplines as long as this set of parameters is kept invariant, and the steady state distribution does not depend on the specific service discipline.

From a practical point of view the insensitivity under perturbations of the service request distribution leads to computational simplifications: performance analysis of the server with respect to the equilibrium queue length distribution or type-sequence distributions, and quantities derived from these fundamental characteristics, can be reduced to the case of exponential distributions with the correct means. This reduces the state space of the system under investigation considerably.

THEOREM 9.9. (Steady states for general exponential servers) Consider a general service system according to Definition 9.1. Let all the customers' requests for service time follow an exponential distribution with type-independent mean μ^{-1}. Then the supplemented queue length process Z = (Z(t) : t ≥ 0) always shows residual phase number 1 only, i.e., Z and Y coincide.
(1) Z and the queue length process X are ergodic if and only if

H := Σ_{n=0}^{∞} (λ/μ)^n (∏_{h=1}^{n} φ(h))^{-1} < ∞.

(2) If Z is ergodic, the unique stationary and limiting distribution π of Z on E = {e} ∪ ⋃_{n=1}^{∞} T^n is π(e) = H^{-1}, and

π[t_1, t_2, ..., t_n] = ∏_{i=1}^{n} (λ α(t_i) / (μ φ(i))) · H^{-1},  [t_1, t_2, ..., t_n] ∈ ⋃_{n=1}^{∞} T^n.  (41)

The unique stationary and limiting distribution π of X on ℕ is

π(n) = (λ/μ)^n (∏_{h=1}^{n} φ(h))^{-1} · H^{-1},  n ∈ ℕ.

(3) If for a given set of parameters α(·), λ, μ, φ(·), and any service discipline (a, b, c) the supplemented queue length process is ergodic, then this ergodicity holds for all service disciplines as long as this set of parameters is kept invariant. Further, the steady state distribution does not depend on the specific service discipline. This holds independently of whether the service discipline is symmetric or not.
(4) Consider Z in equilibrium. Given the queue length {X = n}, the conditional probability that on position i ∈ {1, ..., n} a customer of type t_i resides is

α(t_i) (Σ_{t∈T} α(t))^{-1} = α(t_i),


and the events {on position i resides a customer of type t_i}, i = 1, ..., n, are conditionally independent.
The general exponential servers in Theorem 9.9 generalize the familiar exponential multiservers of birth-death process theory, see Definitions 2.1 and 2.4. We shall henceforth refer to these general service systems as exponential servers, in contrast to the symmetric servers of Definition 9.5. It is a common observation that a generalization of Theorem 9.7 to exponential servers with non-exponential or type dependent service requests is not possible. Indeed, this can be stated as a theorem, as will be shown below. A further observation is that almost any known explicit solution of the steady state equation for queueing system processes follows essentially the structure of (38), (39), and (40). These distributions can be characterized as being of (internal) product form for the node. The most well known such internal product form is the steady state distribution of the general birth-death process (1). In contrast we may then specify the product form distributions of network processes, e.g. in (3) for Jackson networks or in (5) for Gordon-Newell networks, as being external over the different nodes.

DEFINITION 9.10. (Product form distributions for single nodes) Consider an ergodic general service system according to Definition 9.1 with general, not necessarily symmetric service discipline, and describing processes Z, Y, and X according to Definition 9.3. The equilibrium distribution π of the supplemented queue length process Z is said to be of (internal) product form, if and only if it is of the form π(e) = H^{-1}, and

π[t_1,k_1; t_2,k_2; ...; t_n,k_n] = ∏_{i=1}^{n} [ (λ α(t_i) / φ(i)) β_{t_i}^{-1} Σ_{h=k_i}^{K_{t_i}} r_{t_i}(h) ] · H^{-1},  [t_1,k_1; t_2,k_2; ...; t_n,k_n] ∈ E.  (42)

The equilibrium distribution π of the type process Y is said to be of (internal) product form, if and only if it is of the form π(e) = H^{-1}, and

π[t_1, t_2, ..., t_n] = ∏_{i=1}^{n} (λ α(t_i) / (μ_{t_i} φ(i))) · H^{-1},  [t_1, t_2, ..., t_n] ∈ ⋃_{n=1}^{∞} T^n.

The equilibrium distribution π of the queue length process X is said to be of (internal) product form, if and only if it is of the form

π(n) = (λ Σ_{t∈T} α(t)/μ_t)^n (∏_{i=1}^{n} φ(i))^{-1} · H^{-1},  n ∈ ℕ.
We are now prepared to formulate and prove the main results on the structure of general service systems. A general statement on the conclusions from the theory is that the results of Theorem 9.7 characterize the subclass of symmetric servers in various ways.


The theorems can be proved by adapting the proofs in Schassberger (1977, 1978b, 1986). Only the statement of Theorem 9.14 has to be proved directly. The first theorem is a converse of Theorem 9.7, (2).

THEOREM 9.11. Consider an ergodic general service system according to Definition 9.1 with general service discipline (a, b, c), and supplemented queue length process Z according to Definition 9.3. If the stationary distribution π of Z has product form (42) and there exists some type t ∈ T with α(t) > 0 such that the distribution of the requested service time for customers of type t is

B_t(s) = Σ_{k=1}^{K_t} r_t(k) F_{β_t,k}(s),  s ≥ 0,

with K_t > 1 and r_t(K_t) > 0, i.e., B_t is not exponentially distributed, then the service discipline of the server is symmetric.

The second theorem is a converse of Theorem 9.7, (3).

THEOREM 9.12. Consider an ergodic general service system according to Definition 9.1 with general service discipline (a, b, c), and supplemented queue length process Z. If the steady state distribution of Y or X is insensitive with respect to variations of the service request distributions as long as the expected service requests are held invariant, then the service discipline is symmetric.

The third theorem resumes the concept of locally balanced systems as discussed in connection with group-local-balance (GLB), see Definition 4.2. Part of the statement is just the analogue of Corollary 4.3.

THEOREM 9.13. Consider an ergodic general service system according to Definition 9.1 with general service discipline, and describing processes Z, Y, and X according to Definition 9.3.
(1) If the service discipline is symmetric, then the (generally non-Markovian) type process Y fulfills the following partial balance equations: for n ≥ 1 and [t_1, t_2, ..., t_n] ∈ ⋃_{n=1}^{∞} T^n and all i = 1, ..., n,

π[t_1, ..., t_i, ..., t_n] μ_{t_i} φ(n) c(i, n) = Σ_{σ∈T_{n−1}} π[t_{σ1}, ..., t_{σ(i−1)}, t_{σ(i+1)}, ..., t_{σn}] λ α(t_i) a(i, σ; n−1),  (43)

and

Σ_{σ∈T_n} π[t_1, ..., t_n] λ α(t) a(i, σ; n) = Σ_{τ∈T_n} π[t_{τ1}, ..., t_{τ(i−1)}, t, t_{τi}, ..., t_{τn}] μ_t φ(n+1) c(i, n+1) b(τ; n, i).  (44)


(2) Assume that the service request distributions are all exponential. Then Z and Y can be identified, and the partial balance equations have an interpretation as locally balancing probability fluxes. If there exists some strictly positive distribution p on {e} ∪ ⋃_{n=1}^{∞} T^n which balances these probability fluxes locally, i.e., which fulfills the partial balance equations (43) and (44), then Z is ergodic with unique steady state distribution p.
(3) Suppose the first partial balance system (43) has a strictly positive solution p on {e} ∪ ⋃_{n=1}^{∞} T^n. Then the service discipline is symmetric, and p is the one dimensional stationary distribution of Y. Further, Z is ergodic.

Consider a simple M/M/1/∞ node under the FCFS regime at which two types of customers arrive, having exponential service times with type dependent, different means. It is well known that even for this simple system investigating the steady state behaviour is not an easy task. The following theorem clarifies the situation to some extent.

THEOREM 9.14. Consider an ergodic general service system according to Definition 9.1 with general service discipline and only exponentially distributed service time requests. Then the processes Z and Y can be identified, both with state space {e} ∪ ⋃_{n=1}^{∞} T^n. Assume Y has a product form steady state distribution (41). If there are at least two different customer types t_1, t_2 ∈ T with α(t_1), α(t_2) > 0, having different means μ_{t_1}^{-1} ≠ μ_{t_2}^{-1} of the service time requests, then the service discipline is symmetric.

10. Mixed networks of general exponential and symmetric servers


In this section we incorporate the general exponential servers and the symmetric servers of Section 9 into the mixed network formalism with deterministic routing of Section 7. This leads to a versatile class of stochastic models describing space-time interaction. The main points of interest are the question of stabilization for the network and the explicit description of the limiting and stationary behaviour. The section is mainly in the spirit of Kelly (1979), Chapter 3, incorporating the generalizations suggested by Yashkov (1980).

DEFINITION 10.1. (Mixed network of exponential and symmetric servers) We consider a network of nodes {1, ..., J} which are exponential nodes or symmetric nodes according to Definitions 9.1, 9.5, and the requirements in Theorem 9.9. The network serves a finite population of internal customers numbered 1, ..., M, M ≥ 0. At any time internal customer m is of some type t ∈ T(m). Further, external customers arrive at the network, being of some type t ∈ T(0). The sets T(m), m = 0, 1, ..., M, are countable and pairwise disjoint. With each type t ∈ ⋃_{m=0}^{M} T(m) is associated the finite route r(t) = [r(t, 1), ..., r(t, S(t))], which a customer of type t follows. Multiple visits to some nodes on a route are allowed.


External customers arrive in a Poisson stream with rate λ_t > 0, enter node r(t, 1), and travel route r(t) = [r(t, 1), ..., r(t, S(t))], departing thereafter from the network. The arrival streams are independent and independent of the service times. Internal customer m, of type t, travels route r(t), thereafter changes to type t′ ∈ T(m), and travels route r(t′) with probability p_m(t, t′), where P_m = (p_m(t, t′) : t, t′ ∈ T(m)) is an irreducible positive recurrent stochastic matrix. Given the previous type t, the type decision is independent of the network's history.

The nodes' structure and the service regime are described by node-specific service disciplines (a_j, b_j, c_j) and node-specific capacity functions φ_j(·), j ∈ {1, ..., J}, see Definition 9.1. The service time requests of customers at exponential nodes are exponentially distributed with node-specific mean μ_j^{-1} at exponential node j, independent of the customers' types and stages. The service time requests of customers at symmetric nodes are phase-type distributed (see Definition 9.2); the distribution may individually depend on the customers' types and the stage numbers they have entered on their paths. A customer of type t entering stage s ∈ {1, ..., S(t)} of his itinerary requests an amount of service drawn according to

B_{t,s}(·) = Σ_{k=1}^{K_{t,s}} r_{t,s}(k) F_{β_{t,s},k}(·).

The states for a Markovian description of the network's evolution are as follows: For exponential node j the local state e_j ∈ E_j describes the empty node j, and

z_j = [t_{j1}, s_{j1}; t_{j2}, s_{j2}; ...; t_{jn_j}, s_{jn_j}] ∈ E_j

indicates that there are n_j customers present, and on position p ∈ {1, ..., n_j} there resides a customer of type t_{jp} being on stage s_{jp} ∈ {1, ..., S(t_{jp})} of his route r(t_{jp}). E_j is called the local state space of exponential node j. For symmetric node j the local state e_j ∈ E_j describes the empty node j, and

z_j = [t_{j1}, s_{j1}, k_{j1}; t_{j2}, s_{j2}, k_{j2}; ...; t_{jn_j}, s_{jn_j}, k_{jn_j}] ∈ E_j

indicates that there are n_j customers present, and on position p ∈ {1, ..., n_j} there resides a customer of type t_{jp} being on stage s_{jp} ∈ {1, ..., S(t_{jp})} of his route r(t_{jp}), requesting k_{jp} ∈ {1, ..., K_{t_{jp}, s_{jp}}} residual phases of exponentially distributed service time, each of these phases with mean β_{t_{jp}, s_{jp}}^{-1}. E_j with these states is called the local state space of symmetric node j.

The global states of the network are compounded of local states, and the global state space E contains all the feasible global states

z = [z_1, ..., z_J] ∈ E ⊆ ∏_{j=1}^{J} E_j.


Note that not all elements from ∏_{j=1}^{J} E_j are feasible for E, due to the restriction on the internal population: any internal customer can and must occur in a global state exactly once. For an easy to read description of the equilibrium behaviour of such networks we define local measures π_j, the structure of which depends on the internal structure of the nodes.

THEOREM 10.2. (Steady state distribution for mixed networks) For the mixed network of exponential and symmetric nodes according to Definition 10.1 let Z = (Z(t) = (Z_j(t) : j = 1, ..., J) : t ≥ 0) denote the supplemented joint queue length process of the network. The local supplemented type-stage process, or generalized supplemented queue length process, Z_j = (Z_j(t) : t ≥ 0) lives on a subset of E_j, the specific form of which depends on whether node j is symmetric or exponential. Z is assumed to be irreducible on E. It follows that Z is a strong Markov process on E.

For the internal customers we define the traffic equations and their solutions α_m = (α_m(t) : t ∈ T(m)) to be the stochastic solution of

α_m = α_m P_m,  m = 1, ..., M.

We assume henceforth that for all m = 1, ..., M

Σ_{t∈T(m)} α_m(t) S(t) < ∞  and  Σ_{t∈T(0)} λ_t S(t) < ∞  holds.

For j ∈ {1, ..., J}, t ∈ ⋃_{m=0}^{M} T(m), 1 ≤ s ≤ S(t), denote

η_j(t, s) = λ_t, if t ∈ T(0), and η_j(t, s) = α_m(t), if t ∈ T(m), m ∈ {1, ..., M}.

If Z is ergodic, the unique stationary and limiting distribution π of Z on E is given as follows: Let z = [z_1, ..., z_J] ∈ E denote a generic state of Z. If node j is exponential with generic local state z_j = [t_{j1}, s_{j1}; t_{j2}, s_{j2}; ...; t_{jn_j}, s_{jn_j}] ∈ E_j, the local measure π_j on E_j at j is π_j(e_j) = 1, and

π_j[t_{j1}, s_{j1}; t_{j2}, s_{j2}; ...; t_{jn_j}, s_{jn_j}] = ∏_{p=1}^{n_j} η_j(t_{jp}, s_{jp}) / (μ_j φ_j(p)).

If node j is symmetric with generic local state z_j = [t_{j1}, s_{j1}, k_{j1}; t_{j2}, s_{j2}, k_{j2}; ...; t_{jn_j}, s_{jn_j}, k_{jn_j}] ∈ E_j, the local measure π_j on E_j at j is π_j(e_j) = 1, and

π_j[t_{j1}, s_{j1}, k_{j1}; t_{j2}, s_{j2}, k_{j2}; ...; t_{jn_j}, s_{jn_j}, k_{jn_j}] = ∏_{p=1}^{n_j} [ (η_j(t_{jp}, s_{jp}) / φ_j(p)) β_{t_{jp}, s_{jp}}^{-1} Σ_{h=k_{jp}}^{K_{t_{jp}, s_{jp}}} r_{t_{jp}, s_{jp}}(h) ].


If

K = Σ_{[z_1, ..., z_J]∈E} ∏_{j=1}^{J} π_j(z_j) < ∞,

then Z is ergodic with unique limiting and stationary distribution

π(z) = K^{-1} ∏_{j=1}^{J} π_j(z_j),  z = [z_1, ..., z_J] ∈ E.

The insensitivity properties described in Remark 9.8 reappear here for all the symmetric nodes of the network. Performance analysis of the system with respect to queue length distributions or type-sequence distributions, and quantities derived from these fundamental characteristics, can be reduced to the case of exponential distributions with the correct means. This reduces the state space of the system under investigation considerably.
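The traffic equations α_m = α_m P_m of Theorem 10.2 can be solved by standard means. A sketch using power iteration, applied to a hypothetical type-switching matrix of our own choosing:

```python
def stationary_types(P, tol=1e-12, max_iter=100_000):
    """Solve alpha = alpha * P for an irreducible, positive recurrent
    stochastic matrix P (given as a row-major list of lists) by power
    iteration; returns the stochastic solution alpha."""
    n = len(P)
    alpha = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(alpha[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(alpha, new)) < tol:
            return new
        alpha = new
    return alpha

# hypothetical switching matrix P_m for one internal customer with 3 types
P = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [1.0, 0.0, 0.0]]
alpha = stationary_types(P)   # solves alpha = alpha P, sums to 1
```

Since P is row-stochastic, the iteration preserves the total mass, so the returned vector is automatically a probability distribution; for an aperiodic P the iteration converges geometrically.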

11. Complements
In this section some of the main topics not considered in the review will be mentioned combined with references for further readings. REMARK 11.1. (Loss networks) The classical model of a loss system was already investigated by Erlang in detail: the exponential M/M/s/O-FCFS system, see Definition 2.4. When determining the steady state distribution of the M/G/s/OFCFS, with service time distribution G = F~,~, he observed the insensitivity property of the equilibrium distribution with respect to variation of the shape of the service time distribution, as long as the mean is held fixed. See the introduction of Section 9, Theorem 9.7, and Remark 9.8 for details. It turned out that such an insensitivity property holds even for much more general loss systems: General networks with customers which require resources from different nodes of the network concurrently (e.g., circuits from different links for call transmission in a circuit-switched telephone network), and which are lost if at least one of these joint resources is not available. The surprising simple form of the steady states is due to product form structure, similar to those presented in the preceding sections. For a review see Kelly (1991). The most important problem in such networks then is determining optimal admission policies for calls with different priorities, or different rewards. For more detailed information see e.g., Laws (1995), Kelly (1995), MacPhee and Ziedins (1996), Zachary (1996). REMARK 11.2. (Diffusion approximation) Using functional central limit theorems during the last decade the theory and application of diffusion approximation in the field of queueing networks has made considerable progress. Diffusion approximation of queue length processes, which are piecewise constant jump processes, by Brownian motions with continuous paths is classical for single node systems in heavy traffic. 
Different approximation assumptions were used for dealing with the process when it reaches the boundary, i.e., when the system empties; see Kleinrock (1976), Section 2.8, and Kobayashi (1983)

358

H. Daduna

and for tandem systems Newell (1979). For queueing networks the problems become much harder, because of the multidimensional state spaces of the network processes and of the modifications of the limiting Brownian motion (Williams, 1995, 1996). An introduction into the field is Harrison (1985); recent developments can be found in the books of Yao (1994), Kelly and Williams (1995), and Kelly et al. (1996).

REMARK 11.3. (Fluid approximation) Fluid approximation of single queues is the classical deterministic approximation of queue length processes obtained by using expected values instead of random variables, see Kleinrock (1976), Section 2.7, and Newell (1982), Chapter 2. The principles of fluid approximation have recently been revived: using functional strong limit theorems, deterministic limits of averaged queueing processes were used to prove stability of network processes (Dai, 1995). These limits then function as strong (deterministic) approximations of the network state. Several applications of these approximation principles can be found in Yao (1994), Kelly and Williams (1995), and Kelly et al. (1996) as well.

REMARK 11.4. (Stochastic dynamic optimization of networks) For single server systems this is a classical topic. Optimization of the birth-death processes of Definition 2.2 and of exponential multiserver queues or state dependent single server systems (Examples 2.1 and 2.4) with respect to various optimality criteria is performed in, e.g., Groh (1981) and Serfozo (1981). The problem of individual versus social optimization in networks of parallel queues is investigated in Bell and Stidham Jr (1983). This is in fact a case of structurally rather simple systems, because routing decisions are performed only at the entrance point of the network and jockeying is not allowed.
The problem of controlling the arrivals to networks of queues is studied in Stidham Jr (1985); a survey on Markov decision models for control of arrivals and services in networks of queues is Stidham Jr and Weber (1993).

REMARK 11.5. (Networks in discrete time) Discrete time queueing systems were investigated already in the early days of queueing applications in computer systems. For references see the introduction of Section 9. A surprising observation was: although from a theoretical point of view the processes are much simpler than in continuous time, the computational problems of explicitly deriving steady state distributions are much harder. Only recently has the field revived, driven by the need for queueing models of high speed communication networks. These networks feature a structure with an inherent discrete time scale due to the transmission protocols of the systems. A survey of single node systems in discrete time is Takagi (1993). Elementary introductions are Bruneel and Kim (1993), where some aggregation/disaggregation methods for networks are sketched, and Woodward (1994), where a specific type of networks is considered in detail. A tutorial survey on networks in discrete time focussing on product form networks is Daduna (1996).

Stochastic networks with product form equilibrium

359

REMARK 11.6. (Interacting particle systems) These are stochastic models of physical systems, developed for describing the behaviour of mass phenomena of interacting particles. The standard reference is Liggett (1985), where the time development and the stationary behaviour of such systems are considered. The classical approach studied only the steady state phase of the system. These phases turned out in many cases to be Markov random fields on an infinite lattice. The connection to network theory as described here is the similarity in the space-time behaviour of the systems. Networks of queues can be considered as interacting systems on a finite graph, which is usually more irregular than in the models investigated in physics. Additionally, the local states of the interacting components are more complicated. For an introduction to Markov processes and random fields in connection with queueing and migration processes see Kelly's spatial processes (Kelly, 1979, Chapter 9) and Whittle's space-time processes (Whittle, 1986, p. 425).

REMARK 11.7. (Networks with negative customers) In recent years a new feature for queueing networks was successfully incorporated into the setting of product form calculus: special customers may arrive at the network and annihilate other customers which they meet. The effect of these negative customers is therefore an immediate jump down (by one step or more, depending on the number of customers annihilated) of the queue length at these customers' arrival. Adjusting the traffic equations to incorporate this feature leads to nonlinear traffic equations. An early paper on these problems is Gelenbe (1991). Related models have been investigated, where customers may merge with other customers, where customers may split into different customers, or where external signals appear at the network enforcing interruption of some customers' service with a subsequent immediate departure of the interrupted customers.
For further information see, e.g., Chao et al. (1996) and the references given there. With respect to explicit computations some of these models pose rather difficult problems; an example is presented in Harrison and Pitel (1995), where the Laplace transform of the sojourn time distribution for a linear two-stage tandem of exponential servers with negative customers is derived in quite involved expressions. The problems of customers' merging, splitting, and annihilation by other customers become hard because individual customers' behaviour is tracked under the FCFS regime. Considering the general migration processes with state dependent multiple transitions, where the individual queueing behaviour is not recorded (see Example 4.1), we see that these features can be modeled successfully in this setting as well. For a discussion see Henderson et al. (1995), and for related Petri net models Henderson and Taylor (1991a).
References
Adiri, I. and B. Avi-Itzhak (1969). A time-sharing queue. Manag. Sci. 18, 639-657.
Agrawal, S. C. (1985). Metamodeling: A Study of Approximations in Queueing Models. MIT Press, Cambridge, MA.


Asmussen, S. (1987). Applied Probability and Queues. John Wiley and Sons, Chichester, New York, Brisbane, Toronto, Singapore.
Baccelli, F. and P. Bremaud (1994). Elements of Queueing Theory. Springer, New York.
Baskett, F., M. Chandy, R. Muntz and F. G. Palacios (1975). Open, closed and mixed networks of queues with different classes of customers. J. Assoc. Comput. Machin. 22, 248-260.
Bell, C. E. and S. Stidham Jr (1983). Individual versus social optimization in the allocation of customers to alternative servers. Manag. Sci. 29, 831-839.
Boucherie, R. J. and N. M. van Dijk (1990). Spatial birth-death processes with multiple changes and applications to batch service networks and clustering processes. Adv. Appl. Probab. 22, 433-455.
Boucherie, R. J. and N. M. van Dijk (1991). Product forms for queueing networks with state-dependent multiple job transitions. Adv. Appl. Probab. 23, 152-187.
Boxma, O. J. and H. Daduna (1990). Sojourn times in queueing networks. In Stochastic Analysis of Computer and Communication Systems (Ed., H. Takagi), pp. 401-450. IFIP, North-Holland, Amsterdam.
Boxma, O. J., F. P. Kelly and A. G. Konheim (1984). The product form for sojourn time distributions in cyclic exponential queues. J. Assoc. Comput. Machin. 31, 128-133.
Bremaud, P., R. Kannurpatti and R. Mazumdar (1992). Event and time averages: A review. Adv. Appl. Probab. 24, 377-411.
Brockmeyer, E., H. L. Halstrom and A. Jensen (1948). The Life and the Work of A. K. Erlang, volume 2 of Transactions of the Danish Academy of Technical Science. Danish Academy of Science, Copenhagen.
Bruneel, H. and B. G. Kim (1993). Discrete-Time Models for Communication Systems including ATM. Kluwer Academic Publishers, Boston.
Burke, P. J. (1968). The output process of a stationary M/M/s queueing system. Ann. Math. Statist. 39, 1144-1152.
Burke, P. J. (1969). The dependence of sojourn times in tandem M/M/s queues. Oper. Res. 17, 754-755.
Burke, P. J. (1972). Output processes and tandem queues. In Proc. Symposium on Computer-Communications Networks and Teletraffic (Ed., J. Fox), pp. 419-428. Polytechnic Press, Brooklyn, NY.
Buzacott, J. A., J. G. Shanthikumar and D. D. Yao (1994). Jackson network models of manufacturing systems. In Stochastic Modeling and Analysis of Manufacturing Systems (Ed., D. D. Yao), Springer Series in Operations Research, Chapter 1, pp. 1-46. Springer, New York.
Chang, C.-S., J. G. Shanthikumar and D. D. Yao (1994). Stochastic convexity and stochastic majorization. In Stochastic Modeling and Analysis of Manufacturing Systems (Ed., D. D. Yao), Springer Series in Operations Research, Chapter 5, pp. 189-232. Springer, New York.
Chao, X., M. Pinedo and D. Shaw (1996). Networks of queues with batch services and customer coalescence. J. Appl. Probab. 33, 858-869.
Chow, W. M. (1980). The cycle time distribution of exponential cyclic queues. J. Assoc. Comput. Machin. 27, 281-286.
Cohen, J. W. (1982). The Single Server Queue. 2nd edn, North-Holland Publishing Company, Amsterdam, London.
Daduna, H. (1982). Passage times for overtake-free paths in Gordon-Newell networks. Adv. Appl. Probab. 14, 672-686.
Daduna, H. (1984). Burke's theorem on passage times in Gordon-Newell networks. Adv. Appl. Probab. 16, 867-886.
Daduna, H. (1996). Discrete time queueing networks: Recent developments. Preprint 96-13, Institut für Mathematische Stochastik der Universität Hamburg (Tutorial Lecture Notes, Performance '96).
Daduna, H. (1997). Sojourn time distributions in non-product form queueing networks. In Frontiers in Queueing: Models and Applications in Science and Engineering (Ed., J. Dshalalow), Chapter 7, pp. 197-224. CRC Press, Boca Raton.


Daduna, H. and R. Schassberger (1993). Delay time distributions and adjusted transfer rates for Jackson networks. Archiv für Elektronik und Übertragungstechnik 47, 342-348.
Daduna, H. and R. Szekli (1995). Dependencies in Markovian networks. Adv. Appl. Probab. 25, 226-254.
Daduna, H. and R. Szekli (1996). A queueing theoretical proof of the increasing property of Polya frequency functions. Statist. Probab. Lett. 26, 233-242.
Daduna, H. and R. Szekli. On the correlation of sojourn times in open networks of exponential multiserver queues. Queueing Syst. Appl. 34, 169-181.
Dai, J. G. (1995). Stability of open queueing networks via fluid models. In Stochastic Networks, volume 71 of IMA Volumes in Mathematics and its Applications (Eds., F. P. Kelly and R. J. Williams), pp. 71-90. Springer, New York.
Disney, R. L. and P. C. Kiessler (1987). Traffic Processes in Queueing Networks: A Markov Renewal Approach. The Johns Hopkins University Press, London.
Disney, R. L. and D. König (1985). Queueing networks: A survey of their random processes. SIAM Rev. 27, 335-403.
Fayolle, G., R. Iasnogorodski and I. Mitrani (1983). The distribution of sojourn times in a queueing network with overtaking: Reduction to a boundary problem. In Performance Analysis (Eds., A. K. Agrawala and S. K. Tripathi), pp. 477-486. North-Holland, Amsterdam.
Foley, R. D. and P. C. Kiessler (1989). Positive correlations in a three-node Jackson queueing network. Adv. Appl. Probab. 21, 241-242.
Gelenbe, E. (1991). Product form queueing networks with negative and positive customers. J. Appl. Probab. 28, 656-663.
Glasserman, P. (1991). Gradient Estimation via Perturbation Analysis. Kluwer Academic Press, Boston.
Glynn, P. (1989). A GSMP formalism for discrete event systems. Proc. IEEE 77, 14-23.
Goodman, J. B. and W. A. Massey (1984). The non-ergodic Jackson network. J. Appl. Probab. 21, 860-869.
Gordon, W. J. and G. F. Newell (1967). Closed queueing networks with exponential servers. Oper. Res. 15, 252-267.
Groh, J. (1981). On the optimal control of finite state birth and death processes. In Proceedings of the Sixth Conference on Probability Theory (Eds., B. Bereanu, S. Grigorescu, M. Iosifescu and T. Postelnicu), pp. 427-438. Editura Academiei Republicii Socialiste Romania, Bucuresti.
Harrison, J. M. (1985). Brownian Motion and Stochastic Flow Systems. Wiley Series in Probability and Statistics, John Wiley, New York.
Harrison, P. G. and E. Pitel (1995). Response time distributions in tandem G-networks. J. Appl. Probab. 32, 224-246.
Hemker, J. (1990). A note on sojourn times in queueing networks with multiserver nodes. J. Appl. Probab. 27, 469-474.
Henderson, W., B. S. Northcote and P. G. Taylor (1995). Triggered batch movement in queueing networks. Queueing Syst. Appl. 21, 125-141.
Henderson, W. and P. G. Taylor (1990). Product form in queueing networks with batch arrivals and batch services. Queueing Syst. Appl. 6, 71-88.
Henderson, W. and P. G. Taylor (1991a). Embedded processes in stochastic Petri nets. IEEE Trans. Software Eng. 17, 108-116.
Henderson, W. and P. G. Taylor (1991b). Some new results on queueing networks with batch movements. J. Appl. Probab. 28, 409-421.
Hohl, S. D. and P. J. Kühn (1988). Approximate analysis of flow and cycle times in queueing networks. In Proceedings of the 3rd International Conference on Data Communication Systems and Their Performance (Eds., L. F. M. de Moraes, E. de Souza e Silva and L. F. G. Soares), pp. 471-485. North-Holland, Amsterdam.
Iglehart, D. L. and G. S. Shedler (1980). Regenerative Simulation of Response Times in Networks of Queues, volume 26 of Lecture Notes in Control and Information Sciences. Springer, Berlin.
Jackson, J. R. (1957). Networks of waiting lines. Oper. Res. 5, 518-521.


Joag-Dev, K. and F. Proschan (1983). Negative association of random variables with applications. Ann. Statist. 11, 286-295.
Kant, K. (1993). Steady state analysis of stochastic systems. In Computational Statistics, vol. 9 of Handbook of Statistics (Ed., C. R. Rao), Chapter 2, pp. 17-68. North-Holland, Amsterdam.
Kelly, F. P. (1976). Networks of queues. Adv. Appl. Probab. 8, 416-432.
Kelly, F. P. (1979). Reversibility and Stochastic Networks. John Wiley and Sons, Chichester, New York, Brisbane, Toronto.
Kelly, F. P. (1991). Loss networks. Ann. Appl. Probab. 1, 319-378.
Kelly, F. P. (1995). Dynamic routing in stochastic networks. In Stochastic Networks, volume 71 of IMA Volumes in Mathematics and its Applications (Eds., F. P. Kelly and R. J. Williams), pp. 169-186. Springer, New York.
Kelly, F. P. and P. Pollett (1983). Sojourn times in closed queueing networks. Adv. Appl. Probab. 15, 638-653.
Kelly, F. P. and R. J. Williams (1995). Stochastic Networks, volume 71 of IMA Volumes in Mathematics and its Applications. Springer, New York.
Kelly, F. P., S. Zachary and I. Ziedins (1996). Stochastic Networks: Theory and Applications, volume 4 of Royal Statistical Society Lecture Note Series. Clarendon Press, Oxford.
Kiessler, P. C. and R. L. Disney (1982). The sojourn time in a three node, acyclic, Jackson network. Technical Report VTR 8203, Department of Industrial Engineering and Operations Research, Virginia Polytechnic Institute and State University, Blacksburg.
Kleinrock, L. (1964). Analysis of a time-shared processor. Naval Res. Logistics Quart. 11, 59-73.
Kleinrock, L. (1967). Time-shared systems: A theoretical treatment. J. Assoc. Comput. Machin. 14(2), 242-261.
Kleinrock, L. (1976). Queueing Systems, volume II. John Wiley and Sons, New York.
Knessl, C. and J. A. Morrison (1991). Heavy traffic analysis of the sojourn time in a three node Jackson network with overtaking. Queueing Syst. Appl. 8, 165-182.
König, D., K. Matthes and K. Nawrotzki (1974). Unempfindlichkeitseigenschaften von Bedienungsprozessen. In Einführung in die Bedienungstheorie (Eds., B. W. Gnedenko and I. N. Kowalenko), 2nd edn, pp. 358-450. Akademie Verlag, Berlin.
Kobayashi, H. (1983). Stochastic modeling: Queueing networks. In Probability Theory and Computer Science, International Lecture Series in Computer Science (Eds., G. Louchard and G. Latouche), Part II, pp. 53-121. Academic Press, London, Orlando.
Kook, K. H. and R. F. Serfozo (1993). Travel and sojourn times in stochastic networks. Ann. Appl. Probab. 3, 228-252.
Lavenberg, S. S. (1983). Computer Performance Modeling Handbook. Academic Press, New York.
Laws, C. N. (1995). On trunk reservation in loss networks. In Stochastic Networks, volume 71 of IMA Volumes in Mathematics and its Applications (Eds., F. P. Kelly and R. J. Williams), pp. 187-198. Springer, New York.
Liggett, T. M. (1985). Interacting Particle Systems, volume 276 of Grundlehren der mathematischen Wissenschaften. Springer, Berlin.
Lindqvist, B. M. (1988). Association of probability measures. J. Multivar. Anal. 26, 111-132.
Lavenberg, S. S. and M. Reiser (1980). Stationary state probabilities at arrival instants for closed queueing networks with multiple types of customers. J. Appl. Probab. 17, 1048-1061.
MacPhee, I. and I. Ziedins (1996). Admission controls for loss networks with diverse routing. In Stochastic Networks: Theory and Applications, volume 4 of Royal Statistical Society Lecture Note Series (Eds., F. P. Kelly, S. Zachary and I. Ziedins), pp. 205-214. Clarendon Press, Oxford.
Massey, W. A. (1987). Stochastic ordering for Markov processes on partially ordered spaces. Math. Oper. Res. 12, 350-367.
Massey, W. A. (1989). Stochastic ordering for birth-death migration processes. Preprint.
Melamed, B. (1979). Characterisation of Poisson streams in Jackson queueing networks. Adv. Appl. Probab. 11, 422-438.
Melamed, B. (1982). Sojourn times in queueing networks. Math. Oper. Res. 7, 223-244.


Miyazawa, M. (1994). On the characterisation of departure rules for discrete-time queueing networks with batch movements and its applications. Queueing Syst. Appl. 18, 149-166.
Miyazawa, M. (1995). A note on my paper: On the characterisation of departure rules for discrete-time queueing networks with batch movements and its applications. Queueing Syst. Appl. 19, 445-448.
Miyazawa, M. (1997). Structure-reversibility and departure functions of queueing networks with batch movements and state-dependent routing. Queueing Syst. Appl. 25, 45-75.
Nachbin, L. (1965). Topology and Order, volume 4 of Van Nostrand Mathematical Series. Van Nostrand, Princeton.
Newell, G. F. (1979). Approximate Behaviour of Tandem Queues, volume 171 of Lecture Notes in Economics and Mathematical Systems. Springer, Berlin.
Newell, G. F. (1982). Applications of Queueing Theory. Monographs on Statistics and Applied Probability, 2nd edn, Chapman and Hall, London.
Parthasarathy, P. R. (1987). A transient solution to an M/M/1 queue: A simple approach. J. Appl. Probab. 24, 997-998.
Reich, E. (1957). Waiting times when queues are in tandem. Ann. Math. Statist. 28, 768-773.
Reich, E. (1963). Note on queues in tandem. Ann. Math. Statist. 34, 338-341.
Reiser, M. (1979). A queueing network analysis of computer communication networks with window flow control. IEEE Trans. Comm. COM-27, 1199-1209.
Reiser, M. (1982). Performance evaluation of data communication systems. Proc. IEEE 70, 171-196.
Sakata, M., S. Noguchi and J. Oizumi (1969). Analysis of a processor-sharing queueing model for time-sharing systems. In Proceedings of the Second Hawaii International Conference on System Sciences, pp. 625-628.
Sakata, M., S. Noguchi and J. Oizumi (1971). An analysis of the M/G/1 queue under round-robin scheduling. Oper. Res. 19, 371-385.
Schassberger, R. (1973). Warteschlangen. Springer, Wien.
Schassberger, R. (1977). Insensitivity of steady-state distributions of generalized semi-Markov processes, part I. Ann. Prob. 5, 87-99.
Schassberger, R. (1978a). Insensitivity of stationary probabilities in networks of queues. Adv. Appl. Probab. 10, 906-912.
Schassberger, R. (1978b). Insensitivity of steady-state distributions of generalized semi-Markov processes, part II. Ann. Prob. 6, 85-93.
Schassberger, R. (1986). Two remarks on insensitive stochastic models. Adv. Appl. Probab. 18, 791-814.
Schassberger, R. and H. Daduna (1983). The time for a roundtrip in a cycle of exponential queues. J. Assoc. Comput. Machin. 30, 146-150.
Schassberger, R. and H. Daduna (1987). Sojourn times in queueing networks with multiserver nodes. J. Appl. Probab. 24, 511-521.
Serfozo, R. F. (1981). Optimal control of random walks, birth and death processes, and queues. Adv. Appl. Probab. 13, 61-83.
Serfozo, R. F. (1989a). Markovian network processes: Congestion-dependent routing and processing. Queueing Syst. Appl. 5, 5-36.
Serfozo, R. F. (1989b). Poisson functionals of Markov processes and queueing networks. Adv. Appl. Probab. 21, 595-611.
Serfozo, R. F. (1992). Reversibility and compound birth-death and migration processes. In Queueing and Related Models, pp. 65-90. Oxford University Press, Oxford.
Serfozo, R. F. (1993). Queueing networks with dependent nodes and concurrent movements. Queueing Syst. Appl. 13, 143-182.
Sevastyanov, B. (1957). An ergodic theorem for Markov processes and its application to telephone systems with refusal. Theor. Probab. Appl. 2, 104-112.
Sevcik, K. C. and I. Mitrani (1981). The distribution of queueing network states at input and output instants. J. Assoc. Comput. Machin. 28, 358-371.


Shanthikumar, J. G. and D. D. Yao (1994). Stochastic comparisons in closed Jackson networks. In Stochastic Orders and Their Applications (Eds., M. Shaked and J. G. Shanthikumar), Probability and Mathematical Statistics, Chapter 14, pp. 433-460. Academic Press, Boston.
Simon, B. and R. D. Foley (1979). Some results on sojourn times in acyclic Jackson networks. Manag. Sci. 25, 1027-1034.
Stidham, S. Jr (1985). Optimal control of admission to a queueing system. IEEE Trans. Auto. Cont. AC-30, 705-713.
Stidham, S. Jr and R. Weber (1993). A survey of Markov decision models for control of networks of queues. Queueing Syst. Appl. 13, 291-314.
Stoyan, D. (1983). Comparison Methods for Queues and Other Stochastic Models. John Wiley, New York.
Takagi, H. (1993). Queueing Analysis: A Foundation of Performance Analysis, vol. 3: Discrete-Time Systems. North-Holland, New York.
Walrand, J. (1988). An Introduction to Queueing Networks. Prentice-Hall International Editions, Englewood Cliffs.
Walrand, J. and P. Varaiya (1980). Sojourn times and the overtaking condition in Jacksonian networks. Adv. Appl. Probab. 12, 1000-1018.
Whitt, W. (1981). Comparing counting processes and queues. Adv. Appl. Probab. 13, 207-220.
Whittle, P. (1986). Systems in Stochastic Equilibrium. Wiley, Chichester.
Williams, R. J. (1995). Semimartingale reflecting Brownian motion in the orthant. In Stochastic Networks, volume 71 of IMA Volumes in Mathematics and its Applications (Eds., F. P. Kelly and R. J. Williams), pp. 125-138. Springer, New York.
Williams, R. J. (1996). On the approximation of queueing networks in heavy traffic. In Stochastic Networks: Theory and Applications, volume 4 of Royal Statistical Society Lecture Note Series (Eds., F. P. Kelly, S. Zachary and I. Ziedins), pp. 35-56. Clarendon Press, Oxford.
Wolff, R. W. (1982). Poisson arrivals see time averages. Oper. Res. 30, 223-231.
Woodward, M. E. (1994). Communication and Computer Networks: Modelling with Discrete-Time Queues. IEEE Computer Society Press, Los Alamitos, CA.
Yao, D. D. (1994). Stochastic Modeling and Analysis of Manufacturing Systems. Springer Series in Operations Research. Springer, New York.
Yashkov, S. F. (1980). Properties of invariance of probabilistic models of adaptive scheduling in shared-use systems. Auto. Cont. Comp. Sci. 14, 46-51.
Zachary, S. (1996). The asymptotic behaviour of large loss networks. In Stochastic Networks: Theory and Applications, volume 4 of Royal Statistical Society Lecture Note Series (Eds., F. P. Kelly, S. Zachary and I. Ziedins), pp. 193-204. Clarendon Press, Oxford.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Stochastic Processes in Insurance and Finance

Paul Embrechts, Rüdiger Frey and Hansjörg Furrer

1. Introduction

1.1. The basic building blocks


Coincidence or not? Though the theory of stochastic processes is very much a theory of the 20th century, its first appearance through applications in insurance and finance shows some remarkable similarities. In 1900, Bachelier (1900) wrote in his famous thesis: "Si, à l'égard de plusieurs questions traitées dans cette étude, j'ai comparé les résultats de l'observation à ceux de la théorie, ce n'était pas pour vérifier des formules établies par les méthodes mathématiques, mais pour montrer seulement que le marché, à son insu, obéit à une loi qui le domine: la loi de la probabilité." The title of Bachelier's thesis is "Théorie de la Spéculation", the theory of speculation. The main point in the above extract is that Bachelier has shown ("montrer") that financial markets are dominated by the laws of probability. More precisely, the erratic behaviour of stockmarket data was so much akin to the motion of small particles suspended in a fluid that a link to a process studied later among others by Einstein and Smoluchowski was obvious. The link between Brownian motion and finance was born. It would take economists another 50 years to realize the importance of this link; today, however, nobody doubts the fundamental nature of this observation.

To fix ideas we choose some basic probability space (Ω, ℱ, P) on which all stochastic processes that we introduce in this paper are defined. The first such process is defined as follows.

DEFINITION 1.1. Standard Brownian motion W = (W_t)_{t ≥ 0} is a real-valued stochastic process which satisfies the following conditions:
(a) W starts at zero: W_0 = 0, a.s.,
(b) W has independent increments: for any partition 0 ≤ t_0 < t_1 < ... < t_n < ∞ and any k ≤ n, the random variables (r.v.s) W_{t_1} − W_{t_0}, W_{t_2} − W_{t_1}, ..., W_{t_k} − W_{t_{k−1}} are independent,



(c) W has Gaussian increments: for any t > 0, W_t is normally distributed with mean 0 and variance t, i.e., W_t ~ N(0, t), and
(d) W has a.s. continuous sample paths.

The conditions (b) and (c) are referred to as: W has stationary and independent increments; moreover, the increments are normally distributed. Processes satisfying the stationary and independent increment property, together with a mild sample path regularity condition, are also referred to as Lévy processes, see Bertoin (1996). As we will see later, they play a crucial role in insurance and finance models. The construction of a process satisfying (a)-(d), i.e., proving the existence of Brownian motion, is not trivial. A first systematic treatment actually constructing W was given by Norbert Wiener. For a discussion of this, together with a detailed analysis of further properties of W, see for instance Karatzas and Shreve (1988). Though (d) above states that the sample paths of W are (a.s.) continuous, they show a most erratic behaviour, as shown in the next result.

PROPOSITION 1.2. Suppose W is defined as above. Then P-a.s., the sample paths of W are nowhere differentiable. □

A rather unpleasant consequence of this result is that W has unbounded variation on each interval I, say, i.e.,

sup_Δ Σ_{i=1}^{n} |W_{t_i}(ω) − W_{t_{i−1}}(ω)| = ∞

for P-almost all ω ∈ Ω, Δ = {t_0, ..., t_n} being a partition of I and the sup taken over all such partitions. Consequently, standard integration theory for functions of bounded variation does not work, i.e., the symbol

∫_0^t Y_s(ω) dW_s(ω),   (1)

for some stochastic process (Y_t) has no immediate meaning. Though at first the news on W is bad, there is some hope. Indeed, the following result, due to Paul Lévy, will be the clue to "defining" (1) in terms of Itô calculus.

THEOREM 1.3. (W has finite quadratic variation) Suppose W is as above, but change the normality assumption in (c) to:
(c') for all t, W_t ~ N(0, σ²t), σ > 0.
Then for n → ∞,

Σ_{i=1}^{n} |W_{t_i} − W_{t_{i−1}}|² →_{L²} σ²t ,

where {t_0, ..., t_n} is an arbitrary partition of [0, t] such that sup_i |t_i − t_{i−1}| → 0, and →_{L²} denotes convergence in L²(Ω, ℱ, P). Under a slight extra condition on the partitions used, L²-convergence can be replaced by almost sure convergence. □
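The contrast between Proposition 1.2 and Theorem 1.3 can be illustrated numerically: on a regular partition of [0, t], the sum of squared increments of a simulated path settles near σ²t, while the sum of absolute increments keeps growing as the mesh is refined. A minimal sketch (all parameters are illustrative, and each refinement uses a freshly simulated path rather than a refinement of one fixed path):

```python
import numpy as np

rng = np.random.default_rng(1)

def brownian_increments(sigma, t, n, rng):
    """Increments of W over the regular partition {0, t/n, 2t/n, ..., t}."""
    return rng.normal(0.0, sigma * np.sqrt(t / n), size=n)

t, sigma = 2.0, 1.0
for n in (10**3, 10**4, 10**5):
    dW = brownian_increments(sigma, t, n, rng)
    total_variation = np.abs(dW).sum()    # grows without bound as n increases (Prop. 1.2)
    quadratic_variation = (dW**2).sum()   # approaches sigma**2 * t (Theorem 1.3)
    print(n, round(total_variation, 1), round(quadratic_variation, 3))
```

With these parameters the last column stays close to σ²t = 2, whereas the middle column grows roughly like √n.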


Whereas Proposition 1.2 was the "bad news", Theorem 1.3 contains the "good news". It turns out to be the key to a new integration theory with respect to W, giving (1) a meaning at least for so-called predictable integrands. There are various ways to derive Brownian motion as a key building block of financial time series modeling. First of all, just looking at some pictures of financial data reveals the same erratic behaviour as is observed in simulated data from W, see Figure 1 below. Moreover, W is only the first building block. Later, in Section 3, we shall investigate more carefully how more realistic models in finance come about. Before we move to the next process, this time born out of insurance modeling considerations, we would like to indicate a further reason why Brownian motion is a natural model candidate for financial (stockmarket) data. As we know, the normal distribution enters as a non-degenerate limit of normalized partial sums of independent, identically distributed (i.i.d.) r.v.s. The latter is often described (in a process context) as: the value W_t is obtained via a large "bombardment" of small, independent shocks. If we interpret these shocks as small price changes (up, down) coming from many individual trades, it should not surprise us that the stockmarket and Brownian motion go hand in hand. A formal microeconomic approach to diffusion models for stock prices which is based on this idea has been proposed by Föllmer and Schweizer (1993).

Around the same time as Bachelier was working on Brownian motion as a basic limit model for financial data, in Sweden in 1903, Filip Lundberg published a remarkable thesis (Lundberg (1903)) providing a mathematical foundation to non-life insurance. In his model, the key ingredients were the so-called premiums and claims. The latter he proposed to model through a homogeneous Poisson process as defined below.

DEFINITION 1.4. The stochastic counting process N = (N(t))_{t ≥ 0} is a homogeneous Poisson process with rate (intensity) λ > 0 if:
Fig. 1. Simulations of standard Brownian motion.


(a) N(0) = 0 a.s.,
(b) N has stationary, independent increments, and
(c) for all 0 ≤ s < t < ∞: N(t) − N(s) ~ POIS(λ(t − s)), i.e.,

P(N(t) − N(s) = k) = e^{−λ(t−s)} (λ(t − s))^k / k!,   k ∈ ℕ.   (2)

First of all, the above definition shows a remarkable similarity with Definition 1.1 of Brownian motion. Both processes are Lévy processes. The key difference lies in the sample path behaviour: Brownian motion has continuous sample paths, whereas the Poisson process, as a counting process, is a jump process (for typical realizations, see Figure 2 below). In insurance applications, N(t) stands for the number of claims in the time interval (0, t] in a well defined portfolio. If we denote the arrival time of the nth claim by S_n, then

N(t) = sup{n ≥ 1 : S_n ≤ t},   t ≥ 0.

The inter-arrival times T_1 = S_1, T_k = S_k − S_{k−1}, k = 2, 3, ..., are independent, identically exponentially distributed (EXP(λ)) with finite mean ET_1 = 1/λ. The latter property also characterizes the homogeneous Poisson process, see for instance Resnick (1992). The claim size process (X_k)_{k ∈ ℕ} is at first assumed to be i.i.d. with distribution function (d.f.) F (F(0) = 0) and finite mean μ = EX_1. The r.v. X_k denotes the size of the claim occurring at time S_k. In Section 2, various of the above conditions will be relaxed. The total claim amount up to time t is consequently given by S(t) = Σ_{k=1}^{N(t)} X_k. The latter r.v. is referred to as a compound Poisson r.v. Though its d.f. can be written down easily,
    G_t(x) = P(S(t) ≤ x) = Σ_{n=0}^∞ e^{-λt} ((λt)^n / n!) F^{n*}(x) ,   x, t ≥ 0 ,   (3)
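For exponentially distributed claims the convolutions F^{n*} in (3) are Erlang d.f.s, so G_t can be evaluated by truncating the series. A sketch (the truncation level and iterative weight updates, which avoid huge factorials, are our own choices), assuming EXP(claim_rate) claims:

```python
import math

def erlang_cdf(n, rate, x):
    """F^{n*}(x) for EXP(rate) claims: an Erlang(n, rate) d.f.;
    n = 0 gives the Dirac measure at 0, i.e. F^{0*}(x) = 1 for x >= 0."""
    if n == 0:
        return 1.0
    term, s = 1.0, 1.0  # term = (rate*x)^j / j!, running from j = 0
    for j in range(1, n):
        term *= rate * x / j
        s += term
    return 1.0 - math.exp(-rate * x) * s

def compound_poisson_cdf(lam, t, claim_rate, x, nmax=200):
    """G_t(x) from the series (3), truncated after nmax terms; the Poisson
    weights e^{-lam t} (lam t)^n / n! are updated iteratively."""
    weight = math.exp(-lam * t)
    total = weight * erlang_cdf(0, claim_rate, x)
    for n in range(1, nmax):
        weight *= lam * t / n
        total += weight * erlang_cdf(n, claim_rate, x)
    return total

g0 = compound_poisson_cdf(1.0, 2.0, 0.5, 0.0)   # = P(S(t) = 0) = P(N(t) = 0) = e^{-2}
g = compound_poisson_cdf(1.0, 2.0, 0.5, 10.0)
```

The atom of G_t at zero, G_t(0) = e^{-λt}, gives a quick correctness check.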


Fig. 2. Simulations of homogeneous Poisson processes with intensity λ = 1.

Stochastic processes in insurance and finance


its precise calculation, and indeed its statistical estimation in practice, form a key area of research in insurance risk theory; see for instance Panjer and Willmot (1992) or Klugman, Panjer, and Willmot (1998) and the references therein. The d.f. F^{n*} in (3) denotes the n-fold convolution of F; F^{0*} denotes the Dirac measure in 0. Now besides the liability process (S(t))_{t≥0}, an insurance company cashes premiums in order to compensate the losses. In the above standard (so-called Cramér-Lundberg) model, the premium process (P(t))_t is assumed to be linear (deterministic), i.e., P(t) = u + ct, where u ≥ 0 stands for the initial capital and c > 0 is the constant premium rate, chosen in such a way that the company (or portfolio) has a fair chance of "survival". The following r.v. is crucial in this context: denote by τ the ruin time of the risk process

    U(t) = u + ct - S(t) ,   t ≥ 0 ,   (4)

i.e.

    τ = inf{t ≥ 0 : U(t) < 0}   (5)

(we always assume inf ∅ = ∞). The associated ruin probabilities are defined as

    Ψ(u, T) = P(τ ≤ T) ,   T ≤ ∞ .   (6)
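Because the risk process increases between claims, ruin can only occur at a claim epoch, which makes the finite-horizon ruin probability easy to estimate by Monte Carlo. A sketch for exponential claims (all parameter choices are our own):

```python
import random

def ruined_by(u, c, lam, mean_claim, horizon, rng):
    """One path of U(t) = u + c t - S(t): step from claim epoch to claim epoch
    and report whether ruin occurs before `horizon`."""
    t, claims = 0.0, 0.0
    while True:
        t += rng.expovariate(lam)                    # next claim arrival
        if t > horizon:
            return False
        claims += rng.expovariate(1.0 / mean_claim)  # EXP claim with mean mu
        if u + c * t - claims < 0.0:
            return True

rng = random.Random(7)
u, c, lam, mu, T = 5.0, 1.5, 1.0, 1.0, 50.0  # safety loading c/(lam*mu) - 1 = 0.5
n = 10000
psi_hat = sum(ruined_by(u, c, lam, mu, T, rng) for _ in range(n)) / n
```

With these parameters the estimate lands near the known infinite-horizon value for exponential claims, a bit above 0.12.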

For Ψ(u, ∞), the infinite horizon ruin probability, we write Ψ(u). It is not difficult to show that, under the so-called net-profit condition

    c - λμ > 0 ,   (7)

lim_{u→∞} Ψ(u) = 0. Within the Cramér-Lundberg set-up, condition (7) is always assumed; it says that on average the premium income exceeds the claim loss. The basic risk process (4) can now be rewritten as

    U(t) = u + (1 + θ)λμt - S(t) ,

where λμt = ES(t) and θ = c/(λμ) - 1 > 0 is the so-called safety loading which guarantees "survival". In Figure 3, we have simulated some realizations of (4) for exponentially distributed claims.

DEFINITION 1.5. The stochastic process (U(t))_t defined in (4) with the net-profit condition (7) is called the Cramér-Lundberg risk process.

The following result appears in the applied stochastic process literature under various guises (see for instance Resnick (1992) or Embrechts, Klüppelberg and Mikosch (1997)). We standardly denote H̄ = 1 - H for any d.f. H concentrated on [0, ∞).
THEOREM 1.6. Given the Cramér-Lundberg model as above, then

    1 - Ψ(u) = (1 - ρ) Σ_{n=0}^∞ ρ^n F_I^{n*}(u) ,   u ≥ 0 ,   (8)

where ρ = λμ/c < 1 and the integrated tail d.f. F_I is defined as



Fig. 3. Simulations of a Cramér-Lundberg risk process U with initial capital u = 15, premium rate c = 2.5, intensity λ = 1 and exponentially distributed claims with mean μ = 2.
    F_I(x) = (1/μ) ∫_0^x F̄(y) dy ,   x ≥ 0 .   (9)
                                                  □
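For exponential claims with mean μ the integrated tail F_I is again EXP(1/μ), so the compound geometric series (8) can be summed term by term and compared with the known closed form Ψ(u) = ρ e^{-(1-ρ)u/μ}. A numerical check (the truncation level is our own choice):

```python
import math

def erlang_cdf(n, rate, x):
    """n-fold convolution of an EXP(rate) d.f.; n = 0 is the Dirac measure at 0."""
    if n == 0:
        return 1.0
    term, s = 1.0, 1.0  # (rate*x)^j / j!, accumulated from j = 0
    for j in range(1, n):
        term *= rate * x / j
        s += term
    return 1.0 - math.exp(-rate * x) * s

def ruin_prob_series(u, rho, mu, nmax=400):
    """Psi(u) via (8): for EXP(1/mu) claims, F_I is again EXP(1/mu), so
    F_I^{n*} is an Erlang(n, 1/mu) d.f."""
    surv = (1.0 - rho) * sum(rho ** n * erlang_cdf(n, 1.0 / mu, u)
                             for n in range(nmax))
    return 1.0 - surv

rho, mu, u = 0.6, 2.0, 10.0
psi_series = ruin_prob_series(u, rho, mu)
psi_exact = rho * math.exp(-(1.0 - rho) * u / mu)  # known closed form, EXP claims
```

The geometric weights make the truncation error of order ρ^nmax, which is negligible here.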
The fact that the function Ψ(u) in (8) also allows a compound d.f. expression (as in (3)) has important analytic as well as numerical consequences. The sum in (8) is of compound geometric type. Of course, the compound Poisson process with drift as described in (4) is not comprehensive and does not take into account, for example, the nonlinear premium increase of the capital due to possible investment or inflation, and dividend payments to stockholders. However, processes of the form (4) are the basic building blocks of any Lévy process Y (without Brownian component) in the sense that Y is the limit (with respect to convergence on compact intervals) of a sequence Y^(n) of compound Poisson processes with drift.

The conditions underlying the Cramér-Lundberg model are clearly violated in practice. For instance, claims may arrive in clusters. Already early on, actuaries introduced the so-called notion of operational time. The claim-arrival process (N(t))_t is often more realistically modeled as an inhomogeneous Poisson process with intensity measure Λ(t), i.e., the process still has independent increments, but for 0 ≤ s < t, N(t) - N(s) ~ POIS(Λ(t) - Λ(s)). This more realistic situation can be reduced to the homogeneous (standard) case via the time-change Ñ(t) = N(Λ^{-1}(t)). In the new, operational time scale, Ñ is a homogeneous Poisson process. For a discussion of this time transformation, see Bühlmann (1970) and Gerber (1979). More recently, (operational) time considerations have been entering stochastic modeling in finance; see e.g. Clark (1973), Guillaume et al. (1997) and Geman and Ané (1996). The latter papers are mainly based on models coming from the tick-by-tick data world. In the above discussion, we have seen that the two most important Lévy processes, Brownian motion and the homogeneous Poisson process, appear right at the beginning of stochastic modeling in finance (Bachelier) and insurance


(Lundberg). It is remarkable that this development took place well before Kolmogorov created his famous axiomatic theory in the early thirties. Before we discuss in the next sections various generalizations of the above basic models relevant for insurance and finance, we wish to digress a little into the realm of martingales.

1.2. Some basic martingale theory

Ever since the appearance of Doob (1953), martingales have played a crucial role in probability, so much so that in many problems in applied probability the solution could be reduced to "spot the martingale". Below we only give the very basics of martingale theory. All the results, and much more, are to be found in the excellent texts Williams (1991), Rogers and Williams (1994, 1987), Karatzas and Shreve (1988), Kopp (1984) and Revuz and Yor (1994). A very readable introduction to the use of martingale methods in insurance is Gerber (1979); for finance, Musiela and Rutkowski (1997) is to be recommended. Especially the notion of conditional expectation E(X | 𝒢) of a random variable with respect to a σ-algebra 𝒢 is crucial in all that follows. Before we can introduce the fundamental notion of a martingale, we need to formalize the concept of information (history).

DEFINITION 1.7. A family 𝔽 = (ℱ_t)_t of σ-algebras on (Ω, ℱ) is called a filtration if ℱ_t ⊂ ℱ for all t ≥ 0, and for all s < t, ℱ_s ⊂ ℱ_t (i.e., 𝔽 is increasing). A stochastic process (X_t)_t is called 𝔽-adapted if X_t is ℱ_t-measurable for all t ≥ 0. The natural filtration (ℱ_t^X) of a stochastic process (X_t)_t is the smallest filtration such that X is adapted.

If a stochastic process X is considered, then, if nothing else is mentioned, we use its natural filtration. A filtration is called right-continuous if ℱ_{t+} := ∩_{s>t} ℱ_s = ℱ_t for all t ≥ 0.

DEFINITION 1.8. An 𝔽-stopping time T is a random variable with values in [0, ∞] such that for all t ≥ 0, {T ≤ t} ∈ ℱ_t.
The σ-algebra

    ℱ_T := {A ∈ ℱ : A ∩ {T ≤ t} ∈ ℱ_t for all t ≥ 0}

is called the stopped σ-algebra with respect to T. The usual interpretation of the natural filtration 𝔽^X = (ℱ_t^X) is that ℱ_t^X contains all the information available in the r.v.s (X_s)_{s≤t}.

DEFINITION 1.9. A stochastic process M = (M_t)_t on the filtered probability space (Ω, ℱ, 𝔽, P) is an 𝔽-martingale (-submartingale, -supermartingale, respectively) if
(a) M is 𝔽-adapted and integrable, and
(b) for all 0 ≤ s < t: E(M_t | ℱ_s) = (≥, ≤) M_s, P-a.s.

We simply say that M is a martingale (submartingale, supermartingale) if it is a martingale (submartingale, supermartingale) with respect to the natural filtration. The following two results are now key to many applications in insurance and finance.


THEOREM 1.10. (Martingale stopping theorem) Let M be an 𝔽-martingale (-submartingale, -supermartingale) and T an 𝔽-stopping time. Assume that 𝔽 is right-continuous. Then the stopped stochastic process (M_{T∧t} : t ≥ 0) is also an 𝔽-martingale (-submartingale, -supermartingale).   □

Theorem 1.10 immediately implies the following important relation: for all t ≥ 0,

    EM_{T∧t} = (≥, ≤) EM_0 .   (10)

In various applications, one would like to replace T∧t in (10) by T; this is not true in general: extra (uniform) integrability conditions have to be imposed. The next theorem yields a precise formulation of the often used statement that "all martingales converge".

THEOREM 1.11. (Martingale convergence theorem) Let M be an 𝔽-supermartingale such that sup_{t≥0} EM_t^- < ∞. If 𝔽 is right-continuous, then M_∞ := lim_{t→∞} M_t exists P-a.s.; moreover, E|M_∞| < ∞.   □

An immediate consequence of the above is that positive (or indeed negative) martingales converge almost surely. A third important category of results are the so-called martingale inequalities; we refer to the cited literature for examples of the latter. For our purposes, the following martingales related to Brownian motion and the homogeneous Poisson process are important.

PROPOSITION 1.12. (a) Suppose N is a homogeneous Poisson process with intensity λ > 0; then (N(t) - λt)_t is a martingale.
(b) Consider the Cramér-Lundberg model from Definition 1.5. Let

    θ(r) = λ(Ee^{rX_1} - 1) - cr ,   (11)

for those r-values for which Ee^{rX_1} exists. Then

    (M_r(t))_t := (exp{-rU(t) - θ(r)t})_t   (12)

is a martingale.   □

Together with the stopping theorem (Theorem 1.10), this result yields important information on the probabilities of ruin Ψ(u, T); see Section 2. The proof of (12) is fairly easy once we know that (U(t))_t is a (strong) Markov process: for 0 ≤ s < t,

    E(M_r(t) | ℱ_s) = E(exp{-rU(t) - θ(r)t} | ℱ_s)
                    = E(exp{-r(U(t) - U(s))} | ℱ_s) exp{-rU(s) - θ(r)t}
                    = E(exp{r Σ_{k=N(s)+1}^{N(t)} X_k}) e^{-rc(t-s)} exp{-rU(s) - θ(r)t}
                    = exp{λ(t - s)(Ee^{rX_1} - 1) - rc(t - s)} exp{-rU(s) - θ(r)t}
                    = exp{-rU(s) - θ(r)s}
                    = M_r(s) .

In the Brownian case, the following results are easily obtained.

PROPOSITION 1.13. Suppose W = (W_t)_t is standard Brownian motion. Then
(a) W and (W_t² - t)_t are martingales;
(b) for any μ ∈ ℝ, σ > 0, write W_{μ,σ}(t) = μt + σW_t; then (W_{μ,σ}(t))_t is called Brownian motion with drift μ and variance σ². For each β ∈ ℝ, the process

    (exp{βW_{μ,σ}(t) - (μβ + σ²β²/2)t})_t   (13)

is a martingale associated to Brownian motion, called the Wald or exponential martingale.   □

For a nice discussion of how the latter result can be used to derive properties of models involving Brownian motion, see for instance Harrison (1985).
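The martingale property in (13) rests on the identity E exp{βW_{μ,σ}(t)} = exp{(μβ + σ²β²/2)t}, which can be checked numerically by integrating exp(βx) against the N(μt, σ²t) density of W_{μ,σ}(t). A quadrature sketch (grid sizes and parameters are our own choices):

```python
import math

def norm_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def wald_martingale_mean(beta, mu, sigma, t, n=100000):
    """E exp{beta W_{mu,sigma}(t) - (mu beta + sigma^2 beta^2 / 2) t} by
    midpoint quadrature; W_{mu,sigma}(t) ~ N(mu t, sigma^2 t).
    By (13) the result should equal 1 for every beta."""
    mean, var = mu * t, sigma ** 2 * t
    comp = math.exp(-(mu * beta + sigma ** 2 * beta ** 2 / 2.0) * t)
    center = mean + beta * var  # exp(beta x) * pdf is a Gaussian centered here
    half = 12.0 * math.sqrt(var)
    h = 2.0 * half / n
    total = 0.0
    for i in range(n):
        x = center - half + (i + 0.5) * h
        total += math.exp(beta * x) * norm_pdf(x, mean, var)
    return comp * total * h

val = wald_martingale_mean(1.2, 0.3, 0.5, 2.0)
```

The integrand is itself a (rescaled) Gaussian, so centering the grid at mean + βvar keeps the truncation error negligible.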

2. Stochastic processes in insurance


2.1. Some basic results

In Section 1.1 we introduced the Cramér-Lundberg model U(t) = u + ct - S(t), t ≥ 0; see (4). In Proposition 1.12 we derived a whole family of associated exponential martingales, parametrised by those r ∈ ℝ for which m_X(r) = Ee^{rX_1} is finite. One easily verifies that the function θ(r) in (11) is strictly convex, θ(0) = 0, θ'(0) = λμ - c < 0, so that the situation depicted in Figure 4 may occur. This motivates the following

DEFINITION 2.1. (Lundberg coefficient) Suppose the claim size d.f. F allows for a constant R > 0 to exist for which θ(R) = 0; then R is called the Lundberg (or adjustment) coefficient of the risk process (U(t))_t.

Typical examples where R exists are the exponential and gamma distributions. However, R does not exist for Pareto or lognormal distributions. Suppose now that the Lundberg coefficient R exists; then by Proposition 1.12, (M_R(t) = exp{-RU(t)})_t is a martingale. Since the ruin time τ is a stopping time for U(t), we can apply Theorem 1.10, hence for t ≥ 0:


Fig. 4. Visualisation of the function θ(r) in (11).

    E(exp{-RU(τ ∧ t)}) = E(exp{-RU(0)}) = e^{-Ru} .


The left hand side can be bounded below by

    E(exp{-RU(τ ∧ t)}; τ ≤ t) = E(exp{-RU(τ)}; τ ≤ t) ,

where, in general, we denote E(X; A) = ∫_A X dP. Using monotone convergence (t → ∞) and U(τ) < 0 we obtain

    e^{-Ru} ≥ E(exp{-RU(τ)}; τ < ∞) ≥ P(τ < ∞) .

Hence the following, so-called Cramér-Lundberg inequality is obtained for ruin in infinite time:

    Ψ(u) ≤ e^{-Ru} .   (14)
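For exponential claims the Lundberg coefficient is explicit, R = 1/μ - λ/c, and the root θ(R) = 0 can also be found numerically; the exact ruin probability Ψ(u) = ρe^{-Ru} for this case then sits below the bound (14). A sketch (bisection brackets and tolerances are our own):

```python
import math

def theta(r, lam, c, mu):
    """theta(r) = lam (E e^{rX} - 1) - c r for EXP(1/mu) claims,
    where E e^{rX} = 1 / (1 - mu r), valid for r < 1/mu."""
    return lam * (1.0 / (1.0 - mu * r) - 1.0) - c * r

def lundberg_coefficient(lam, c, mu, tol=1e-12):
    """Bisection for the positive root R of theta on (0, 1/mu); theta is
    negative just right of 0 and tends to +infinity as r -> 1/mu."""
    lo, hi = 1e-9, 1.0 / mu - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if theta(mid, lam, c, mu) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, c, mu = 1.0, 1.5, 1.0
R = lundberg_coefficient(lam, c, mu)    # closed form: 1/mu - lam/c = 1/3
rho = lam * mu / c
psi = lambda u: rho * math.exp(-R * u)  # exact Psi(u) for EXP claims
```

Since ρ < 1, the exact Ψ(u) is a constant factor below the Lundberg bound e^{-Ru}, showing the inequality is tight up to that factor.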

In order to answer the important question of how sharp the estimate in (14) is, one has to resort to more refined arguments. Using renewal theory results like Blackwell's renewal theorem (in the version of Smith's key renewal theorem; see Resnick (1992)) one obtains:

THEOREM 2.2. (The Cramér-Lundberg approximation) Assume that the Lundberg coefficient R in Definition 2.1 exists and that

    ∫_0^∞ x e^{Rx} F̄(x) dx < ∞ ,   (15)

then

    lim_{u→∞} Ψ(u) e^{Ru} = (c - λμ) / (λ m_X'(R) - c) .   (16)
                                                             □
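For exponential claims everything in the Cramér-Lundberg approximation is explicit: m_X(r) = (1 - μr)^{-1}, R = 1/μ - λ/c, and the exact ruin probability is Ψ(u) = ρe^{-Ru} with ρ = λμ/c. One can therefore verify directly that Ψ(u)e^{Ru} equals the standard Cramér-Lundberg constant (c - λμ)/(λ m_X'(R) - c); a quick check (parameter values are our own):

```python
import math

lam, c, mu = 1.0, 1.5, 1.0
rho = lam * mu / c
R = 1.0 / mu - lam / c                    # Lundberg coefficient for EXP claims
m_prime = mu / (1.0 - mu * R) ** 2        # m_X'(R) = E(X e^{RX})
C = (c - lam * mu) / (lam * m_prime - c)  # the Cramer-Lundberg constant

u = 25.0
psi_exact = rho * math.exp(-R * u)        # exact Psi(u) for EXP claims
ratio = psi_exact * math.exp(R * u)       # Psi(u) e^{Ru}; constant in u here
```

For exponential claims the limit is attained for every u, not just asymptotically: the constant reduces algebraically to ρ.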

The limit result (16) shows that the Lundberg inequality is (asymptotically) sharp. The moment condition (15) is satisfied in all standard examples where R exists. For a discussion of this, see Embrechts (1983). The asymptotic estimate (16) is called the Cramér-Lundberg approximation. A key question in risk theory is to what extent the results (14) and (16) carry over to more general risk models. The


important assumption in the Cramér-Lundberg approximation is that exponential moments of the claim size distribution exist for some r > 0. This means that the right tail of F decreases at least exponentially fast. However, analysis of insurance and financial data typically indicates the presence of heavy tails; see Embrechts, Klüppelberg, and Mikosch (1997). The main result on asymptotic ruin estimates when the Lundberg coefficient does not exist is based on subexponentiality of F_I. Notice that φ(u) := 1 - Ψ(u) given in (8) is the d.f. of the random geometric sum S_N = L_1 + ... + L_N, where (L_i) is a sequence of i.i.d. r.v.s with common d.f. F_I and N is geometrically distributed with parameter 1 - ρ, independent of (L_i). Now if F_I is "long-tailed", large observations of L_i may occur with high probability, and it is not unreasonable to conjecture that the random sum S_N is governed by just one summand. For that reason, it might be possible to relate the tail behaviour of φ to that of F_I. It turns out that the proper class for this purpose is the class 𝒮 of subexponential distributions defined below.

DEFINITION 2.3. A distribution G on [0, ∞) with unbounded support belongs to the class 𝒮 of subexponential distributions if
    lim_{x→∞} (1 - G^{2*}(x)) / (1 - G(x)) = 2 .   (17)

To explain why 𝒮 can be used to model large claims, we reformulate (17) as follows. If (X_i) are i.i.d. r.v.s with d.f. G ∈ 𝒮, then P(S_n > x) ~ P(M_n > x), x → ∞. Here we mean by f(x) ~ g(x), x → ∞, that lim_{x→∞} f(x)/g(x) = 1, and M_n = max(X_1, ..., X_n). The name subexponential stems from the following property: if G ∈ 𝒮, then the right tail of G decreases more slowly than any exponential, i.e., lim_{x→∞} e^{εx} Ḡ(x) = ∞ for all ε > 0. A detailed analysis of the class 𝒮 and its application to insurance are given in Embrechts, Klüppelberg and Mikosch (1997). Asymptotic ruin estimates involving the class 𝒮 were proposed in Embrechts, Goldie, and Veraverbeke (1979), where it is shown that the distribution of a random geometric sum L_1 + ... + L_N belongs to 𝒮 if and only if L_i ∈ 𝒮, yielding the following result:

THEOREM 2.4. In the Cramér-Lundberg model with F_I ∈ 𝒮 and safety loading θ > 0 one has

    Ψ(u) ~ (1/θ) F̄_I(u) ,   u → ∞ .   (18)
                                         □

The main examples of claim size distributions where (18) holds are the Pareto, lognormal and the heavy-tailed Weibull distributions.
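The defining ratio in (17) can be computed numerically. For EXP(λ) the ratio equals 1 + λx and grows without bound, so the exponential law is not subexponential, while for a Pareto d.f. G(x) = 1 - (1 + x)^{-α} the convolution tail can be obtained by quadrature and the ratio settles near 2. A sketch (grid sizes and the choice α = 1.5 are our own):

```python
import math

ALPHA = 1.5

def pareto_tail(x):
    """1 - G(x) for the Pareto d.f. G(x) = 1 - (1 + x)^(-ALPHA), x >= 0."""
    return (1.0 + x) ** (-ALPHA)

def pareto_pdf(y):
    return ALPHA * (1.0 + y) ** (-ALPHA - 1.0)

def two_fold_tail(x, n=20000):
    """1 - G^{2*}(x) = (1 - G(x)) + int_0^x (1 - G(x - y)) g(y) dy,
    evaluated by the midpoint rule."""
    h = x / n
    conv = sum(pareto_tail(x - (i + 0.5) * h) * pareto_pdf((i + 0.5) * h)
               for i in range(n)) * h
    return pareto_tail(x) + conv

ratios = {x: two_fold_tail(x) / pareto_tail(x) for x in (10.0, 100.0, 1000.0)}
# For comparison, the same ratio for EXP(lam) is 1 + lam * x, which explodes.
```

The computed ratios decrease toward the subexponential limit 2 as x grows, in line with (17).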
2.2. Practical evaluation of Ψ(u, T)

In the previous sections we discussed various expressions for the ruin probabilities Ψ(u) = Ψ(u, ∞) in the Cramér-Lundberg model. These results were either exact,


see (8), in inequality form (see (14)), or asymptotic for large initial capital u (see (16) and (18)). Alternative techniques may lead to integro-differential equations (see Section 2.3.3 below), Fourier-type representations and, for any of these, specific numerical techniques like the fast Fourier transform, simulation, or recursive methods. The most widely used method in the latter category is the so-called Panjer recursion, which is based on a discretization of (8); see for instance Panjer and Willmot (1992), p. 171.

One particular method for estimating Ψ(u, T) for finite T is based on a so-called diffusion approximation. We include a discussion mainly because of its relevance for the general theme of the paper rather than for its practical usefulness, which is limited. Often, one can imbed a (classical) risk process in a sequence (U^(n))_n of risk processes and hope for, say, the existence of a reasonable weak limiting process Z. If the risk process U^(n) is approximated by the limiting process Z, then, under some regularity conditions, the hitting times (ruin probabilities) of Z should also approximate the hitting times (ruin probabilities) of U^(n). A Cramér-Lundberg risk process U is càdlàg, i.e., it has sample paths which are right continuous with left limits. (The word "càdlàg" is an acronym from the French "continu à droite, limites à gauche".) Stone (1963) extends the Skorokhod J₁-metric for càdlàg functions on compact intervals to D = D[0, ∞), making D a Polish space. Hence, we can talk of weak convergence in D.

DEFINITION 2.5. A sequence (X^(n))_n of stochastic processes in D = D[0, ∞) is said to converge weakly in the Skorokhod J₁-topology to a stochastic process X if for every bounded continuous functional f on D it follows that

    lim_{n→∞} E(f(X^(n))) = E(f(X)) .

In this case one writes X^(n) ⇒ X, n → ∞. The main ingredients for weak approximations in risk theory are a functional central limit theorem in conjunction with the continuous mapping theorem. Suppose that (X_k) is a sequence of i.i.d. r.v.s with mean μ and finite variance σ². The famous Donsker invariance principle then says that, on [0, 1],

    Z_n(t) := (1/(σ√n)) Σ_{k=1}^{[nt]} (X_k - μ) ⇒ W_t ,   n → ∞ ,

where W denotes standard Brownian motion. The process (1/(σ√n)) Σ_{k=1}^{N(nt)} (X_k - μ) is a random time transformation of Z_n, i.e.,

    (1/(σ√n)) Σ_{k=1}^{N(nt)} (X_k - μ) = Z_n(N(nt)/n) .

Moreover, N(nt)/n ⇒ λI, where I denotes the identity map. The composition mapping is continuous, implying that

    (1/(σ√n)) Σ_{k=1}^{N(nt)} (X_k - μ) ⇒ W(λ·) =_d √λ W(·) .   (19)

The last equality in law follows from the scaling property of Brownian motion. Relation (19) is the key to the diffusion approximation in risk theory, which was first introduced in insurance mathematics by Iglehart (1969); see also Grandell (1991, Appendix A.4) for an extensive discussion of the method. Writing μ₂ = EX₁², the diffusion approach yields approximations for Ψ(u, T) as well as for Ψ(u), namely

    Ψ(u, T) ≈ P(inf_{0≤s≤T} (u + λμθs + √(λμ₂) W_s) < 0)
            = Φ((-u - λμθT)/√(λμ₂T)) + e^{-2μθu/μ₂} Φ((-u + λμθT)/√(λμ₂T)) ,

    Ψ(u) ≈ P(inf_{s≥0} (u + λμθs + √(λμ₂) W_s) < 0) = e^{-2μθu/μ₂} ,
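The finite-horizon expression above is the standard first-passage formula for Brownian motion with drift, and it collapses to the infinite-horizon exponential as T → ∞. In the risk setting the drift would be λμθ and the variance rate λEX₁² (our reading of the partly garbled display); the formula itself can be sketched generically:

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def psi_diffusion(u, T, m, v):
    """P(inf_{0<=s<=T} (u + m s + sqrt(v) W_s) < 0)
       = Phi((-u - m T)/sqrt(v T)) + exp(-2 m u / v) Phi((-u + m T)/sqrt(v T)),
    the first-passage probability of drifted Brownian motion below 0."""
    s = math.sqrt(v * T)
    return (std_normal_cdf((-u - m * T) / s)
            + math.exp(-2.0 * m * u / v) * std_normal_cdf((-u + m * T) / s))

u, m, v = 10.0, 0.5, 4.0               # e.g. m = lam*mu*theta, v = lam*mu2
finite = psi_diffusion(u, 1000.0, m, v)
infinite = math.exp(-2.0 * m * u / v)  # infinite-horizon limit
```

The approximation is increasing in T and converges to the closed-form infinite-horizon value, which provides a deterministic consistency check.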


where Φ denotes the d.f. of a standard normal r.v. The results in the above equalities can for instance be found in Borodin and Salminen (1996). The latter approach is called a diffusion approximation since Brownian motion is a special diffusion process. One of the advantages of the diffusion approximation is that it is applicable to more general models which derive from the classical risk process. For these more general processes the classical methods from renewal theory usually fail, and the diffusion approach is then one of the few tools that work. Brownian motion has been studied for a long time and its usefulness in stochastic modeling is well accepted. However, Gaussian processes and variables do not allow for large fluctuations and may sometimes be inadequate for modeling high variability. For instance, the above diffusion approximation does not apply when the observed data give rise to a heavy-tailed claim size distribution such as Pareto with shape parameter 1 < α < 2, implying that the variance of the claims does not exist. This phenomenon very often arises in non-life insurance and in particular in reinsurance; see Embrechts, Klüppelberg and Mikosch (1997). Both stable r.v.s and stable processes arise naturally as alternative modeling tools. The class of stable laws is defined as follows:

DEFINITION 2.6. A r.v. X is said to have a stable distribution if for any n ≥ 2 there are a c_n > 0 and a real number d_n such that

    X_1 + ... + X_n =_d c_n X + d_n ,   (20)

where the X_i are independent copies of X.

It turns out (Feller (1971)) that in (20) we necessarily have c_n = n^{1/α} for some α ∈ (0, 2]. The parameter α is called the index of stability. The case α = 2 corresponds to the normal distribution. Stable laws share many properties with the Gaussian distribution. In particular, we may think of the central limit theorem: only stable laws appear as weak limits of normalized sums of i.i.d. r.v.s. The main difference


between the normal distribution and non-Gaussian stable distributions is the tail behaviour. The (upper) tails of the latter decrease like kx^{-α}, x → ∞, for some constant k. The smaller the value of α, the slower the decay and the heavier the tails. We now introduce another class of Lévy processes which contains Brownian motion as a special case.

DEFINITION 2.7. A càdlàg process Z is said to be an α-stable Lévy motion if the following properties hold:
(a) Z_0 = 0 a.s.,
(b) Z has independent, stationary increments, and
(c) for every t, Z_t has an α-stable distribution.

Notice that 2-stable Lévy motion is Brownian motion. A stable Lévy motion with parameter α < 2 exhibits jumps whose directions are governed by a so-called skewness parameter β ∈ [-1, 1]. If |β| = 1, the Lévy measure is concentrated on a half line and consequently there are only jumps in one direction. Figure 5 depicts some simulations of α-stable Lévy motion. The analogous, powerful result to the Donsker invariance principle in the regime of heavy-tailedness is a stable functional central limit theorem: suppose that (X_k) is a sequence of i.i.d. r.v.s with finite mean μ and such that

    (1/φ(n)) Σ_{k=1}^n (X_k - μ) ⇒ Y ,   n → ∞ ,

where φ(n) = n^{1/α} L(n) for an appropriate slowly varying function L, and Y has a stable distribution with index 1 < α < 2 and skewness parameter |β| ≤ 1. Then, for 0 ≤ t ≤ 1,


Fig. 5. Simulations of 1.2-stable Lévy motion (β = 0).
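Paths like those in Figure 5 can be generated from i.i.d. stable increments, using the Chambers-Mallows-Stuck representation for a single draw. A sketch for the symmetric case β = 0 (the discretization and seed are our own choices):

```python
import math
import random

def symmetric_stable(alpha, rng):
    """One draw from a standard symmetric alpha-stable law (beta = 0),
    via the Chambers-Mallows-Stuck representation: U uniform on
    (-pi/2, pi/2), E standard exponential, independent."""
    u = rng.uniform(-math.pi / 2.0, math.pi / 2.0)
    e = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / e) ** ((1.0 - alpha) / alpha))

rng = random.Random(1)
alpha, n = 1.2, 1000
# self-similar scaling: n^{-1/alpha} X gives increments of Z over steps 1/n
incs = [n ** (-1.0 / alpha) * symmetric_stable(alpha, rng) for _ in range(n)]
path = [0.0]
for dz in incs:
    path.append(path[-1] + dz)
```

Plotting `path` against an equally spaced grid on [0, 1] reproduces the jumpy look of Figure 5.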


    (1/φ(n)) Σ_{k=1}^{[nt]} (X_k - μ) ⇒ Z_t ,   n → ∞ ,

where Z denotes α-stable Lévy motion with index α and skewness parameter β. Moreover, Z_1 =_d Y. Following the same approach as in the Brownian diffusion approximation, it is suggested to use the following approximations for the ruin probabilities when the variance of the claim size distribution does not exist:

    Ψ(u, T) ≈ P(inf_{0≤s≤T} (u + λμθs + λ^{1/α} Z_s) < 0) ,

    Ψ(u) ≈ P(inf_{s≥0} (u + λμθs + λ^{1/α} Z_s) < 0) = Σ_{n=0}^∞ (-aμθ)^n u^{ᾱn} / Γ(1 + ᾱn) ,

where ᾱ = α - 1, a = |cos(πα/2)| and Γ(x) = ∫_0^∞ e^{-u} u^{x-1} du denotes the Gamma function; see Furrer, Michna and Weron (1997) and Furrer (1998). In the latter reference an explicit formula for the distribution of the infimum of an α-stable Lévy motion with linear drift is derived in terms of the so-called Mittag-Leffler function E_a(x) = Σ_{n=0}^∞ x^n / Γ(1 + an), a > 0.
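The Mittag-Leffler function is straightforward to evaluate from its series for moderate arguments; E_1 reduces to the exponential and E_{1/2}(x) = e^{x²} erfc(-x), which gives two exact checks. A sketch (the truncation rule is our own; for strongly negative arguments and small a the alternating series becomes numerically unstable and other methods would be needed):

```python
import math

def mittag_leffler(a, x, nmax=300):
    """E_a(x) = sum_{n>=0} x^n / Gamma(1 + a n), truncated; terms are dropped
    once math.gamma would overflow (argument beyond about 171)."""
    total = 0.0
    for n in range(nmax):
        g = 1.0 + a * n
        if g > 170.0:
            break
        total += x ** n / math.gamma(g)
    return total

e1 = mittag_leffler(1.0, 1.0)   # should equal e
eh = mittag_leffler(0.5, -1.0)  # should equal e * erfc(1)
```

With ᾱ = α - 1, the infinite-horizon series displayed above is, in our reading, E_ᾱ evaluated at -aμθu^ᾱ.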

2.3. Generalizations of the claim number process


One can think of various generalizations of the classical risk process in order to obtain a more reasonable description of reality. Note that the homogeneous Poisson process is a stationary process, implying that the size of the portfolio cannot increase (or decrease). In addition, fire and automobile insurance, for instance, ask for models allowing for risk fluctuations. As already mentioned in Section 1, the simplest way to take size fluctuations into account is to consider inhomogeneous Poisson processes with intensity measure Λ(t). The purpose of this section is mainly to discuss the choice of point processes describing such risk fluctuations.

2.3.1. Mixed Poisson processes


DEFINITION 2.8. Let Ñ be a homogeneous Poisson process with intensity 1 and Λ a random variable with P(Λ > 0) = 1, independent of Ñ. Then the process

    N = Ñ ∘ Λ = (Ñ(Λt))_t

is called a mixed Poisson process. The random variable Λ is called the structure variable.

A mixed Poisson process has stationary increments; however, the independent increments condition is violated. The stochastic variation of the claim number intensity can be interpreted as random changes of the Poisson parameter from its expected value λ. The most common choice for the distribution of the structure variable Λ is certainly the gamma distribution, whose density function is given by


    f_Λ(x) = (δ^γ / Γ(γ)) x^{γ-1} e^{-δx} ,   x ≥ 0 .   (21)

We use the notation Λ ~ Γ(γ, δ) to indicate that the random variable Λ has a gamma distribution with density function given in (21).

DEFINITION 2.9. A mixed Poisson process N is called a negative binomial process or Pólya process if Λ ~ Γ(γ, δ).

We then have for a Pólya process N

    P(N(t) = n) = P(Ñ(Λt) = n) = ∫_0^∞ P(Ñ(Λt) = n | Λ = λ) f_Λ(λ) dλ
                = ∫_0^∞ e^{-λt} ((λt)^n / n!) (δ^γ / Γ(γ)) λ^{γ-1} e^{-δλ} dλ
                = (Γ(γ + n) / (Γ(γ) n!)) (δ/(δ + t))^γ (t/(δ + t))^n ,

i.e., N(t) has a negative binomial distribution. The corresponding risk model is also known as the Pólya-Eggenberger model. If one compares the total claim amount up to time t in the Pólya and in the Poisson model, then for equal means the variance in the Pólya model is bigger than in the Poisson model. This phenomenon is referred to as over-dispersion and is often encountered in real insurance data; see for instance Seal (1978). From a purely mathematical point of view, ruin calculations in the mixed Poisson case are easily performed. The idea is to first condition on the outcome of Λ and then weight over ruin probabilities computed in the Poisson case. Let Ψ(u, λ) be the infinite-time ruin probability when N is a homogeneous Poisson process with intensity λ. Observe that Ψ(u, λ) = 1 when the net profit condition (7) is violated, i.e., when λ ≥ c/μ. Thus we can write

    Ψ(u) = ∫_0^{c/μ} Ψ(u, ℓ) dF_Λ(ℓ) + 1 - F_Λ(c/μ) ,   (22)
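Formula (22) can be evaluated numerically once Ψ(u, ℓ) is available; for exponential claims Ψ(u, ℓ) is explicit, and capping it at 1 beyond ℓ = c/μ lets a single quadrature over the structure density cover both terms of (22). A sketch with an EXP(1) structure variable (all choices are our own):

```python
import math

def psi_poisson_exp(u, c, mu, ell):
    """Infinite-horizon ruin probability for Poisson intensity ell and
    EXP(1/mu) claims; equal to 1 when the net profit condition fails."""
    if ell * mu >= c:
        return 1.0
    rho = ell * mu / c
    return rho * math.exp(-(1.0 - rho) * u / mu)

def psi_mixed(u, c, mu, structure_pdf, upper, n=20000):
    """Formula (22): since psi_poisson_exp is 1 for ell >= c/mu, integrating
    it against the structure density up to `upper` combines both terms,
    provided P(Lambda > upper) is negligible (midpoint rule)."""
    h = upper / n
    return sum(psi_poisson_exp(u, c, mu, (i + 0.5) * h)
               * structure_pdf((i + 0.5) * h) for i in range(n)) * h

pdf = lambda x: math.exp(-x)  # Lambda ~ EXP(1), i.e. a Gamma(1, 1) mixer
c, mu = 1.5, 1.0
psi5 = psi_mixed(5.0, c, mu, pdf, 30.0)
floor = math.exp(-c / mu)     # the term 1 - F_Lambda(c/mu) in (22)
```

As the text notes below, Ψ(u) never falls below the mass P(Λ > c/μ), no matter how large u is.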

where F_Λ denotes the d.f. of the structure variable Λ. It follows from (22) that Ψ(u) ≥ F̄_Λ(c/μ) > 0 for all u. This implies that any insurer who does not constantly adjust his premium rate c according to the risk fluctuations runs a large risk of being ruined. Assume now that there exists ℓ₁ < c/μ such that F_Λ(ℓ₁) = 1. It is natural to let ℓ₁ be the right endpoint of F_Λ, i.e., ℓ₁ = sup{ℓ : F_Λ(ℓ) < 1}. It follows from (8) that
    1 - Ψ(u) = Σ_{n=0}^∞ p_n F_I^{n*}(u) ,

where p_n = ∫_0^{ℓ₁} (1 - ℓμ/c)(ℓμ/c)^n dF_Λ(ℓ). An extension of the Cramér-Lundberg approximation to the mixed Poisson case seems in general not possible; see


Grandell (1997) for a discussion of this. However, the situation is different in the regime of heavy tails, where the following result can be derived; see Grandell (1997):

THEOREM 2.10. Let ℓ₁ be the right endpoint of F_Λ and suppose that ℓ₁ < c/μ and that F_I ∈ 𝒮. Then

    Ψ(u) ~ E(θ(Λ)^{-1}) F̄_I(u) ,   u → ∞ ,

where θ(ℓ) = c/(ℓμ) - 1.   □

2.3.2. Cox processes


We shall now consider the case where the occurrence of claims is described by a Cox process N. The first treatment of Cox processes in insurance mathematics originates from Ammeter (1948). Cox processes seem to form a natural class to model risk and size fluctuations.

DEFINITION 2.11. A stochastic process Λ = (Λ(t))_t with Λ(0) = 0 P-a.s., Λ(t) < ∞ for each t < ∞ and non-decreasing sample paths is called a random measure. If Λ has P-a.s. continuous realizations, it is called diffuse.

DEFINITION 2.12. Let Λ be a random measure and Ñ a homogeneous Poisson process with intensity λ = 1, independent of Λ. The point process N = Ñ ∘ Λ is called a Cox process or doubly stochastic Poisson process.

Definition 2.12 is one of several equivalent definitions. Strictly speaking, we only require that N and Ñ ∘ Λ are equal in distribution. For this question and related measurability conditions we refer to Grandell (1976). Now let Λ be a diffuse random measure with Λ(∞) = ∞ a.s. and N be the corresponding Cox process. As a generalization of (4) we obtain the risk process

    U(t) = u + (1 + θ)μΛ(t) - Σ_{k=1}^{N(t)} X_k ,   t ≥ 0 .
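The time-change construction N = Ñ ∘ Λ translates into a simulation recipe: map the points of a unit-rate Poisson process through Λ^{-1}. For a piecewise-constant intensity the inverse is elementary; with fixed rates this produces an inhomogeneous Poisson path, and redrawing the rates at random for each path makes N a Cox process. A sketch (all names and parameters are our own):

```python
import random

def lambda_inverse(breaks, rates, s):
    """Invert Lambda(t) = int_0^t lambda(v) dv for a piecewise-constant
    intensity: rates[i] applies on [breaks[i], breaks[i+1])."""
    acc = 0.0
    for i, r in enumerate(rates):
        seg = (breaks[i + 1] - breaks[i]) * r
        if acc + seg >= s:
            return breaks[i] + (s - acc) / r
        acc += seg
    return float("inf")  # s exceeds Lambda at the end of the window

def cox_arrivals(breaks, rates, rng):
    """N = N_tilde o Lambda: unit-rate Poisson points mapped through Lambda^{-1}."""
    out, s = [], 0.0
    while True:
        s += rng.expovariate(1.0)
        t = lambda_inverse(breaks, rates, s)
        if t == float("inf"):
            return out
        out.append(t)

# deterministic check of the time change: Lambda(1) = 2, Lambda(2) = 2.5
t1 = lambda_inverse([0.0, 1.0, 2.0], [2.0, 0.5], 1.0)
t2 = lambda_inverse([0.0, 1.0, 2.0], [2.0, 0.5], 2.25)
rng = random.Random(3)
avg = sum(len(cox_arrivals([0.0, 1.0, 2.0], [2.0, 0.5], rng))
          for _ in range(5000)) / 5000.0  # mean count approx Lambda(2) = 2.5
```

The expected number of arrivals in the window equals Λ at its right endpoint, which the averaged simulation reproduces.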

Assume now that Λ has the representation Λ(t) = ∫_0^t λ(s) ds, where (λ(t))_t is called the intensity process. If (λ(t))_t has right continuous and Riemann integrable trajectories, then the corresponding Cox process is well defined (Grandell, 1976). The premium rate is then given by c(t) = (1 + θ)μλ(t), i.e., it is a stochastic process. The martingale approach to Cox models, due to Björk and Grandell (1988), is an extension of the basic martingale approach considered in Proposition 1.12. See also Embrechts, Grandell and Schmidli (1993) for a discussion of finite time ruin probabilities in the Cox case. Let N be a Cox process with intensity process (λ(t))_t. A suitable filtration 𝔽 is given by ℱ_t = ℱ_∞^Λ ∨ ℱ_t^N. It seems natural to try to find an 𝔽-martingale as close as possible to the one used in Proposition 1.12. We therefore consider the process


    M_r(t) = exp{-rU(t) - θ(r, t)} ,   t ≥ 0 ,   (23)

where θ(r, t) = Λ(t)(Ee^{rX_1} - 1) - rct, i.e., we simply replace λt by Λ(t). Then the following proposition holds:

PROPOSITION 2.13. The process (M_r(t))_t given in (23) is an 𝔽-martingale, where the filtration 𝔽 is given by ℱ_t = ℱ_∞^Λ ∨ ℱ_t^N.   □

A lower bound for the ruin probability Ψ(u) is easily obtained in the same way as in Section 2.1, namely
    M_r(0) = e^{-ru} ≥ E(M_r(τ ∧ t) | ℱ_0)
            ≥ E(M_r(τ); τ ≤ t | ℱ_0)
            ≥ E(exp{-θ(r, τ)}; τ ≤ t | ℱ_0)
            ≥ inf_{0≤s≤t} exp{-θ(r, s)} P(τ ≤ t | ℱ_0) ,

where we used U(τ) < 0 on {τ < ∞}.

Taking expectations on both sides and using monotone convergence yields

    P(τ < ∞) ≤ e^{-ru} E(sup_{t≥0} exp{Λ(t)(Ee^{rX_1} - 1) - rct}) = C(r) e^{-ru} ,

say. As in the Poisson case we would like to choose r as large as possible. This suggests the following definition:

DEFINITION 2.14. The Lundberg coefficient R_C in the Cox model is defined as

    R_C = sup{r : E(sup_{t≥0} exp{Λ(t)(Ee^{rX_1} - 1) - rct}) < ∞} .

Consider a Cox model where the intensity process is stationary with Eλ(t) = λ, and denote by R_P the Lundberg coefficient in a classical risk model where the homogeneous Poisson process N has intensity λ. Let r > R_P, implying that θ(r) = λ(Ee^{rX_1} - 1) - rc > 0 and therefore

    C(r) ≥ sup_{t≥0} E(exp{Λ(t)(Ee^{rX_1} - 1) - rct})
         ≥ sup_{t≥0} exp{t(λ(Ee^{rX_1} - 1) - rc)} = ∞ ,

where the second inequality follows from Jensen's inequality. Hence R_C ≤ R_P, which means that the stationary Cox case is "more dangerous" than the Poisson case. For a more detailed discussion of this comparison we refer to Theorem 22, p. 95 and to Section 4.6 of Grandell (1991). A very special class of Cox models are the Cox processes with an independent jump intensity. Intuitively, an independent jump intensity is a jump process where the jump times form a renewal process and where the value of the intensity between two successive jumps may depend only on the distance between those


two jumps. Although Cox processes with an independent jump intensity are a special class of Cox models, they are still general enough to yield non-trivial models allowing for fairly explicit results in the ruin type setting. Cox processes also appear as limiting processes of certain thinning procedures and therefore seem to be natural point processes for modeling claim arrivals. If we consider claims which are caused by "risk situations" or incidents, then each incident becomes a claim with probability p, independently of all other incidents. Under these assumptions, the claim number process is the result of a thinning procedure of the incident number process. A rigorous treatment of Cox models is to be found in Grandell (1991). The book by Rolski, Schmidli, Schmidt, and Teugels (1998) gives a readable introduction to risk theory overall.

2.3.3. Renewal processes

In this section we let the occurrence of claims be described by a renewal process N. Denote by T_k the inter-arrival times between two successive claims.
DEFINITION 2.15. A point process on R+ is called a renewal process if the variables (Tk)k>l are independent and if T2, T3,... have the same d.f.G. N is called ordinary renewal process if T1 also has d.f.G. We call N a stationary renewal process if G has finite mean 1/2 and if the d.f. Go of T1 satisfies Go(x) = 2 f o G(s)ds. The first treatment of ruin problems when the occurence of claims is modelled by a renewal process is due to Andersen (1957). Ordinary renewal processes Let N be an ordinary renewal process and assume that Tk has finite mean 1/2. N is not stationary, and EN(t) 2t unless Tk has an exponential distribution. We ( S 0 = 0 ) , where consider the associated random walk S n = ~ = ] Y k , Yk = --cTk + Xk. We assume that EYk = --c/2 + # < 0, implying that the random walk Sn drifts to - o o . The safety loading 0 is defined in a natural way as = c/(,~tt) - 1. Since ruin can occur only at renewal epochs, we have that ~U(u)= P ( r n a x S ~ > u ) . Denote by K the d.f. of Yk and let ~ be the mean of Yk, i.e., ~ = --#0. Then

$$Ee^{rS_n} = \big(Ee^{rY_1}\big)^n = \big(Ee^{-rcT_1}\, Ee^{rX_1}\big)^n .$$

We assume that $K(0) < 1$ since $K(0) = 1$ implies $\Psi(u) \equiv 0$. The function $\hat{k}(-r)$, where $\hat{k}$ denotes the Laplace-Stieltjes transform of $K$ (so that $\hat{k}(-r) = Ee^{rY_1}$), will be important. Under the assumption that the appropriate exponential moments of $X_k$ exist for some $r > 0$, one can show that $\hat{k}(0) = 1$, that $r \mapsto \hat{k}(-r)$ has derivative $\kappa < 0$ at $r = 0$ and that $\hat{k}(-r)$ is convex and continuous on $[0, r_\infty)$, where $r_\infty$ denotes the abscissa of convergence of $Ee^{rX_k}$. Moreover, $\hat{k}(-r) \to \infty$ as $r \to r_\infty$. From this it follows that

384

P. Embrechts, R. Frey and H. Furrer

there exists a constant $R_R > 0$ such that $\hat{k}(-R_R) = 1$. Again $R_R$ is called the Lundberg coefficient. Indeed, if $T_k$ is exponentially distributed, then $R_R$ coincides with the Lundberg coefficient from the classical model. The process $(S_n)$ is a random walk and therefore has stationary and independent increments. This is exactly the property we used in the classical case to construct a family of martingales and to prove the Lundberg inequality (14). It is therefore not surprising that the derivation goes through in the ordinary renewal setup.

PROPOSITION 2.16. The discrete time process $(M_r(n))_n$ given by

$$M_r(n) = \frac{e^{-r(u - S_n)}}{\hat{k}(-r)^n}, \quad n \ge 0 ,$$

is a martingale with respect to the filtration $(\mathscr{F}_n)$ given by $\mathscr{F}_n = \sigma(S_k : k \le n)$.

Let $N_u$ be the number of the claim causing ruin, i.e., $N_u = \min\{n : S_n > u\}$. Then $N_u$ is a stopping time and $\Psi(u) = P(N_u < \infty)$. Again $N_u \wedge n_0$ is a bounded stopping time for $n_0 < \infty$ and by the stopping theorem for martingales (Theorem 1.10) we obtain as before
$$\Psi(u) \le e^{-ru} \sup_{n \ge 0} \hat{k}(-r)^n .$$

The best choice of $r$ is the Lundberg exponent $R_R$, yielding

$$\Psi(u) \le e^{-R_R u}, \quad u \ge 0 . \qquad (24)$$
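For exponential interarrival times the renewal model reduces to the classical model, so the Lundberg coefficient defined by $Ee^{R_R Y_1} = 1$ can be checked against the closed form $1/\mu - \lambda/c$. The sketch below (illustrative parameters and function names of our own choosing) locates the root by bisection:

```python
def k_hat(r, lam, mu, c):
    """E[e^{r Y_1}] = E[e^{-r c T_1}] * E[e^{r X_1}] for exponential
    interarrival times (rate lam) and exponential claims (mean mu);
    defined for r < 1/mu, the abscissa of convergence of E[e^{r X_1}]."""
    return (lam / (lam + r * c)) * (1.0 / (1.0 - r * mu))

def lundberg_coefficient(lam, mu, c, tol=1e-12):
    """Positive root of E[e^{r Y_1}] = 1, found by bisection on (0, 1/mu):
    the function is < 1 just right of 0 (negative drift) and blows up
    near 1/mu, so the root is bracketed."""
    lo, hi = tol, 1.0 / mu - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if k_hat(mid, lam, mu, c) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam, mu, c = 1.0, 1.0, 1.5   # illustrative parameters with c > lam * mu
R = lundberg_coefficient(lam, mu, c)
# With exponential interarrival times R_R must coincide with the
# classical Lundberg coefficient 1/mu - lam/c.
print(R, 1.0 / mu - lam / c)
```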

Asymptotic estimates for $\Psi(u)$ as in the Cramér-Lundberg approximation can be derived by means of renewal and random walk theory. Consider the r.v. $A_1 = S_{N_0}$ on $\{N_0 < \infty\}$, where $N_0 = \min\{k : S_k > 0\}$. Define $A(y) = P(A_1 \le y, N_0 < \infty)$ and note that $A(\infty) = P(N_0 < \infty) = \Psi(0)$. Thus $A$ has a defective distribution. The defect $1 - A(\infty)$ is the probability that the random walk never becomes positive starting from 0. By separating the cases $A_1 > u$ and $A_1 \le u$ we obtain
$$\Psi(u) = A(\infty) - A(u) + \int_0^u \Psi(u - y)\, dA(y), \quad u \ge 0 , \qquad (25)$$

which is a defective renewal equation. By the so-called Esscher transform defined below one can remove the defect, provided the appropriate exponential moments exist. Assume that there is a constant $\gamma$ such that

$$\int_0^\infty e^{\gamma y}\, dA(y) = 1 .$$

Then we get, by multiplying (25) with $e^{\gamma u}$, a proper renewal equation and Smith's key renewal theorem again yields, for nonarithmetic $A$,


$$\lim_{u \to \infty} e^{\gamma u} \Psi(u) = \frac{1 - A(\infty)}{\gamma \int_0^\infty y\, e^{\gamma y}\, dA(y)} .$$

Using random walk theory (see for instance Feller (1971)) one can show that $\gamma = R_R$, and we obtain

$$\Psi(u) \sim \frac{1 - A(\infty)}{R_R \int_0^\infty y\, e^{R_R y}\, dA(y)}\, e^{-R_R u} =: C_0\, e^{-R_R u}, \quad u \to \infty , \qquad (26)$$
say. Since $A$ is in general unknown, the constant $C_0$ cannot be calculated explicitly. However, it follows that $R_R$ is the "right" exponent in (24) and this is undoubtedly the most important consequence of (26). If the claim size distribution is such that $F_I \in \mathscr{S}$, the class of subexponential distributions, then the following proposition holds, see Embrechts and Veraverbeke (1982).

PROPOSITION 2.17. If the claim size distribution is such that $F_I \in \mathscr{S}$, then the ruin probability in the ordinary renewal model satisfies

$$\Psi(u) \sim \frac{1}{\theta}\, \bar{F}_I(u), \quad u \to \infty . \qquad \square$$

Stationary renewal processes

Ruin type estimates for the stationary renewal model basically derive from the ordinary situation. Indeed, by conditioning on the first claim epoch $T_1$ (with d.f. $G_0$) the process starts anew with i.i.d. interarrival times $(T_k)$, hence we are then in the situation of the ordinary renewal model. To make the above heuristic reasoning mathematically precise, denote by $\Psi^s(u)$ the ruin probability of a stationary renewal model and by $\Psi(u)$ the ruin probability of the ordinary model. Then a renewal argument yields

$$\Psi^s(u) = \frac{\lambda}{c} \int_u^\infty \bar{F}(y)\, dy + \frac{\lambda}{c} \int_0^u \Psi(u - y)\, \bar{F}(y)\, dy . \qquad (27)$$

Here, as before, $F$ denotes the claim size d.f. and $\bar{F} = 1 - F$ its tail. For a detailed derivation of (27) see Section 3.2 of Grandell (1991). From the ordinary renewal model we know that $\Psi(u) \le e^{-R_R u}$, yielding

$$\Psi^s(u) \le \frac{\lambda}{c} \int_0^\infty e^{-R_R(u - y)}\, \bar{F}(y)\, dy = \frac{\lambda\,(Ee^{R_R X_1} - 1)}{R_R\, c}\, e^{-R_R u} ,$$

hence Lundberg's inequality holds, but the constant may be greater than one. A Cramér-Lundberg approximation follows from (27) by multiplying $\Psi^s(u)$ by $e^{R_R u}$ and taking the limit as $u \to \infty$:


$$\lim_{u \to \infty} e^{R_R u}\, \Psi^s(u) = \lim_{u \to \infty} \frac{\lambda}{c} \int_0^u e^{R_R(u - y)}\, \Psi(u - y)\, e^{R_R y}\, \bar{F}(y)\, dy = \frac{\lambda C_0}{c} \int_0^\infty e^{R_R y}\, \bar{F}(y)\, dy = \frac{\lambda C_0\,(Ee^{R_R X_1} - 1)}{R_R\, c} =: C ,$$

where $0 < C < \infty$; a result due to Thorin (1975).
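Since $\Psi(u) = P(\max_n S_n > u)$, the ruin probability of the ordinary renewal model can also be estimated by simulating the random walk $S_n$ directly. The sketch below (illustrative parameters; the truncation heuristics are our own) does this for exponential interarrival times, where the exact classical value $\Psi(u) = (\lambda\mu/c)\,e^{-Ru}$ is available as a check:

```python
import math
import random

random.seed(2)

lam, mu, c = 1.0, 1.0, 1.5    # rate of T_k, mean of X_k, premium rate
R = 1.0 / mu - lam / c        # classical Lundberg coefficient

def ruined(u, n_max=10_000):
    """One path of the random walk S_n = sum_k (X_k - c T_k): does it ever
    exceed u?  The walk drifts to -infinity, so once it is far below zero
    the remaining contribution to the ruin probability is negligible."""
    s = 0.0
    for _ in range(n_max):
        s += random.expovariate(1.0 / mu) - c * random.expovariate(lam)
        if s > u:
            return True
        if s < -50.0:
            return False
    return False

u, n_paths = 2.0, 10_000
psi_hat = sum(ruined(u) for _ in range(n_paths)) / n_paths
# With exponential interarrival times this is the classical model, where
# Psi(u) = (lam * mu / c) * exp(-R * u) holds exactly.
psi_exact = (lam * mu / c) * math.exp(-R * u)
print(psi_hat, psi_exact)
```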

2.4. A general insurance risk model

To stress further why martingales play an important role in risk theory we consider the general structure of a risk process

$$u + P(t) - S(t) ,$$

where $u$ denotes the initial capital, $P(t)$ the premium income up to time $t$ and $S(t)$ the liabilities (claims). If, for the moment, we forget about the initial capital $u$ and assume that $S(t)$ is a general stochastic process, then a natural way to construct the process $P$ is to make the difference

$$M(t) = P(t) - S(t)$$

a "fair game" (i.e., a martingale) between the insurer and the insured. Delbaen and Haezendonck (1987) use this direct martingale approach for the construction of fairly general risk models allowing for economic factors such as interest and inflation to be incorporated into the classical Cramér-Lundberg model. Paulsen (1993) goes one step further and allows the economic factors to be stochastic. Semimartingales coupled with integro-differential equations lead in some cases to exact probabilities of ruin and in others to inequalities. Economic factors and their influence on ruin probabilities for the Brownian diffusion approximation of a classical risk process are discussed by Sørensen (1996) or Norberg (1999). In this section we present a martingale approach based on the theory of piecewise deterministic Markov processes (PDMPs). The class of PDMPs was introduced by Davis (1984) and further discussed in Davis (1993). To motivate the usage of PDMPs in risk theory consider the basic Cramér-Lundberg model. Note that the state space of a classical risk process $(U(t))_t$ is $\mathbb{R}$ and that the sample path behaviour of $U$ has a deterministic inter-jump evolution along linear trajectories with rate $c > 0$. In the language of Davis (1984) one reformulates the latter as "$(U(t))_t$ follows the integral curves of the vector field $\mathfrak{X} = c\, \partial/\partial x$". Moreover, the hazard rate along integral curves is $\lambda(x) = \lambda$ and the Markov measure governing the stochastic evolution of the process equals $Q(dy, x) = dF(x - y)$. Dassios and Embrechts (1989) employ the PDMP framework for solving insurance risk problems where borrowing money below a certain surplus barrier is allowed. All these processes share the property that they are PDMPs for which $\phi(t, y)$ will denote the integral curve of a vector field $\mathfrak{X}$ starting at $y \in \mathbb{R}$. The efficiency of


PDMPs in risk theory is strongly based on martingale methodology. For a general Markov process, the martingale construction can effectively be obtained via the integration of the infinitesimal generator along sample paths of the process.

DEFINITION 2.18. Let $\mathscr{D}(\mathscr{A})$ be the set of all measurable real functions $f$ on $\mathbb{R}$ with the property that an operator $\mathscr{A}$ exists such that $\mathscr{A}f$ is almost surely Lebesgue integrable and

$$M_t^f := f(X_t) - f(X_0) - \int_0^t \mathscr{A}f(X_s)\, ds$$

is a local $\mathscr{F}_t$-martingale. We call $\mathscr{A}$ the extended generator of $(X_t)$ and $\mathscr{D}(\mathscr{A})$ the domain of the generator $\mathscr{A}$.

When $\mathscr{A}$ corresponds to the infinitesimal generator of a PDMP $(X_t)$, Davis (1984) gives necessary and sufficient conditions for a function $f$ to belong to $\mathscr{D}(\mathscr{A})$. However, for applications in risk theory, it turns out that the following condition from Dassios and Embrechts (1989) suffices. Denote by $S_i$ the time of the $i$th claim.

LEMMA 2.19. Let $f : \mathbb{R} \to \mathbb{R}$ be a measurable function satisfying
(i) for all $x \in \mathbb{R}$, the mapping $t \mapsto f(\phi(t, x))$ from $[0, \infty)$ to $\mathbb{R}$ is absolutely continuous,
(ii) for all $t \ge 0$, $E\big(\sum_{S_i \le t} |f(X_{S_i}) - f(X_{S_i-})|\big) < \infty$.
Then $f \in \mathscr{D}(\mathscr{A})$ and the generator of the PDMP $(X_t)$ is given by

$$\mathscr{A}f(x) = \mathfrak{X}f(x) + \lambda \int \big(f(x - y) - f(x)\big)\, dF(y) . \qquad \square$$

Furthermore, $(M_t^f)$ is a martingale.

The idea now is to construct martingales via functions $f_0 \in \mathscr{D}(\mathscr{A})$ satisfying $\mathscr{A}f_0 = 0$, implying that $f_0(X_t) - f_0(X_0)$ is a martingale for bounded $f_0$.

A risk model with interest structure

As a generalization of the classical model, assume that a company can borrow money if needed (i.e., for a negative or low surplus) and gets interest for capital above a certain level $A$, say, the amount of capital the company retains as a liquid reserve. The interest rates are assumed to be constant and denoted by $\beta_1$ for invested money and $\beta_2$ for borrowed money. The associated vector field becomes

$$\mathfrak{X} = \begin{cases} \big(\beta_1(x - A) + c\big)\, \dfrac{\partial}{\partial x} , & A \le x , \\[4pt] c\, \dfrac{\partial}{\partial x} , & 0 \le x < A , \\[4pt] \big(\beta_2 x + c\big)\, \dfrac{\partial}{\partial x} , & x < 0 . \end{cases}$$

The integral curve corresponding to $\mathfrak{X}$ is decreasing for $x \le -c/\beta_2$. Whenever the process hits the boundary $-c/\beta_2$, the company will a.s. not be able to repay its


debts. So $\tau := \inf\{t > 0 : X_t \le -c/\beta_2\}$ will be called the ruin time. Above the liquid reserve level $A$ the paths are exponentially increasing. Between 0 and $A$ their behaviour is as in the classical case and below 0 the slopes of the paths are smaller. The model where $A = \infty$ was studied in Dassios and Embrechts (1989). Using PDMP theory, one can show that for $A \in [0, \infty]$ one has

$$P(\tau < \infty) = 1 - \frac{f(u)}{f(\infty)} ,$$

where $f$ is the solution of complicated integro-differential equations. Moreover, $P(\tau < \infty) = 1$ if and only if $A = \infty$ and $c < \lambda\mu$. As a special case, consider exponentially distributed claims with mean $\mu$. Then the function $f$ above becomes

$$f(x) = f_1(x)\, 1_{[A,\infty)}(x) + f_2(x)\, 1_{[0,A)}(x) + f_3(x)\, 1_{(-\infty,0)}(x) ,$$

where

$$f_3(x) = K \int_0^{x + c/\beta_2} s^{\lambda/\beta_2 - 1}\, e^{-s/\mu}\, ds , \qquad f_2(x) = f_3(0) + \frac{f_3'(0)}{R}\, \big(1 - e^{-Rx}\big) ,$$

$$f_1(x) = f_2(A) + e^{c/(\beta_1 \mu)}\, f_2'(A) \int_{c/\beta_1}^{x - A + c/\beta_1} s^{\lambda/\beta_1 - 1}\, e^{-s/\mu}\, ds ,$$

for some constant $K$ which can be calculated explicitly. Here $R = 1/\mu - \lambda/c$ denotes the Lundberg coefficient for exponentially distributed claims in the Cramér-Lundberg model. As a consequence of this result one obtains the following adjustment coefficient estimate:

$$\lim_{u \to \infty} P(\tau < \infty)\, e^{ru} = \begin{cases} 0 , & r < 1/\mu \ \text{or} \ (r = 1/\mu \ \text{and} \ \lambda < \beta_1) , \\ C , & r = 1/\mu \ \text{and} \ \lambda = \beta_1 , \\ \infty , & \text{otherwise} , \end{cases}$$

where $C$ is a strictly positive constant.

An "extended" PDMP framework also allows one to consider ruin type problems for the following model

$$U(t) = u + ct - \sum_{k=1}^{N(t)} X_k + \varepsilon W(t) ,$$

where $u$ and $c$ are constants, $N$ is a claim number process and $W$ is standard Brownian motion describing small perturbations around the risk process $U$, see Furrer and Schmidli (1994). Finally, an interesting application of the PDMP methodology to a health-insurance problem is to be found in Davis (1993), p. 107.
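A path of such a perturbed risk process is straightforward to simulate on a time grid. The following sketch uses an Euler discretization of the Brownian term; all parameter values are illustrative choices of ours, and the finite horizon only approximates the infinite-horizon ruin probability:

```python
import math
import random

random.seed(3)

def perturbed_path(u, c, lam, mu, eps, horizon, dt=0.02):
    """Simulate U(t) = u + c t - sum_{k<=N(t)} X_k + eps W(t) on a grid
    and report whether the path falls below 0 before the horizon."""
    t, level = 0.0, u
    next_claim = random.expovariate(lam)
    while t < horizon:
        level += c * dt + eps * math.sqrt(dt) * random.gauss(0.0, 1.0)
        t += dt
        while next_claim <= t:                      # claims in (t - dt, t]
            level -= random.expovariate(1.0 / mu)   # claim size with mean mu
            next_claim += random.expovariate(lam)
        if level < 0.0:
            return True
    return False

n_paths = 400
ruins = sum(perturbed_path(u=5.0, c=1.5, lam=1.0, mu=1.0,
                           eps=0.3, horizon=50.0) for _ in range(n_paths))
print(ruins / n_paths)   # finite-horizon ruin frequency
```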

2.5. Remarks on the use of stochastic processes in insurance

The above sections have only highlighted some (definitely, from a historical perspective, the most important) ways in which stochastic processes enter as key building blocks in the stochastic modeling of insurance. Although it was not stated explicitly, it should be clear to the reader that the models treated so far refer mainly to non-life and re-insurance. A very important field of applications, which increasingly sees stochastic modeling being used, is life-insurance. One of the main reasons for this is the increasing convergence of insurance and finance, both structurally, i.e., at the company level, and at the level of products being offered. Think for instance of the so-called equity-linked life products where the payment due at the end of the policy is partly contingent on the returns (performance) of an equity portfolio. On the other hand, even standard life products are increasingly being modeled by finite-state Markov processes. It is impossible for us to enter into some of the models in this area in the course of this paper. We refer the interested reader to Wolthuis (1994) and Norberg (1991, 1995), for a start. The monographs Koller (2000) and Milbrodt and Helbig (1999) give an excellent overview of stochastic processes in life-insurance mathematics.

3. Stochastic processes in finance


3.1. Pricing and hedging of derivatives: Standard theory

3.1.1. Introduction

We start our discussion of stochastic processes in finance with a review of the standard approach to the pricing of derivative securities such as options. Our exposition is based on Föllmer (1991) and Frey (1997). Modern derivative asset analysis has its origins in the seminal papers Black and Scholes (1973) and Merton (1973). A few years later it was given an almost definitive conceptual structure by Harrison and Kreps (1979) and Harrison and Pliska (1981). These papers show that the natural mathematical framework for the analysis of derivative securities is provided by the theory of martingales and stochastic integrals. The theory of stochastic integration had been developed by probabilists long before its applicability to finance was discovered, starting with the fundamental work of Itô and culminating in the "general theory" of the French school. A brief history of stochastic integration theory is provided in Protter (1992). For our exposition we consider a market with two traded assets: a riskless asset $B$ representing some bond or money market account and a risky asset which will be called the stock. The price fluctuations of stock and bond will be described by stochastic processes $S_t(\omega)$ and $B_t(\omega)$, respectively, on our underlying probability space $(\Omega, \mathscr{F}, P)$. For simplicity we assume that $B_t = 1$ for all $t \ge 0$. This assumption does not exclude nonzero interest rates from our analysis, if we interpret $S$ as the forward price process of the stock, i.e., if we choose the bond as numeraire.


To complete the description of our setup we have to specify the information that is available to our financial decision makers at a particular point in time. As in Section 1.2 this is done via a filtration $(\mathscr{F}_t)_t$; it is understood that at time $t$ agents have access to the information contained in $\mathscr{F}_t$. We will always assume that our stock price is adapted and that its trajectories are càdlàg. Now imagine an investor such as an investment bank who considers selling a contingent claim, i.e., an $\mathscr{F}_T$-measurable random variable $H$. In this context $H$ is interpreted as the payoff of some financial contract which occurs at the maturity date $T$. Typically, $H$ is a derivative asset, i.e., the value of $H$ is determined by the realization of the price path of $S$. The most popular examples are European call and put options with maturity date $T$ and exercise price $K$, where $H = (S_T - K)^+$ or $H = (K - S_T)^+$, respectively. More complicated contracts are also traded nowadays; as an example, we mention the so-called average option where $H = \big((1/T)\int_0^T S_s\, ds - K\big)^+$. The common feature of all these contracts is that the payoff $H$ is unknown at $t = 0$ and therefore constitutes a risk for the seller. Hence, two questions arise for our investor: How should he price the claim and how should he deal with the risk incurred by selling the contract? The "modern" answer to these questions dates back to the seminal papers Black and Scholes (1973) and Merton (1973), where it was shown for the first time that under certain assumptions the payoff of a derivative security can be replicated by a dynamic trading strategy in the underlying asset, such that its risk can be eliminated. This concept of dynamic hedging, and not some particular pricing formula, is actually the major contribution of these papers.

3.1.2. A two-period example

We start by explaining this idea in a very simple two-period setting which represents for instance one time step in the binomial model of Cox, Ross and Rubinstein (1979). Suppose that the current price of $S$ is given by $S_0 = 150$ and that there are two possible "scenarios" for the future stock price: the price of $S$ at the terminal time $T$ could be $S_T = 180$ (with probability $p > 0$), or be equal to $S_T = 120$ (with probability $1 - p > 0$). Consider a European call option with exercise price $K = 140$. We claim that a fair price of this option is given by $C_0 = 20$, and that this price is moreover independent of the probability $p$. To justify this claim we construct a portfolio in stock and bond whose value at $T$ equals the payoff of our option: at $t = 0$ we buy $2/3$ units of the stock and sell 80 bonds. At $t = T$ there are two possibilities for the value $V_T$ of our portfolio.
$S_T = 180$: In that case $V_T = (2/3) \cdot 180 - 80 = 40$.
$S_T = 120$: In that case the option is worthless; moreover we have $V_T = (2/3) \cdot 120 - 80 = 0$.
In either case, the value of our portfolio at $T$ equals the payoff of the option. Hence the fair price of the option should also equal the value of our portfolio at $t = 0$, which is given by $V_0 = (2/3) \cdot 150 - 80 = 20$. Otherwise either the buyer or the seller could make some riskless profit. To construct the hedge portfolio in this


simple two-period setting we have to solve two linear equations: denote by $\xi$ and $\eta$ the number of stocks and bonds in our portfolio at $t = 0$. For our portfolio to replicate the option we must have

$$\xi \cdot 180 + \eta = 40 \quad \text{and} \quad \xi \cdot 120 + \eta = 0 , \qquad (28)$$

which leads to the above values of $\xi = 2/3$ and $\eta = -80$. Note that the probability $p$ did not enter our argument; this probability mattered only in so far as the requirements $P(S_T = 180) = p > 0$ and $P(S_T = 120) = 1 - p > 0$ determine the set of possible scenarios at $t = T$. Nonetheless, it is still possible to compute the fair price of the option as the expected value of the terminal payoff under some "artificial" probability measure $Q$ which turns the investment in the stock into a fair game (a martingale). In our case, such a probability measure is unique and given by $Q(S_T = 180) = Q(S_T = 120) = 0.5$. If we now compute the expected value (under $Q$) of the terminal payoff of our option we get

$$E_Q(S_T - 140)^+ = (1/2) \cdot 40 + (1/2) \cdot 0 = 20 . \qquad (29)$$

This is of course not a lucky coincidence and the general argument justifying (29) will be given in the next section.
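The replication argument and the risk-neutral computation (29) can be reproduced in a few lines. In the sketch below (function names are our own) the hedge $(\xi, \eta)$ solves the two linear equations (28), and $q$ is the unique probability making the stock a martingale:

```python
def replicate(s_up, s_down, pay_up, pay_down):
    """Solve the replication system (28): xi * S_T + eta = payoff in both
    scenarios (the bond price is B_T = 1)."""
    xi = (pay_up - pay_down) / (s_up - s_down)   # stock position
    eta = pay_down - xi * s_down                 # bond position
    return xi, eta

s0, s_up, s_down, strike = 150.0, 180.0, 120.0, 140.0
xi, eta = replicate(s_up, s_down,
                    max(s_up - strike, 0.0), max(s_down - strike, 0.0))
price = xi * s0 + eta          # cost of setting up the hedge at t = 0

# Risk-neutral check: q is the unique probability with E_Q[S_T] = S_0.
q = (s0 - s_down) / (s_up - s_down)
price_rn = q * max(s_up - strike, 0.0) + (1.0 - q) * max(s_down - strike, 0.0)
print(xi, eta, price, q, price_rn)
```

Both computations give the fair price 20, independently of $p$, exactly as argued in the text.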

3.1.3. The general argument


We now extend the argument from the previous two-period example to a more realistic continuous-time setting. Our basic assumption is that the process $S$ admits an equivalent local martingale measure $Q$, i.e., a probability measure $Q \sim P$ such that $S$ is a $Q$-local martingale. We will comment on the economic meaning of this assumption below. From a mathematical viewpoint this assumption ensures that $S$ is a semimartingale under $P$, so that we may define stochastic integrals with respect to $S$. Recall that a semimartingale $X$ is an adapted càdlàg process which can be decomposed as $X_t = X_0 + M_t + A_t$, where $M$ is a local martingale and $A$ is a process of finite variation. If $A$ is predictable (e.g., left-continuous) such a decomposition is unique. Semimartingales are natural stochastic integrators; a good treatment of semimartingale theory and in particular of their role as natural stochastic integrators is given in Protter (1992). To replicate the payoff of a contingent claim we use a dynamic trading strategy $(\xi, \eta)$, where $\xi_t$ gives the amount held in the risky asset at time $t$ and $\eta_t$ gives the position in the bond. Of course our position at $t$ should depend only on information available up to time $t$, that is we require $\xi$ to be predictable and $\eta$ to be adapted with respect to our filtration; $\xi$ should moreover be locally bounded. We refer the reader to Chapter 4 of Protter (1992) for a formal definition of predictable processes and mention only that every adapted and left-continuous process is locally bounded and predictable. At time $t$ the value of our hedge portfolio equals

$$V_t = \xi_t S_t + \eta_t . \qquad (30)$$


As $B_t \equiv 1$, the cumulated gains from trade of following this strategy up to time $t$ are measured by the stochastic integral $\int_0^t \xi_s\, dS_s$. This is obvious for so-called simple predictable strategies $\xi$ of the form

$$\xi_t = \sum_{i=1}^n \xi_i\, 1_{(T_i, T_{i+1}]}(t) ,$$

where $0 = T_0 < T_1 < \cdots < T_{n+1} < \infty$ is a finite sequence of stopping times and where each $\xi_i$ is $\mathscr{F}_{T_i}$-measurable and bounded. If we follow such a strategy the gains (or losses) from trade up to time $t$ are given by

$$\sum_{i=1}^n \xi_i\, \big(S_{T_{i+1} \wedge t} - S_{T_i \wedge t}\big) = \int_0^t \xi_s\, dS_s$$

by definition of the stochastic integral for simple predictable processes. For general strategies the modeling of the gains from trade as a stochastic integral can be justified by limit arguments. The cumulative cost $C_t$ from following this strategy up to time $t$ is given by

$$C_t = V_t - V_0 - \int_0^t \xi_s\, dS_s . \qquad (31)$$

It measures the cumulative in- or outflows of our strategy. The strategy will be called self-financing if the cumulative cost is zero, i.e., if

$$V_t = V_0 + \int_0^t \xi_s\, dS_s \quad \text{for all } 0 \le t \le T . \qquad (32)$$
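The telescoping identity behind (31) and (32) is easy to verify in discrete time: if the bond position is always adjusted so that rebalancing causes no in- or outflow, the terminal portfolio value equals $V_0$ plus the accumulated gains from trade. A sketch (with an arbitrary simulated price path and arbitrary stock positions of our own choosing):

```python
import random

random.seed(4)

# Discrete sketch of (30)-(32) with B_t = 1: rebalance the stock position
# xi at each step and finance every rebalancing entirely from the bond
# account eta.  The portfolio is then self-financing, so its terminal value
# must equal V_0 plus the gains from trade  sum_i xi_i (S_{i+1} - S_i).
steps = 100
s = [100.0]
for _ in range(steps):
    s.append(s[-1] * (1.0 + random.gauss(0.0, 0.01)))    # arbitrary path

xi = [random.uniform(-1.0, 1.0) for _ in range(steps + 1)]  # stock positions
v0 = 10.0
eta = v0 - xi[0] * s[0]                    # initial bond position, cf. (30)
for i in range(steps):
    value = xi[i] * s[i + 1] + eta         # value just before rebalancing
    eta = value - xi[i + 1] * s[i + 1]     # rebalance with zero in-/outflow
final_value = xi[steps] * s[steps] + eta

gains = sum(xi[i] * (s[i + 1] - s[i]) for i in range(steps))
print(final_value, v0 + gains)             # equal: the cost (31) vanishes
```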

Suppose now that our contingent claim can be represented as a stochastic integral with respect to $S$, i.e., $H = H_0 + \int_0^T \xi_s^H\, dS_s$. Then we may construct a dynamic hedging strategy for $H$ as follows. Define $\xi_t = \xi_t^H$ and

$$\eta_t := H_0 + \int_0^t \xi_s^H\, dS_s - \xi_t^H S_t . \qquad (33)$$

This strategy is self-financing with value process $V_t^H = H_0 + \int_0^t \xi_s^H\, dS_s$; in particular $V_T^H = H$. Therefore, at any time $t \le T$ we can replicate the claim by starting with an investment of $V_t^H$ and following the above strategy. There are no further payments and hence no further risk. This implies that at time $t$ the fair price of the claim should be equal to $V_t^H$. Harrison and Pliska (1981) showed how the fair price of the claim can be computed using the concept of martingales. The stochastic integral $\int_0^t \xi_s^H\, dS_s$ is a $Q$-local martingale, and a martingale under some uniform integrability assumptions. Hence

$$E_Q\Big(\int_t^T \xi_s^H\, dS_s \,\Big|\, \mathscr{F}_t\Big) = 0 \quad \text{for all } t .$$


This yields the so-called risk-neutral pricing rule for the claim $H$:

$$H_t := V_t^H = E_Q(H \mid \mathscr{F}_t) ; \qquad (34)$$

in particular the fair price process $H = (H_t)_{0 \le t \le T}$ is a $Q$-martingale. Harrison and Pliska (1983) moreover showed that the market is complete, i.e., every $Q$-integrable claim admits a representation as a stochastic integral with respect to $S$, if and only if there is only one equivalent martingale measure for $S$. The assumption that $S$ admits an equivalent (local) martingale measure needs of course some economic justification, which is provided by the so-called "First Fundamental Theorem of Asset Pricing". This theorem, whose origins go back to the work of Harrison and Kreps (1979), states that the existence of an equivalent martingale measure is "essentially equivalent" to the absence of arbitrage opportunities. As a precise mathematical statement of this theorem is relatively cumbersome, we refer the reader to Dalang, Morton and Willinger (1990) for an analysis in discrete time and to the fundamental paper Delbaen and Schachermayer (1994) for definitive results in continuous-time models.

3.1.4. Diffusion models


Now we want to apply this general approach to cases where the stock price process $S$ is given by a diffusion. More precisely, we assume that $S$ is given by the solution to the following SDE

$$dS_t = \mu(t, S_t)\, S_t\, dt + \sigma(t, S_t)\, S_t\, dW_t, \quad S_0 = x , \qquad (35)$$

where $W$ is a standard Brownian motion as in Definition 1.1 and $\mu$ and $\sigma$ are sufficiently smooth such that there is a unique solution to (35); $\sigma$ is moreover strictly positive. The model (35) has the following intuitive interpretation: at a given point in time $\mu(t, S_t)$ describes the instantaneous growth rate of the asset, while the volatility $\sigma(t, S_t)$ measures the instantaneous variance of the process $\log S$. Hence $\sigma(t, S_t)$ can be interpreted as a (local) measure of the risk incurred by investing one unit of the money market account into the stock. In case $\sigma$ is a constant independent of $S_t$ the SDE (35) can be solved explicitly; the solution is given by the exponential martingale (13) from Proposition 1.13(b). In that case the stock price process is referred to as the classical Black-Scholes model or as geometric Brownian motion. This model was first proposed by Samuelson (1965), who replaced Bachelier's arithmetic Brownian motion by geometric Brownian motion, the main argument in favour of this change being that real stock prices cannot be negative because of the limited liability of shareholders. Fix some $T > 0$. To determine an equivalent martingale measure for the stock price process we define

$$G_T := \exp\Big(-\int_0^T \big[\mu(t, S_t)/\sigma(t, S_t)\big]\, dW_t - \frac{1}{2} \int_0^T \big[\mu(t, S_t)/\sigma(t, S_t)\big]^2\, dt\Big) .$$

Under some integrability conditions we have $E(G_T) = 1$. In that case we may define a new probability measure $Q$ on $\mathscr{F}_T$ by putting $dQ/dP := G_T$. According


to Girsanov's theorem the process $W_t^Q := W_t + \int_0^t \big[\mu(s, S_s)/\sigma(s, S_s)\big]\, ds$ is a Brownian motion under $Q$, see e.g., Section 3.5 of Karatzas and Shreve (1988). Hence under $Q$ the process $S$ solves the SDE $dS_t = \sigma(t, S_t)\, S_t\, dW_t^Q$ and is therefore a local $Q$-martingale, and a martingale under some integrability assumptions. As the volatility function $\sigma(t, x)$ is strictly positive, market completeness follows from the martingale representation theorem for Brownian motion, see e.g., Section 3.4 D of Karatzas and Shreve (1988). This theorem ensures that for any $Q$-integrable $\mathscr{F}_T$-measurable random variable $H$ the martingale $H_t = E_Q(H \mid \mathscr{F}_t)$, $0 \le t \le T$, can be represented as a stochastic integral, i.e., there is a predictable process $\theta^H$ such that $H_t = H_0 + \int_0^t \theta_s^H\, dW_s^Q$. If we now define $\xi_s^H := \theta_s^H / (\sigma(s, S_s)\, S_s)$ we immediately get $H = H_0 + \int_0^T \xi_s^H\, dS_s$. Now there remains of course the task of computing price and hedging strategy. For the purposes of this paper it is enough to consider claims whose payoff has the form $H = g(S_T)$, so-called terminal value claims. For the pricing of path-dependent options in the framework of the classical Black-Scholes model, see for instance Chapter 9 of Musiela and Rutkowski (1997) and the references given therein. For path-independent derivatives the price and the hedge portfolio can be computed by means of a parabolic partial differential equation. Denote by $h(t, x)$ the solution of the terminal value problem

$$\frac{\partial}{\partial t} h(t, x) + \frac{1}{2}\, \sigma^2(t, x)\, x^2\, \frac{\partial^2}{\partial x^2} h(t, x) = 0, \qquad h(T, x) = g(x) . \qquad (36)$$

By Itô's formula (see e.g., Karatzas and Shreve (1988)) we obtain from (36)

$$g(S_T) = h(T, S_T) = h(t, S_t) + \int_t^T \frac{\partial}{\partial x} h(s, S_s)\, dS_s .$$

Hence $\xi_t^H = \frac{\partial}{\partial x} h(t, S_t)$ and the fair price of the derivative is given by $H_t := h(t, S_t)$. In the classical Black-Scholes model with constant volatility $\sigma$ the terminal value problem (36) can be solved explicitly for $g(x) = (x - K)^+$. This yields the famous Black-Scholes formula for the price $C_{BS}(t, x, \sigma)$ of a European call option:
$$C_{BS}(t, x, \sigma) = x\, \mathscr{N}(d_1) - K\, \mathscr{N}(d_2) ,$$

where

$$d_1 = \frac{\ln(x/K) + (T - t)\sigma^2/2}{\sqrt{(T - t)\sigma^2}} , \qquad d_2 = d_1 - \sqrt{(T - t)\sigma^2} ,$$

and where $\mathscr{N}$ denotes the distribution function of the one-dimensional standard normal distribution. Alternatively one could derive the Black-Scholes formula using probabilistic methods to compute the conditional expectation in (34). For an application of this approach in a more general setting see for instance Musiela and Rutkowski (1997) or Frey and Sommer (1996). Of course, up to now we have only been able to present the very basics of modern derivative pricing theory and had to omit many interesting topics. In particular, we have to refer to Björk (1997) or Musiela and Rutkowski (1997) for a


treatment of models for interest rate derivatives and to Myneni (1992) for a discussion of American-type derivatives. Excellent textbooks on derivative pricing theory with a focus on continuous-time modeling include Björk (1998), Duffie (1992), Lamberton and Lapeyre (1996), Musiela and Rutkowski (1997) and the advanced Karatzas (1997) or Karatzas and Shreve (1998). Moreover, we strongly recommend the excellent essays in Runggaldier (1997). Taleb (1996) finally gives a trader's account of dynamic hedging.
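The Black-Scholes call price given above is a one-liner once $\mathscr{N}$ is available. A sketch in the forward-price convention of the text ($B_t \equiv 1$, so no discounting appears; function names are our own):

```python
from math import erf, log, sqrt

def norm_cdf(x):
    """Distribution function N of the standard normal law."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(t, x, sigma, K, T):
    """C_BS(t, x, sigma) = x N(d1) - K N(d2), with x the forward price of
    the stock and the bond as numeraire (B_t = 1), as in the text."""
    tau = (T - t) * sigma * sigma
    d1 = (log(x / K) + 0.5 * tau) / sqrt(tau)
    d2 = d1 - sqrt(tau)
    return x * norm_cdf(d1) - K * norm_cdf(d2)

# At-the-money example: the price is close to the well-known rule of
# thumb 0.4 * x * sigma * sqrt(T - t).
print(black_scholes_call(0.0, 100.0, 0.2, 100.0, 1.0))
```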

3.1.5. Discussion

Over the last 20 years this approach to pricing and hedging derivative securities has turned out to be very successful from a theoretical and from an applied point of view. One should bear in mind, however, that this elegant theory hinges on several crucial assumptions. Obviously, if our hedging argument is to work for all claims the market must be complete. Moreover, in our definition of the gains from trade we implicitly assumed that there are no market frictions like taxes and transaction costs or constraints on the stockholdings. The definition of the gains from trade is reasonable only if our hedger is small relative to the size of the market, meaning that the implementation of his hedging strategy does not affect the price process of the stock. This is of course a very stylized picture of real markets, which is why much of the recent research in finance has concentrated on relaxing these assumptions. The hedging of derivatives under market frictions has mainly been studied in the framework of the classical Black-Scholes model. Cvitanic (1997) gives an excellent and detailed introduction to the theory of hedging under portfolio constraints. Davis, Panas, and Zariphopoulou (1993), Barles and Soner (1998) or Cvitanic, Pham, and Touzi (1999b) are representative examples of recent work on option pricing with transaction costs. The pricing and hedging of options in markets with a large trader is for instance studied by Jarrow (1994) or Frey and Stremme (1997) and Frey (1998). Many of these papers employ techniques from stochastic control theory and from the theory of nonlinear PDEs. In particular the pricing PDE (36) is often replaced by a nonlinear PDE where the volatility depends on the derivatives of the option price, see e.g., Barles and Soner (1998) or Avellaneda, Levy and Paras (1995). Typically, we enter the realm of incomplete markets whenever we want to use models for asset price dynamics which are more "realistic" than the simple model (35).
For instance the simple model from Section 3.1.2 is incomplete if we allow for a third possible value for the stock price at the terminal time $T$. Perhaps more importantly, markets are incomplete if we consider asset price processes with random volatility or with jumps of varying size. There is, in fact, a lot of statistical support for such models, as most empirical evidence suggests that the classical Black-Scholes model does not describe the statistical properties of financial time series very well. According to this model log-returns, i.e., differences of the form $\log S_{t+h} - \log S_t$, are independent and identically normally distributed. The following Figure 6 shows daily log-returns of the American S&P 500 stock index and


[Figure 6 about here.]

Fig. 6. Daily log-returns of the S&P 500 index (top picture) and simulated normal variates with mean and variance equal to the sample mean and sample variance of the S&P 500 log-returns.

simulated i.i.d. normal variates with variance equal to the sample variance of the S&P 500 log-returns. This picture makes two stylized facts immediately apparent, which are typical for most financial time series. We see that large asset price movements occur more frequently than in a model with normally distributed increments. This feature is often referred to as excess kurtosis or fat tails; it is the main reason for considering asset price processes with jumps. There is evidence for volatility clusters, i.e., there seems to be a succession of periods with high return variance and with low return variance. This observation motivates the introduction of diffusion models for asset prices where volatility is itself stochastic. Of course, these findings have been confirmed by many rigorous statistical tests, see e.g., Pagan (1996) for an extensive survey. In the remainder of this paper we will discuss some recent work on derivative asset analysis in models with jumps and/or stochastic volatility; this will allow us also to make contact with some approaches to derivative pricing in incomplete markets.
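The excess-kurtosis stylized fact is easy to reproduce numerically: the sample kurtosis of heavy-tailed returns exceeds the normal value 3. The sketch below uses Student-t variates merely as a convenient stand-in for a heavy-tailed returns model (the choice of $t_{10}$ and the sample size are our own illustrative assumptions, not taken from the text):

```python
import math
import random

random.seed(5)

def sample_kurtosis(xs):
    """Fourth standardized sample moment; equals 3 for normal data in
    the large-sample limit."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

n, df = 200_000, 10
normal = [random.gauss(0.0, 1.0) for _ in range(n)]
# Student-t(df) variates via gauss / sqrt(chi2 / df); chi2(df) is
# Gamma(df/2, scale 2).  For df = 10 the true kurtosis is 3 + 6/(df-4) = 4.
heavy = [random.gauss(0.0, 1.0)
         / math.sqrt(random.gammavariate(df / 2.0, 2.0) / df)
         for _ in range(n)]

kn, kh = sample_kurtosis(normal), sample_kurtosis(heavy)
print(kn, kh)
```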

3.2. Some new models for asset prices

3.2.1. Stochastic volatility models

Most of the diffusion models that have been proposed in recent years as extensions of the classical Black-Scholes model belong to the class of stochastic volatility models (SV-models). In this class of models the volatility is modeled as a stochastic process whose innovations are only imperfectly correlated with the asset price process. Our definition of an SV-model is as follows.

Stochastic processes in insurance and finance


ASSUMPTION 3.1. S follows a general stochastic volatility model if it solves the SDE

dS_t = S_t(\sigma_t \, dW_t + \mu_t \, dt)   (37)

for predictable processes \sigma_t and \mu_t. We assume that \sigma_t > 0, \int_0^t \sigma_s^2 \, ds < \infty and that \sigma_t is not adapted to the filtration generated by W.

In economic terms this last assumption simply means that besides W there is a second source of randomness influencing the system. In most papers from the financial literature it is assumed that the instantaneous variance v_t = \sigma_t^2 follows a one-dimensional diffusion:

ASSUMPTION 3.2. S and v satisfy the SDE

dS_t = S_t \bigl( v_t^{1/2} \, dW_t^{(1)} + \mu(v_t) \, dt \bigr) ,   (38)

dv_t = \alpha(v_t) \, dt + \eta_1(v_t) \, dW_t^{(1)} + \eta_2(v_t) \, dW_t^{(2)} ,   (39)

for W_t = (W_t^{(1)}, W_t^{(2)}) a standard two-dimensional Wiener process. We assume that the coefficients are such that the vector SDE (38), (39) has a non-exploding and strictly positive solution. Moreover, there is some 0 \le a < b \le \infty such that \eta_2(v) > 0 for all v \in (a, b).

The above class of volatility models contains among others the SV-models considered by Wiggins (1987), Hull and White (1987) or Heston (1993) as special cases. The function \eta_1 models the instantaneous correlation of \log S and v. Most empirical studies have found that at least on equity markets \eta_1 is significantly negative, an observation which has been termed the leverage effect since Black (1976). SV-models can be obtained as diffusion limits of certain popular GARCH models. This has potentially important implications for parameter estimation and for derivative asset analysis in these models. For a detailed analysis of "ARCH models as diffusion approximations" and related topics see Nelson (1990), Duan (1997) or the surveys Frey (1997) and Ghysels, Harvey and Renault (1996). Duffie and Protter (1992) give an in-depth discussion of results on weak convergence of asset price processes and their implications in finance.

SV-models are typically incomplete, meaning that there are derivatives which cannot be replicated by dynamic hedging. As explained in Section 3.1 this is equivalent to the fact that there are now many probability measures Q \sim P such that the stock price process is a (local) Q-martingale. The next proposition characterizes the set of all equivalent local martingale measures for the stock price process defined in Assumption 3.2. For similar results and a proof see e.g., Hofmann, Platen and Schweizer (1992) and the references given therein.

PROPOSITION 3.3. a) Under Assumption 3.2 a probability measure Q equivalent to P on \mathcal{F}_T is a local martingale measure for S on \mathcal{F}_T if and only if there is a


progressively measurable process \nu = (\nu_t)_{0 \le t \le T} with \int_0^T \nu_s^2 \, ds < \infty P-a.s. such that the following holds: the local martingale (G_t)_{0 \le t \le T} with

G_t := \exp\Bigl( \int_0^t -\bigl(\mu(v_s)/\sqrt{v_s}\bigr) \, dW_s^{(1)} + \nu_s \, dW_s^{(2)} - \tfrac{1}{2} \int_0^t \bigl( (\mu(v_s)/\sqrt{v_s})^2 + \nu_s^2 \bigr) \, ds \Bigr)   (40)

satisfies E(G_T) = 1 and G_T = dQ/dP on \mathcal{F}_T.

b) Suppose that Q is an equivalent local martingale measure corresponding to some process \nu. Then S and v solve the following SDE under Q:

dS_t = S_t v_t^{1/2} \, d\tilde{W}_t^{(1)} ,   (41)

dv_t = \bigl( \alpha(v_t) - \eta_1(v_t)\mu(v_t)/\sqrt{v_t} + \eta_2(v_t)\nu_t \bigr) \, dt + \eta_1(v_t) \, d\tilde{W}_t^{(1)} + \eta_2(v_t) \, d\tilde{W}_t^{(2)} ,   (42)

where \tilde{W} is a two-dimensional standard Brownian motion under Q. []

In the financial literature the process \nu is usually referred to as the market price of volatility risk. Proposition 3.3 shows that there is a one-to-one correspondence between market price of volatility risk processes \nu satisfying some regularity conditions and equivalent (local) martingale measures. In particular, market incompleteness is equivalent to non-uniqueness of the market price of risk process.
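To make Assumption 3.2 concrete, the sketch below simulates a Heston-type special case with an Euler scheme. All coefficients are hypothetical: \alpha(v) = \kappa(\theta - v), \eta_1(v) = \rho\xi\sqrt{v} and \eta_2(v) = \sqrt{1-\rho^2}\,\xi\sqrt{v}, so that \rho < 0 reproduces the leverage effect mentioned above; this is an illustrative discretization, not the analysis of the papers cited in the text.

```python
import random
import math

random.seed(7)

# Hypothetical Heston-type coefficients; rho < 0 gives the leverage effect.
kappa, theta, xi, rho, mu = 2.0, 0.04, 0.3, -0.7, 0.05
S, v = 100.0, 0.04
n_steps = 2520            # ten "years" of daily steps
dt = 1.0 / 252

for _ in range(n_steps):
    dW1 = random.gauss(0.0, math.sqrt(dt))
    dW2 = random.gauss(0.0, math.sqrt(dt))
    S += S * (math.sqrt(v) * dW1 + mu * dt)          # Eq. (38)
    v += (kappa * (theta - v) * dt                   # Eq. (39)
          + xi * math.sqrt(v) * (rho * dW1 + math.sqrt(1 - rho**2) * dW2))
    v = max(v, 1e-10)  # crude floor: the Euler scheme can undershoot zero

print(S, v)
```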

3.2.2. Models with discontinuous price paths


Real markets exhibit from time to time very large price movements over short time periods. Even if we allow for stochastic volatility, these price movements are very difficult to reconcile with the assumption that asset prices follow diffusion models with continuous trajectories. Moreover, in an interesting empirical study Bakshi, Cao and Chen (1997) have shown that in order to explain observed option prices one should allow for both stochastic volatility and the possibility of occasional jumps. A rather general jump-diffusion model has been proposed by Colwell and Elliott (1993). They assume the following dynamics for the stock price S:

dS_t = \mu(t, S_{t-}) \, dt + \sigma(t, S_{t-}) \, dW_t + \int_{\mathbb{R}} \gamma(t, S_{t-}, y) \, \bigl( \mu(dt, dy) - H(dy) \, dt \bigr) .   (43)

Here W is a standard Brownian motion and \mu(dt, dy) is a random measure with deterministic compensator \nu = H(dy) \, dt, which is assumed to be independent of W. We can alternatively write model (43) as follows:

S_t = S_0 + \int_0^t \mu(s, S_{s-}) \, ds + \int_0^t \sigma(s, S_{s-}) \, dW_s   (44)

+ \sum_{i=1}^{N_t} \gamma(\tau_i, S_{\tau_i-}, Y_i) - \int_0^t \int_{\mathbb{R}} \gamma(s, S_{s-}, y) \, H(dy) \, ds .   (45)

As the compensator \nu is deterministic, Z_t = \sum_{i=1}^{N_t} Y_i is a compound Poisson process with intensity \lambda = \int_{\mathbb{R}} H(dy). The stopping times \tau_i denote the successive jump times of N, and the distribution of the Y_i is given by \lambda^{-1} H. This notation makes the similarities to the models studied in Chapter 2 apparent. It follows from general results on SDEs driven by random measures that S is a Markov process. Most jump-diffusion models from the financial literature are special cases of (44). If we take \mu(t, x) = \mu(t)x, \sigma(t, x) = \sigma(t)x and \gamma(t, x, y) = \gamma(t)x for deterministic functions \mu, \sigma and \gamma with \gamma(t) > -1 for all t, we obtain the models of Merton (1976) or Mercurio and Runggaldier (1993). Bakshi, Cao and Chen (1997) consider a model where \gamma(t, x, y) = xy and where 1 + Y is lognormally distributed; they allow, moreover, for stochastic volatility. In all these models the assumption that \gamma(t, x, y) > -1 a.s. is made to ensure that the asset price process stays strictly positive. Jump-diffusion models of the form (43) are typically incomplete, as there are many different equivalent martingale measures. Intuitively speaking, this is due to the fact that by an equivalent change of measure we may change the drift, the jump-size distribution and the jump intensity of the process; there are typically many different combinations of these parameters and hence many different equivalent probability measures that turn S into a (local) martingale. Colwell and Elliott (1993) determine the class of equivalent martingale measures for the model (43) that preserve the Markov property. Eberlein and Keller (1995) introduce another class of discontinuous stochastic processes for asset prices. Their analysis is motivated by statistical considerations which show that the hyperbolic distribution (see e.g., Barndorff-Nielsen and Halgreen (1977)) yields an excellent fit to the distribution of log-returns for various stocks.
The hyperbolic distribution is infinitely divisible and therefore generates a Lévy process, the so-called hyperbolic Lévy motion. The Lévy-Khintchine representation of this process shows that the hyperbolic Lévy motion is a quadratic pure jump process, i.e., orthogonal (in the sense of quadratic variation) to all continuous semimartingales. We refer the interested reader to Eberlein and Keller (1995) for further information.
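For illustration, a path of the Merton-type special case of (43) (\mu(t,x) = \mu x, \sigma(t,x) = \sigma x, \gamma(t,x,y) = xy with 1 + Y lognormal) can be simulated directly. The parameters below are invented, and the scheme allows at most one jump per small time step; this is a sketch, not the pricing recipe of the papers cited above.

```python
import random
import math

random.seed(3)

mu, sigma, lam = 0.05, 0.2, 1.0        # drift, diffusion volatility, jump intensity
jump_m, jump_s = -0.05, 0.1            # log(1 + Y) ~ N(jump_m, jump_s^2)
EY = math.exp(jump_m + 0.5 * jump_s**2) - 1.0  # E(Y), used to compensate the jumps

S, T, n_steps = 100.0, 1.0, 252
dt = T / n_steps
for _ in range(n_steps):
    dW = random.gauss(0.0, math.sqrt(dt))
    dS = S * ((mu - lam * EY) * dt + sigma * dW)   # compensated drift part of (43)
    if random.random() < lam * dt:                 # at most one jump per step
        Y = math.exp(random.gauss(jump_m, jump_s)) - 1.0   # 1 + Y lognormal > 0
        dS += S * Y
    S += dS

print(S)
```

Because 1 + Y is lognormal and hence strictly positive, the simulated price stays strictly positive, mirroring the \gamma > -1 condition in the text.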

3.3. Pricing and hedging of derivatives in incomplete markets


As we have just seen, if we move on from the classical Black-Scholes model to more realistic models with jumps and stochastic volatility, we usually end up with an incomplete market where perfect hedging strategies for derivatives do not exist. Hence a conceptual problem arises: how should we value contingent claims, and how should we manage the risk we incur by selling a claim? Of course, there is now no longer a unique answer to these questions. However, in recent years a number of interesting concepts for the risk-management of derivatives in


incomplete markets have been developed, and we are now going to survey two such approaches.

3.3.1. Superreplication

If the precise duplication of a contingent claim is not feasible, one might try to find a superreplicating strategy, i.e., the "cheapest" self-financing strategy with terminal value no smaller than the payoff of the contingent claim. This concept was first developed by El Karoui and Quenez (1995). To explain their results we first have to give some definitions.

DEFINITION 3.4. Consider a contingent claim H with nonnegative payoff. An adapted, nonnegative càdlàg process \Pi with \Pi_T = H is called an admissible price process for sellers if \Pi is the value process of some trading strategy with nonincreasing cost process C. An admissible price process for sellers \Pi^* will be called the ask price for H if \Pi_t^* \le \Pi_t for any other admissible price process for sellers \Pi and for all t \in [0, T].

This definition deserves a comment. Suppose that an investor sells at time t < T the claim H at an admissible selling price \Pi_t. By following the corresponding portfolio strategy he can then completely eliminate the risk incurred by selling the claim, and moreover he earns the nonnegative amount -(C_T - C_t). Hence, he will certainly agree to sell the claim for the price \Pi_t. The following is an example of an admissible price process for sellers in the case of a European call option. Define

\Pi_t = S_t, \ \xi_t = 1 for 0 \le t < T and \Pi_T = (S_T - K)^+, \ \xi_T = 0 .   (46)

The cost process is then given by C_t = 0 for t < T and C_T = (S_T - K)^+ - S_T. It is a priori not clear that an ask price for a contingent claim exists. Here we have the following result, which was proved in increasing generality by Delbaen (1992), El Karoui and Quenez (1995), Kramkov (1996), and Föllmer and Kabanov (1998).

THEOREM 3.5. Assume that the set \mathcal{Q} of equivalent local martingale measures for the asset price process S is nonempty. Then the ask price exists for every contingent claim H with nonnegative payoff; it is given by

\Pi_t^* = \operatorname{ess\,sup}_{Q \in \mathcal{Q}} E^Q(H \mid \mathcal{F}_t) .   (47) []

It is easily seen that the ask price cannot be smaller than the right-hand side of (47). In fact, we have for every admissible price process for sellers \Pi

H = \Pi_T = \Pi_t + \int_t^T \xi_s \, dS_s - (C_T - C_t) \le \Pi_t + \int_t^T \xi_s \, dS_s .   (48)


Fix some Q \in \mathcal{Q}. The stochastic integral \int_t^{\cdot} \xi_s \, dS_s is a nonnegative local martingale and hence a supermartingale. Taking conditional expectations on both sides of (48) we get

E^Q(H \mid \mathcal{F}_t) \le \Pi_t + E^Q\Bigl( \int_t^T \xi_s \, dS_s \Bigm| \mathcal{F}_t \Bigr) \le \Pi_t .

Hence, we must have \Pi_t \ge \operatorname{ess\,sup}\{ E^Q(H \mid \mathcal{F}_t), \ Q \in \mathcal{Q} \}. The difficult part in the proof of Theorem 3.5 is to show that the process \Pi^* can be represented as the sum of a stochastic integral w.r.t. S and an adapted non-increasing process.

At first glance superreplication seems to be a very attractive concept for the pricing and hedging of derivatives in incomplete markets. Unfortunately, in applications it often leads to results which are not very satisfactory. Consider for instance the SV-model which was introduced in Assumption 3.2, and assume that, as in most models from the financial literature, \eta_2(v) > 0 for all v > 0. By well-known results on one-dimensional diffusions this implies that the range of v_t is unbounded. For this class of models Frey and Sin (1999) have shown that under some minor technical conditions we have

\operatorname{ess\,sup}_{Q \in \mathcal{Q}} E^Q\bigl( (S_T - K)^+ \bigm| \mathcal{F}_t \bigr) = S_t \quad for all t < T, \ K > 0 ;
see also Cvitanic, Pham and Touzi (1999a) for related results. In light of Theorem 3.5 we can therefore conclude that the ask price process and the corresponding hedge portfolio are given by (46); in other words, the cheapest superreplicating strategy for a call option is to buy the stock. Similar results have been obtained for the other new model classes introduced in Section 3.2; see Bellamy and Jeanblanc (1997) for an analysis of superhedging in jump-diffusion models and Eberlein and Jacod (1997) for results in the context of discontinuous Lévy processes. In spite of these disappointing results there are good financial reasons to study superhedging strategies. For instance, these strategies appear as building blocks in the quantile hedging approach of Föllmer and Leukert (1998). These authors relax the condition that the terminal value of the hedging strategy should almost surely be no smaller than the payoff of the claim under consideration; instead they focus on the cheapest hedging strategy with nonnegative value process which superreplicates the claim with a given success probability. We refer the reader to their paper for further details. There are other situations where the superhedging approach yields very interesting and relevant results. Several authors have applied the concept of superhedging to the problem of hedging a derivative in the Black-Scholes model but with certain constraints on the hedging portfolio; see for instance Cvitanic (1997). Many papers address the problem of superhedging in stochastic volatility models with known a priori bounds on the volatility. These bounds are usually interpreted as a confidence interval for the range of future volatility. In this situation the ask price of a call option is given by the Black-Scholes price of the option corresponding to the upper volatility bound. For details on this work see the papers


by El Karoui, Jeanblanc and Shreve (1998), Avellaneda, Levy and Paras (1995), Lyons (1995) or Frey (2000).
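Theorem 3.5 can be checked by hand in a toy incomplete market. The sketch below uses a hypothetical one-period trinomial market (S_0 = 100 moving to 80, 100 or 130) and computes the superreplication price of a call with K = 100 twice: as the supremum of E^Q over the one-parameter family of martingale measures, cf. (47), and directly as the cheapest dominating stock-and-cash portfolio; both grid searches return the same value.

```python
S0 = 100.0
states = [80.0, 100.0, 130.0]           # hypothetical terminal stock prices
K = 100.0
payoff = [max(s - K, 0.0) for s in states]

# Dual side: martingale measures (a, b, c) solve 80a + 100b + 130c = 100 and
# a + b + c = 1, i.e., a = 1.5c and b = 1 - 2.5c with c in [0, 0.4].
best_dual = max(
    1.5 * c * payoff[0] + (1.0 - 2.5 * c) * payoff[1] + c * payoff[2]
    for c in (0.4 * i / 4000 for i in range(4001))
)

# Primal side: cheapest initial capital x such that x + xi*(S1 - S0) dominates
# the payoff in every state, scanned over a grid of hedge ratios xi.
best_primal = min(
    max(p - xi * (s - S0) for s, p in zip(states, payoff))
    for xi in (j / 1000 for j in range(1001))
)

print(best_dual, best_primal)  # both are 12 (up to grid/rounding error)
```

Here the superreplication price 12 (attained with hedge ratio 0.6) lies strictly above every single-measure "fair" price except the extreme one, illustrating how conservative the ask price can be.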
3.3.2. Mean-variance hedging

In the theory of mean-variance hedging, which subsumes the so-called (local) risk-minimization and variance-minimization approaches, one wants to find a trading strategy that reduces the actual risk of a derivative position to some "intrinsic component." While the computation of the strategy usually involves the computation of "prices" for contingent claims, the emphasis of this theory is not on the valuation of derivatives but on the reduction of risk. We now explain these approaches in more detail. We restrict ourselves to trading strategies with square-integrable cost and value processes.

In the theory of (local) risk-minimization the conditional variance of C under the "real-world" probability measure P is used as a measure for the risk of a strategy. For a given claim H one tries to determine a strategy (\xi^R, \eta^R) with terminal value equal to H that minimizes at each time t the remaining risk

R_t := E^P\bigl( (C_T - C_t)^2 \bigm| \mathcal{F}_t \bigr) .   (49)

Here the minimization is over all admissible continuations of (\xi^R, \eta^R) after t with terminal value equal to H. Föllmer and Sondermann (1986) have studied existence and uniqueness of such a strategy if the stock price process is a P-martingale. In that case existence and uniqueness follow from the well-known Kunita-Watanabe decomposition of the P-martingale H_t = E^P(H \mid \mathcal{F}_t) with respect to the P-martingale S. This decomposition result implies that the martingale H_t can be written as

H_t = H_0 + \int_0^t \xi_s^H \, dS_s + L_t^H ,   (50)

where L^H is a martingale orthogonal to S, i.e., the product S L^H is again a martingale. A proof of this result can be found in all major textbooks on stochastic analysis. The risk-minimizing strategy (\xi^R, \eta^R) is then given by \xi_t^R := \xi_t^H, \eta_t^R := H_t - \xi_t^H S_t, and hence C_t = L_t^H. Note that the risk-minimizing strategy is no longer self-financing, as the cost process does not necessarily vanish; however, the strategy is mean self-financing, i.e., the cost process is a P-martingale with E(C_T) = 0.

In the variance-minimization approach one seeks to determine a self-financing strategy (\xi^V, \eta^V) which minimizes the L^2-norm of the hedging error, i.e., the expression

E^P\Bigl( \bigl( H - V_0 - \int_0^T \xi_s^V \, dS_s \bigr)^2 \Bigr) .


If S is a P-martingale a unique solution exists; it can again be described in terms of the Kunita-Watanabe decomposition (50). We now put \xi_t^V := \xi_t^H, V_0 := H_0 and \eta_t^V := H_0 + \int_0^t \xi_s^V \, dS_s - \xi_t^V S_t, which is typically not equal to \eta_t^R.

Let us now turn to the general situation where S is only a semimartingale under P. Here the risk-minimization approach and the variance-minimization approach lead to different solutions also for the stockholdings \xi of the optimal strategy. As shown by Schweizer (1991), for semimartingales a globally risk-minimizing strategy does not always exist. He therefore introduces a criterion of local risk-minimization. Roughly speaking, a strategy (\xi^R, \eta^R) is locally risk-minimizing if it minimizes the remaining risk over all strategies that "deviate" from (\xi^R, \eta^R) only over a sufficiently short time period. Schweizer (1991) shows that under some technical conditions a strategy is locally risk-minimizing if and only if the associated cost process is a martingale orthogonal to the martingale part of S. To compute such a strategy we have to find a decomposition of our claim H of the following form:

H = H_0 + \int_0^T \xi_s^H \, dS_s + L_T^H ,   (51)

where L^H is a P-martingale orthogonal to the martingale part of S under P. The locally risk-minimizing strategy is then defined via \xi^R := \xi^H and C^R := L^H. In particular, the strategy is still mean self-financing. In case S is a P-martingale the decomposition (51) reduces to the Kunita-Watanabe decomposition of the P-martingale H with respect to S. If S is only a semimartingale the decomposition (51) is usually referred to as the Föllmer-Schweizer decomposition. The main tool for the computation of the Föllmer-Schweizer decomposition is the minimal martingale measure Q^* introduced in Föllmer and Schweizer (1991). In particular, Föllmer and Schweizer show that for continuous asset price processes the decomposition (51) is uniquely determined. It exists under some integrability assumptions and is then given by the Kunita-Watanabe decomposition of the Q^*-martingale H_t = E^{Q^*}(H \mid \mathcal{F}_t) with respect to the Q^*-martingale S. Using this approach locally risk-minimizing strategies in various kinds of stochastic volatility models have been computed; see e.g., Föllmer and Schweizer (1991), Hofmann, Platen and Schweizer (1992), Di Masi, Kabanov and Runggaldier (1994) or Frey (1997). Colwell and Elliott (1993) apply the concept of local risk-minimization to the jump-diffusion model introduced in Section 3.2.2.

The key point in ensuring existence of a variance-minimizing hedging strategy is the closedness in L^2(P) of the following set of random variables:

G := \Bigl\{ \int_0^T \xi_s \, dS_s, \ \xi \ "admissible" \Bigr\} .

If this set is closed, a variance-minimizing strategy for a contingent claim H can, at least theoretically, be computed as the orthogonal projection of H onto G. Unfortunately, the analysis of the closedness of G is rather technical and we refer the reader to Delbaen et al. (1997) for details on this issue. For continuous processes


some easier proofs and more concrete examples are given in Pham, Rheinländer and Schweizer (1998).
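A one-period analogue of the Kunita-Watanabe projection (50) shows the mechanics: when the price increment \Delta S is a P-martingale, the risk-minimizing stock holding is the L^2(P)-regression coefficient \xi = \mathrm{Cov}(H, \Delta S)/\mathrm{Var}(\Delta S), and the regression residual plays the role of the orthogonal term L^H. The trinomial numbers below are invented for illustration.

```python
import random

random.seed(11)

# Hypothetical one-period market: S0 = 100, mean-zero trinomial increment dS
# (so S is a P-martingale over the period), and a call with strike 105.
n = 300_000
S0, K = 100.0, 105.0
dS = [random.choice([-20.0, 0.0, 20.0]) for _ in range(n)]
H = [max(S0 + d - K, 0.0) for d in dS]

# Projection of H onto the increment: xi = Cov(H, dS) / Var(dS).
m_dS = sum(dS) / n
m_H = sum(H) / n
cov = sum((d - m_dS) * (h - m_H) for d, h in zip(dS, H)) / n
var = sum((d - m_dS) ** 2 for d in dS) / n
xi = cov / var   # exact value here is 100 / (800/3) = 0.375

print(xi)
```

The residual H - H_0 - xi*dS cannot be hedged away with the stock alone; it is exactly the "intrinsic" risk component carried by the cost process in the text.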

4. On the interplay between finance and insurance

Historically, the fields of finance and insurance have developed separately, unified mainly by the common use of the theory of stochastic processes as a principal tool of analysis. However, driven by developments in the financial sector such as the increasing collaboration between insurance companies and banks (all-finance) or the emergence of finance-related insurance products, the interplay between finance and insurance has recently become a "hot topic," and we believe that a lot of important future research in finance and insurance will combine ideas from both fields. It therefore seems a good idea to conclude this survey with a brief discussion of some recent developments in this area. For a related discussion see also the paper Embrechts (2000).

4.1. Methodological differences

To prepare the ground for our discussion we now summarize the preceding chapters and point out the main differences between the classical actuarial and financial approaches to dealing with financial risk, as presented in the preceding parts of the paper. In modern derivative asset analysis one aims at "hedging away" financial risks by dynamic trading. Prices are determined by the funds needed to finance this hedge. Consequently, the distribution under the real-world probability measure of some financial risk (e.g., the payoff of a derivative) is not used for pricing this risk; instead, prices are computed using some "artificial" martingale measure whose existence is intimately related to the economic notion of no-arbitrage. The standard actuarial approach to dealing with financial risks is fundamentally different. Insurance companies are ready to bear some of the financial risks (claims) of an insured in exchange for a premium that equals the expected value of the claim plus some risk premium or loading. This loading is computed via actuarial premium principles; see e.g., Goovaerts, De Vylder and Haezendonck (1984) for a detailed discussion. While the insurance company might pass on a part of this risk to a reinsurer, it can typically not "hedge away" the risks in its portfolio by dynamic trading. Consequently, the computation of insurance premiums, ruin probabilities or necessary reserves is done using the real distribution of the claims; martingales enter the analysis only as a technical tool, albeit a very important one. The difference between the actuarial and the financial approach to financial risk management is also highlighted by the following quote from Jensen and Nielsen (1996): "Theories and models dealing with price formation in financial markets are divided into (at least) two markedly different types. One type of model is attempting


to explain levels of asset prices, risk premiums etc. in an absolute manner in terms of the so-called fundamentals. A crucial model of this type includes the well-known rational expectation model equating stock prices to the discounted value of expected future dividends. Another type of model has a more modest scope, namely, to explain in a relative manner some asset prices in terms of other, given and observable prices." It is clear from the preceding discussion that derivative pricing theory adheres to the latter approach, whereas actuarial models come closer to an absolute pricing theory. A second difference between the standard models in the two fields concerns the class of stochastic processes used. Insurance risk processes like the Cramér-Lundberg model have discontinuous sample paths which are of finite variation, whereas most standard finance models use diffusion processes with continuous trajectories to describe asset price fluctuations. However, we have seen in Section 3.2.2 that certain "new" models for asset prices closely resemble actuarial risk processes. In summary, from a methodological viewpoint the two fields seem to be relatively far apart. However, if we look at recent developments, it is very likely that in the future the gap between the two disciplines will become much smaller than it appears to be now.
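Two classical examples of the actuarial premium principles referred to above are the expected-value principle p = (1 + \theta)E(X) and the variance principle p = E(X) + \alpha\,\mathrm{Var}(X); the loading parameters and claim moments below are invented for illustration.

```python
# Hypothetical annual claim amount X with E(X) = 1000 and Var(X) = 250000.
EX, VarX = 1000.0, 250_000.0
theta, alpha = 0.2, 0.001     # invented loading parameters

p_expected_value = (1 + theta) * EX      # expected-value principle
p_variance = EX + alpha * VarX           # variance principle

print(p_expected_value, p_variance)      # 1200.0 and 1250.0
```

Both premiums are computed from the real-world claim distribution, in contrast with the risk-neutral prices of derivative asset analysis.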

4.2. Financial pricing o f insurance

The fundamental papers on this topic are due to Sondermann (1991) and in particular to Delbaen and Haezendonck (1989). We now explain the "martingale approach to premium calculation in an arbitrage-free market" proposed in the latter paper. Delbaen and Haezendonck start from the underlying risk process X_t that represents the total claim amount of a fixed portfolio of insurance contracts that has been paid out up to time t. X_t is modeled as a compound Poisson process as in Section 1.1, i.e., we have X_t = \sum_{i=1}^{N_t} Y_i for i.i.d. random variables (Y_i)_{i \in \mathbb{N}}, where N is a homogeneous Poisson process independent of the Y_i. Delbaen and Haezendonck assume that at every point in time t the insurance company can sell the remaining claim payments X_T - X_t of this portfolio over the period (t, T] for some premium p_t. Necessarily such a premium must be a predictable process. Hence the underlying price process S_t (the value of the portfolio of claims at time t) has the form
S_t = p_t + X_t .

Now comes the crucial point that marks the departure from the usual insurance pricing principles. Delbaen and Haezendonck argue that "The possibility of buying and selling at time t represents the possibility of "takeover" of this policy. This liquidity of the market should imply that there are no arbitrage opportunities and hence by the Harrison-Kreps theory (Harrison and


Kreps, 1979) there should be a risk neutral probability distribution Q such that {S_t : 0 \le t \le T} is a Q-martingale." The next step in the pricing of insurance contracts by no-arbitrage arguments is the selection of an appropriate measure Q. Delbaen and Haezendonck are interested in all those measures Q that lead to linear premiums of the form p_t = p^Q (T - t) for the underlying risk process X itself and for all excess-of-loss reinsurance contracts with payoff

C_K = \sum_{i=1}^{N_T} (Y_i - K)^+ .

The number p^Q, which of course depends on the particular excess-of-loss contract under consideration, is then called a premium density. It can be shown that this implies that under Q the process X must again be a compound Poisson process, possibly with a different loss distribution \mu^Q and loss intensity \lambda^Q. A premium density p^Q then takes the form
p^Q = E^Q(X_1) = E^Q(N_1) \, E^Q(Y_1) = \lambda^Q \int_0^\infty y \, \mu^Q(dy) .

Delbaen and Haezendonck show that we may obtain any claim-size distribution \mu^Q which is equivalent to the original claim-size distribution \mu and every intensity \lambda^Q > 0 in this way. In particular, they show how certain well-known premium principles can be obtained by an appropriate choice of \lambda^Q and \mu^Q. Here a word of warning is in order: while we may justify a particular premium principle for X by choosing Q appropriately (say Q = Q^*), our no-arbitrage pricing approach will not necessarily yield the same premium principle simultaneously for all insurance derivatives like our excess-of-loss contracts C_K: the expected value

E^{Q^*}\Bigl( \sum_{i=1}^{N_1} (Y_i - K)^+ \Bigr) = \lambda^{Q^*} \int_0^\infty (y - K)^+ \, \mu^{Q^*}(dy)

need not correspond to the same premium principle. On the other hand, there are typically several measures leading to the same premium density for X. A critical statement concerning actuarial premium principles is to be found in Venter (1991). Delbaen and Haezendonck derive their results directly; see also Embrechts and Meister (1997). Alternatively, one might use Girsanov-type theorems on equivalent changes of measure for marked point processes, as presented among others in Brémaud (1981).
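A hypothetical numerical illustration of the premium densities above: take Exponential(\beta) claims under P and pass to an equivalent claim-size distribution \mu^Q = Exponential(\beta - h) via an Esscher-type tilt with 0 < h < \beta, keeping the intensity unchanged. All numbers are invented; other combinations of \lambda^Q and \mu^Q with the same premium density would serve equally well.

```python
# Invented numbers for a Delbaen-Haezendonck-style premium density.
lam = 10.0               # claim intensity (kept unchanged under Q in this example)
beta = 1.0 / 500.0       # P-claims are Exponential(beta), mean 1/beta = 500
h = 1.0 / 2000.0         # Esscher tilting parameter, 0 < h < beta

p_P = lam / beta         # pure premium density lambda * E(Y1) under P
p_Q = lam / (beta - h)   # premium density lambda^Q * E^Q(Y1) under Q

print(p_P, p_Q)          # the change of measure builds a positive loading into p_Q
```

The tilted distribution is equivalent to the original one (both are Exponential with full support), as required for an equivalent martingale measure.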
4.3. Insurance derivatives

An area closely related to the pricing of insurance contracts by no-arbitrage arguments is the valuation of insurance derivatives. The payoff of such derivatives is (partially) linked to the losses of some predetermined insurance portfolio or to


some standardized loss index. Examples include the PCS options traded on the Chicago Board of Trade or certain so-called CAT bonds (catastrophe bonds) issued by individual (re)insurance companies. Insurance companies use these instruments in order to pass on some of their risk to the capital markets; for certain investors, on the other hand, these derivatives might be interesting tools to further diversify their investment risks. For more institutional details about these derivatives see e.g., Canter, Cole and Sandor (1996). A stylized mathematical description of an insurance derivative could be as follows. Let X be a risk process of the form X_t = \sum_{i=1}^{N_t} Y_i representing the underlying loss index. Then the payoff of a typical insurance derivative is given by some function F(X_T); for instance, in the case of a PCS option we have

F(X_T) = (X_T - K_1)^+ - (X_T - K_2)^+ \quad for some 0 < K_1 < K_2 .

To explain the main problem arising in the pricing of such contracts, let us assume as in Section 4.2 that X is a compound Poisson process and that at every point in time t the remaining risk X_T - X_t can be bought or sold for the price p^*(T - t). Arbitrage pricing theory now only tells us that, after discounting, every viable price process for our derivative must be of the form

\Pi_t = E^Q\bigl( F(X_T) \bigm| \mathcal{F}_t \bigr) ,

where Q \sim P and E^Q(X_T \mid \mathcal{F}_t) = X_t + p^*(T - t) for all t. As soon as the claim sizes Y_i are variable, certainly the relevant case if we are talking about insurance against catastrophic events, there are many measures with this property, even if we stick to the assumption that X is compound Poisson under Q. In fact, under some technical conditions every new intensity \lambda^Q > 0 and every claim-size distribution \mu^Q equivalent to the distribution \mu of the Y_i under P would be in order, provided that

\lambda^Q \int_0^\infty y \, \mu^Q(dy) = p^* .   (52)
Equation (52) leaves plenty of choice as soon as the support of \mu has at least two elements. Hence, the pricing of insurance derivatives leads to a pricing problem in incomplete markets, and one might apply one of the concepts introduced in Section 3.3; we think that the risk-minimization approach is particularly well suited here. We refer the reader to Embrechts and Meister (1997) for a detailed discussion of the methodological questions related to the pricing of insurance derivatives and for a more complete list of the relevant literature. Schmock (1998) contains an interesting discussion of some statistical issues arising in this area.
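To see this non-uniqueness numerically, the following Monte Carlo sketch (all parameters invented) prices a PCS-type layer under two different choices of (\lambda^Q, \mu^Q) that share the same premium density \lambda^Q E^Q(Y_1) = 100, cf. (52): the rarer-but-larger claims of the second choice make the out-of-the-money layer markedly more valuable, so no-arbitrage alone does not pin down the price.

```python
import random

random.seed(5)

def pcs_price(lam, mean_claim, K1, K2, T=1.0, n_paths=20_000):
    """Monte Carlo estimate of E^Q[(X_T - K1)^+ - (X_T - K2)^+] where X is
    compound Poisson with intensity lam and Exponential(mean_claim) claims."""
    total = 0.0
    for _ in range(n_paths):
        # draw the Poisson(lam*T) claim count by summing exponential waiting times
        n_claims, t = 0, random.expovariate(lam)
        while t < T:
            n_claims += 1
            t += random.expovariate(lam)
        X_T = sum(random.expovariate(1.0 / mean_claim) for _ in range(n_claims))
        total += min(max(X_T - K1, 0.0), K2 - K1)
    return total / n_paths

# Both choices satisfy lambda^Q * E^Q(Y1) = 100, yet they price differently.
p1 = pcs_price(lam=10.0, mean_claim=10.0, K1=200.0, K2=250.0)  # many small claims
p2 = pcs_price(lam=2.0, mean_claim=50.0, K1=200.0, K2=250.0)   # few large claims
print(p1, p2)
```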

4.4. Actuarial methods in finance

So far we have dealt mainly with the application of financial pricing techniques to insurance problems. However, actuarial concepts are also of increasing relevance


for finance problems. We have seen that realistic models for asset price processes are typically incomplete. In addition, the results mentioned in Section 3.3.1 have shown that in many incomplete market models the concept of superhedging does not lead to satisfactory answers for the risk-management of derivatives. Consequently, interesting approaches to this problem must involve some sort of risk-sharing between buyer and seller; in particular the seller has to bear a part of the "remaining risk." Moreover, participants in derivative markets are faced with a large amount of credit risk, and it would be illusory to believe that all this risk can be hedged away. We refer the reader to the survey Lando (1997) for more information on financial models for credit-risky securities. Actuarial concepts for risk-management might prove helpful in dealing with these "unhedgeable" risks. To mention an example where such concepts are already applied, the RAC (risk-adjusted capital) approach from insurance has become popular among investment banks as a tool for the determination of risk capital and capital allocations. It is no coincidence that Swiss Bank Corporation (now UBS) called one of its credit risk management systems ACRA, which stands for Actuarial Credit Risk Accounting.

Acknowledgements
The second author would like to thank UBS for the financial support. The authors also take pleasure in thanking the referee for the very careful reading of the first version of the manuscript.

References
Ammeter, H. (1948). A generalization of the collective theory of risk in regard to fluctuating basic probabilities. Skand. Aktuar Tidskr., 171-198.
Andersen, E. (1957). On the collective theory of risk in the case of contagion between the claims. Transactions XVth International Congress of Actuaries, II, New York, pp. 219-222.
Avellaneda, M., A. Levy and A. Paras (1995). Pricing and hedging derivative securities in markets with uncertain volatilities. Appl. Math. Finance 2, 73-88.
Bachelier, L. (1900). Théorie de la spéculation. In The Random Character of Stock Market Prices (Ed., P. Cootner), pp. 17-78. MIT Press, Cambridge, Mass. (1964).
Bakshi, G., C. Cao and Z. Chen (1997). Empirical performance of alternative option pricing models. J. Fin. 52, 2003-2049.
Barles, G. and M. Soner (1998). Option pricing with transaction costs and a nonlinear Black-Scholes equation. Fin. & Stoch. 2, 369-397.
Barndorff-Nielsen, O. and O. Halgreen (1977). Infinite divisibility of the hyperbolic and generalized inverse Gaussian distributions. Zeitschrift für Wahrscheinlichkeitstheorie 38, 309-312.
Bellamy, N. and M. Jeanblanc (1997). Incompleteness of Markets Driven by a Mixed Diffusion. Preprint, Université d'Evry, Paris, France.
Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge.
Björk, T. (1997). Interest Rate Theory. In Financial Mathematics, Springer Lecture Notes in Mathematics 1656, pp. 53-122.
Björk, T. (1998). Arbitrage Theory in Continuous Time. Oxford University Press, Oxford.

Stochastic processes in insurance and finance

409

Björk, T. and J. Grandell (1988). Exponential inequalities for ruin probabilities in the Cox case. Scand. Act. J., 77-111.
Black, F. (1976). Studies in stock price volatility changes. In Proceedings of the 1976 Business Meeting of the Business and Economics Statistics Section, American Statistical Association, pp. 177-181.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Polit. Econ. 81(3), 637-654.
Borodin, A. and P. Salminen (1996). Handbook of Brownian Motion - Facts and Formulae. Birkhäuser, Basel.
Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer, New York.
Bühlmann, H. (1970). Mathematical Methods in Risk Theory. Springer, Berlin.
Canter, M., J. Cole and R. Sandor (1996). Insurance derivatives: A new asset class for the capital markets and a new hedging tool for the insurance industry. J. Deriv., Winter 1996, 89-104.
Clark, P. (1973). A subordinated stochastic process model with finite variance for speculative prices. Econometrica 41, 135-156.
Colwell, D. and R. Elliott (1993). Discontinuous asset prices and non-attainable contingent claims. Math. Fin. 3, 295-308.
Cox, J., S. Ross and M. Rubinstein (1979). Option pricing: A simplified approach. J. Fin. Econ. 7, 229-263.
Cvitanic, J. (1997). Optimal Trading under Constraints. In Financial Mathematics, Springer Lecture Notes in Mathematics 1656, pp. 123-190.
Cvitanic, J., H. Pham and N. Touzi (1999a). Superreplication in stochastic volatility models under portfolio constraints. J. Appl. Probab. 36, 523-545.
Cvitanic, J., H. Pham and N. Touzi (1999b). A closed-form solution to the problem of superreplication under transaction costs. Fin. & Stoch. 3, 35-54.
Dalang, R., A. Morton and W. Willinger (1990). Equivalent martingale measures and no-arbitrage in stochastic security market models. Stoch. Stoch. Rep. 29, 185-201.
Dassios, A. and P. Embrechts (1989). Martingales and insurance risk. Comm. Statist. Stoch. Models 5, 181-217.
Davis, M. H. A. (1984). Piecewise-deterministic Markov processes: A general class of non-diffusion stochastic models. J. Roy. Statist. Soc. B 46, 353-388.
Davis, M. H. A. (1993). Markov Models and Optimization. Chapman & Hall, London.
Davis, M. H. A., V. Panas and T. Zariphopoulou (1993). European option pricing with transaction costs. SIAM J. Opt. Cont. 31, 470-493.
Delbaen, F. (1992). Representing martingale measures when asset prices are continuous and bounded. Math. Fin. 2, 107-130.
Delbaen, F. and J. Haezendonck (1987). Classical risk theory in an economic environment. Insurance: Mathematics and Economics 6, 85-116.
Delbaen, F. and J. Haezendonck (1989). A martingale approach to premium calculation principles in an arbitrage-free market. Insurance: Mathematics and Economics 8, 269-277.
Delbaen, F., P. Monat, W. Schachermayer, M. Schweizer and C. Stricker (1997). Weighted norm inequalities and hedging in incomplete markets. Fin. & Stoch. 1, 181-227.
Delbaen, F. and W. Schachermayer (1994). A general version of the fundamental theorem of asset pricing. Math. Ann. 300, 463-520.
Di Masi, G., Y. Kabanov and W. Runggaldier (1994). Mean-variance hedging of options on stocks with stochastic volatilities. Theory Prob. Appl. 39, 172-182.
Doob, J. (1953). Stochastic Processes. Wiley, New York.
Duan, J. (1997). Augmented GARCH(p,q) process and its diffusion limit. J. Econ. 79, 97-127.
Duffie, D. (1992). Dynamic Asset Pricing Theory. Princeton University Press, Princeton, New Jersey.
Duffie, D. and P. Protter (1992). From discrete to continuous time finance: Weak convergence of the financial gain process. Math. Fin. 2, 1-15.
Eberlein, E. and J. Jacod (1997). On the range of option prices. Fin. & Stoch. 1(2), 131-140.
Eberlein, E. and U. Keller (1995). Hyperbolic distributions in finance. Bernoulli 1, 281-299.

P. Embrechts, R. Frey and H. Furrer

El Karoui, N., M. Jeanblanc-Picqué and S. Shreve (1998). Robustness of the Black and Scholes formula. Math. Fin. 8, 93-126.
El Karoui, N. and M.-C. Quenez (1995). Dynamic programming and pricing of contingent claims in an incomplete market. SIAM Journal on Control and Optimization 33(1), 29-66.
Embrechts, P. (1983). A property of the generalized inverse Gaussian distribution with some applications. J. Appl. Prob. 20, 537-544.
Embrechts, P. (2000). Actuarial versus financial pricing of insurance. Risk Finance 1(4), 17-26.
Embrechts, P., C. Goldie and N. Veraverbeke (1979). Subexponentiality and infinite divisibility. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 49, 335-347.
Embrechts, P., J. Grandell and H. Schmidli (1993). Finite-time Lundberg inequalities in the Cox case. Scand. Act. J., 17-41.
Embrechts, P., C. Klüppelberg and T. Mikosch (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
Embrechts, P. and S. Meister (1997). Pricing insurance derivatives: the case of CAT-futures. In Proceedings of the 1995 Bowles Symposium on Securitization of Insurance Risk, Georgia State University, Atlanta (Ed., S. Cox), pp. 15-26. Society of Actuaries, Monograph M-FI97-1.

Embrechts, P. and N. Veraverbeke (1982). Estimates for the probability of ruin with special emphasis on the possibility of large claims. Insurance: Mathematics and Economics 1, 55-72.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications: Volume II. Wiley, New York.
Föllmer, H. (1991). Probabilistic aspects of options. SFB 303 Discussion Paper B-202, University of Bonn.
Föllmer, H. and Y. Kabanov (1998). Optional decomposition and Lagrange multipliers. Fin. & Stoch. 2, 69-81.
Föllmer, H. and P. Leukert (1998). Quantile hedging. Preprint, Humboldt-Universität Berlin; to appear in Fin. & Stoch. 3(3) (1999).
Föllmer, H. and M. Schweizer (1991). Hedging of contingent claims under incomplete information. In Applied Stochastic Analysis (Eds., M. H. A. Davis and R. J. Elliott), pp. 389-414. Gordon and Breach, London.
Föllmer, H. and M. Schweizer (1993). A microeconomic approach to diffusion models for stock prices. Math. Fin. 3(1), 1-23.
Föllmer, H. and D. Sondermann (1986). Hedging of non-redundant contingent claims. In Contributions to Mathematical Economics (Eds., W. Hildenbrand and A. Mas-Colell), pp. 147-160. North Holland.
Frey, R. (1997). Derivative asset analysis in models with level-dependent and stochastic volatility. CWI Quarterly, Amsterdam, 10, 1-34.
Frey, R. (1998). Perfect option replication for a large trader. Fin. & Stoch. 2, 115-148.
Frey, R. (2000). Superreplication in stochastic volatility models and optimal stopping. Fin. & Stoch. 4, 161-188.
Frey, R. and C. Sin (1999). Bounds on European option prices under stochastic volatility. Math. Fin. 9, 97-116.
Frey, R. and D. Sommer (1996). A systematic approach to pricing and hedging of international derivatives with interest rate risk. Appl. Math. Fin. 3, 295-317.
Frey, R. and A. Stremme (1997). Market volatility and feedback effects from dynamic hedging. Math. Fin. 7(4), 351-374.
Furrer, H. (1998). Risk processes perturbed by α-stable Lévy motion. Scand. Act. J., 59-74.
Furrer, H., Z. Michna and A. Weron (1997). Stable Lévy motion approximation in collective risk theory. Insurance: Mathematics and Economics 20, 23-36.
Furrer, H. and H. Schmidli (1994). Exponential inequalities for ruin probabilities of risk processes perturbed by diffusion. Insurance: Mathematics and Economics 15, 23-36.
Geman, H. and T. Ané (1996). Stochastic subordination. RISK 9(9), 146-149.
Gerber, H. (1979). An Introduction to Mathematical Risk Theory. Huebner Foundation Monographs 8, distributed by Richard D. Irwin Inc., Homewood, Illinois.


Ghysels, E., A. Harvey and E. Renault (1996). Stochastic volatility. In Handbook of Statistics (Eds., G. Maddala and C. Rao), Vol. 14, Statistical Methods in Finance, pp. 119-191. North Holland.
Goovaerts, M., F. De Vylder and J. Haezendonck (1984). Insurance Premiums. North Holland, Amsterdam.
Grandell, J. (1976). Doubly Stochastic Poisson Processes. Lecture Notes in Mathematics 529. Springer, Berlin.
Grandell, J. (1991). Aspects of Risk Theory. Springer, Berlin.
Grandell, J. (1997). Mixed Poisson Processes. Chapman and Hall, London.
Guillaume, D., M. Dacorogna, R. Davé, U. Müller, R. Olsen and P. Pictet (1997). From the bird's eye to the microscope: A survey of new stylized facts of the intra-daily foreign exchange markets. Fin. & Stoch. 1, 95-129.
Harrison, J. (1985). Brownian Motion and Stochastic Flow Systems. Wiley, New York.
Harrison, J. and D. Kreps (1979). Martingales and arbitrage in multiperiod securities markets. J. Econ. Theory 20, 381-408.
Harrison, J. and S. Pliska (1981). Martingales and stochastic integrals in the theory of continuous trading. Stoch. Proc. Appl. 11, 215-260.
Harrison, J. and S. Pliska (1983). A stochastic calculus model of continuous trading: Complete markets. Stoch. Proc. Appl. 15, 313-316.
Heston, S. (1993). A closed form solution for options with stochastic volatility with applications to bond and currency options. Rev. Fin. Stud. 6, 327-343.
Hofmann, N., E. Platen and M. Schweizer (1992). Option pricing under incompleteness and stochastic volatility. Math. Fin. 2, 153-187.
Hull, J. and A. White (1987). The pricing of options on assets with stochastic volatilities. J. Fin. 42(2), 281-300.
Iglehart, D. (1969). Diffusion approximations in collective risk theory. J. Appl. Prob. 6, 285-292.
Jarrow, R. (1994). Derivative securities markets, market manipulation and option pricing theory. J. Fin. Quant. Anal. 29, 241-261.
Jensen, B. and J. Nielsen (1996). Pricing by no-arbitrage. In Time Series Models in Econometrics, Finance and Other Fields (Eds., D. Cox, D. Hinkley and O. Barndorff-Nielsen). Chapman and Hall, London.
Karatzas, I. (1997). Lectures in Mathematical Finance. American Mathematical Society, Providence.
Karatzas, I. and S. Shreve (1988). Brownian Motion and Stochastic Calculus. Springer, Berlin.
Karatzas, I. and S. Shreve (1998). Methods of Mathematical Finance. Springer, Berlin.
Klugman, S., H. Panjer and G. Willmot (1988). Loss Models. Wiley, New York.
Koller, M. (2000). Stochastische Modelle in der Lebensversicherung. Springer, Berlin.
Kopp, P. (1984). Martingales and Stochastic Integrals. Cambridge University Press, Cambridge.
Kramkov, D. (1996). Optional decomposition of supermartingales and hedging contingent claims in incomplete security markets. Prob. Theory Rel. Fields 105, 459-479.
Lamberton, D. and B. Lapeyre (1996). Introduction to Stochastic Calculus Applied to Finance. Chapman and Hall, London.
Lando, D. (1997). Modelling bonds and derivatives with credit risk. In Mathematics of Financial Derivatives (Eds., M. Dempster and S. Pliska), pp. 369-393. Cambridge University Press, Cambridge.
Lundberg, F. (1903). Approximerad framställning av sannolikhetsfunktionen. Återförsäkring av kollektivrisker. Akad. Afhandling. Almqvist och Wiksell, Uppsala.
Lyons, T. (1995). Uncertain volatility and the risk-free synthesis of derivatives. Appl. Math. Fin. 2, 117-133.
Mercurio, F. and W. Runggaldier (1993). Option pricing for jump-diffusions: Approximations and their interpretation. Math. Fin. 3, 191-200.
Merton, R. (1973). The theory of rational option pricing. Bell J. Econ. Man. 7, 141-183.
Merton, R. (1976). Option pricing when underlying stock returns are discontinuous. J. Fin. Econ. 3, 125-144.


Milbrodt, H. and M. Helbig (1999). Mathematische Methoden der Personenversicherung. de Gruyter, Berlin.
Musiela, M. and M. Rutkowski (1997). Martingale Methods in Financial Modelling. Applications of Mathematics. Springer, Berlin.
Myneni, R. (1992). The pricing of the American option. Ann. Appl. Prob. 2, 1-23.
Nelson, D. (1990). ARCH models as diffusion approximations. J. Econ. 45, 7-38.
Norberg, R. (1991). Reserves in life and pension insurance. Scand. Act. J., 3-24.
Norberg, R. (1995). A time-continuous Markov chain interest model with applications to insurance. J. Appl. Stoch. Mod. Data Anal. 11, 245-256.
Norberg, R. (1999). Ruin problems with assets and liabilities of diffusion type. Stochastic Process. Appl. 81, 255-269.
Pagan, A. (1996). The econometrics of financial markets. J. Emp. Fin. 3, 15-102.
Panjer, H. and G. Willmot (1992). Insurance Risk Models. Society of Actuaries, Schaumburg, Illinois.
Paulsen, J. (1993). Risk theory in a stochastic economic environment. Stoch. Proc. Appl. 46, 327-361.
Pham, H., T. Rheinländer and M. Schweizer (1998). Mean-variance hedging for continuous processes: New proofs and examples. Fin. & Stoch. 2, 173-198.
Protter, P. (1992). Stochastic Integration and Differential Equations: A New Approach. Applications of Mathematics. Springer, Berlin.
Resnick, S. (1992). Adventures in Stochastic Processes. Birkhäuser, Boston.
Revuz, D. and M. Yor (1994). Continuous Martingales and Brownian Motion. Springer, Berlin.
Rogers, L. and D. Williams (1987). Diffusions, Markov Processes, and Martingales. Volume 2: Itô Calculus. Wiley, New York.
Rogers, L. and D. Williams (1994). Diffusions, Markov Processes, and Martingales. Volume 1: Foundations, 2nd edition. Wiley, New York.
Rolski, T., H. Schmidli, V. Schmidt and J. Teugels (1998). Stochastic Processes for Insurance and Finance. Wiley, Chichester.
Runggaldier, W. (ed.) (1997). Financial Mathematics. Lecture Notes in Mathematics 1656. C.I.M.E., Springer, Berlin.
Samuelson, P. (1965). Rational theory of warrant pricing. Ind. Man. Rev. 6, 13-31.
Schmock, U. (1998). Estimating the value of the WinCAT coupons of the Winterthur Insurance convertible bond: A study of the model risk. ASTIN Bulletin 29, 101-163.
Schweizer, M. (1991). Option hedging for semimartingales. Stoch. Proc. Appl. 37, 339-363.
Seal, H. (1978). Survival Probabilities: The Goal of Risk Theory. Wiley, Chichester.
Sondermann, D. (1991). Reinsurance in arbitrage-free markets. Ins. Math. & Econ. 10, 191-202.
Sørensen, M. (1996). A semimartingale approach to some problems in risk theory. ASTIN Bulletin 26, 15-23.
Stone, C. (1963). Weak convergence of stochastic processes defined on semi-infinite time intervals. Proc. Amer. Math. Soc. 14, 694-700.
Taleb, N. (1996). Dynamic Hedging. Wiley, New York.
Thorin, O. (1975). Stationarity aspects of the Sparre Andersen risk process and the corresponding ruin probabilities. Scand. Act. J., 81-99.
Venter, G. (1991). Premium calculation implications of reinsurance without arbitrage. ASTIN Bulletin 21, 223-230.
Wiggins, J. B. (1987). Option valuation under stochastic volatility: Theory and empirical estimates. J. Fin. Econ. 19, 351-372.
Williams, D. (1991). Probability with Martingales. Cambridge University Press, Cambridge.
Wolthuis, H. (1994). Life Insurance Mathematics (The Markovian Model). CAIRE Education Series 2. CAIRE, Brussels.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Renewal Theory
D. R. Grey

1. Introduction

In this chapter, an attempt is made to survey some of the most important developments in probability theory which may legitimately go under the heading Renewal Theory, although inevitably, because of the centrality of renewal ideas within the broader picture, a line of demarcation cannot nicely be drawn between this topic and others; in particular, there is a strong interplay with the theory of Markov chains and random walks. Moreover, in applied models the renewal theorem is often used as a tool in obtaining limit results, and although this is a powerful and important tool, it is the details of the model which are of major interest. Therefore, it would be unrealistic to attempt to survey all the applications without going into too much specialist detail, and a choice has had to be made of some of the simpler and more approachable examples of applications. An assumption of a reasonable grasp of probability theory and the associated mathematical analysis is made, but as far as possible the exposition is self-contained. Where proofs or details are omitted, references are usually given to more rigorous coverage elsewhere. The fundamental feature of a model to which renewal theory may be applied is the existence of points of regeneration, namely (random) points in time where the future behaviour of the process is independent of the past, and is stochastically identical to that at all other such points. The idea apparently originates from Palm (1943). The development of a coherent general abstract theory has followed from and been motivated by applications in, for example, telephone engineering and demography. A good account of early developments is given by Smith (1958). A key reference in the demographic area is Pollard (1973). 
Central to the theory, and indeed to the whole theory of stochastic processes, is the renewal theorem which states, in the simplest case of discrete time, aperiodicity and finite mean time between points of regeneration, that the probability $u_n$ of regeneration at time $n$ converges as $n \to \infty$ to the reciprocal of that mean. There is an equivalent result (Blackwell's theorem) in continuous time. The renewal theorem is easy to state and intuitively appealing, but it was not until the 1970s that an attractive probabilistic method of proof emerged, using coupling


arguments, and it is likely that this method will have the last word on the question of proof. The renewal theorem forms the basis of Sections 2 (discrete time) and 3 (continuous time). The renewal theorem implies that many quantities arising in processes with points of regeneration (renewals) also converge as time tends to infinity, suggesting a kind of ultimate stability. These quantities as functions of time are solutions of renewal equations. They are discussed in Section 4, generally in continuous time, although there is an analogous discrete time theory. Many examples of solutions of renewal equations are given by Karlin and Taylor (1975). Section 5 gives a brief indication of how the power of the renewal theorem may be extended to what appear to be much more general processes, namely Markov renewal processes and semi-regenerative processes. Motivated by studying the properties of transition probabilities of continuous time Markov chains, Kingman (1972) formulated a different way of extending the notion of renewal from discrete to continuous time, and developed the theory of p-functions. Here, typically, renewal takes place over an interval rather than instantaneously. The theory is mainly of mathematical interest, and is discussed in Section 6. Section 7 establishes the Poisson process as a universal limit when superimposing an increasing number of independent renewal processes with correspondingly decreasing densities. Sections 8 and 9 look at some applications. In Section 8 we encounter the notion of converting a defective or excessive renewal equation into a proper renewal equation by multiplying through by an exponential function; this leads to limiting results which follow from the renewal theorem, but which take the form of asymptotically exponential decay or growth, rather than convergence to a constant. The inventory model of Section 9 is of particular interest because renewal ideas enter in two different directions, both "time" and "state".
The remaining three sections introduce some more recent developments in which the basic assumptions are relaxed in some way, but it is still possible to prove a result similar to one which would be provided by the renewal theorem. In Section 10, inspired by Berbee (1979), the inter-renewal times, instead of being independent, form a stationary ergodic sequence with an asymptotic independence known as the weak Bernoulli property. In Section 11, a process which is behaving only approximately like a random walk is shown, using Goldie's implicit renewal theory, to exhibit the same sort of asymptotic behaviour as the exact random walk of Section 8. Finally in Section 12, we show how Schmidli extends the well-known estimate of the probability of ruin in classical risk theory to a more general model.

2. Feller's recurrent events

A useful starting point for the study of renewal theory is the simplest possible case where the time scale is discrete, and at each time epoch $n = 0, 1, 2, \ldots$ we are only interested in whether a renewal takes place: denote this event by $E_n$. Then because


of the axioms of probability, the behaviour of this whole sequence is determined by probabilities of the form $P(E_{n_1} \cap E_{n_2} \cap \cdots \cap E_{n_k})$ for $0 \le n_1 < n_2 < \cdots < n_k$. The sequence $(E_n)$ is called a discrete renewal process (or, following Feller (1968), the phenomenon $\mathscr{E}$ whose manifestation at time $n$ is just $E_n$ is called a recurrent event) if the following axioms are satisfied.

1. $P(E_0) = 1$.
2. For any $k \ge 2$ and $0 \le n_1 < n_2 < \cdots < n_k$,
$$P(E_{n_1} \cap E_{n_2} \cap \cdots \cap E_{n_k}) = P(E_{n_1})\, P(E_{n_2 - n_1} \cap \cdots \cap E_{n_k - n_1}) .$$

The formality of these axioms disguises the fact that if we define (possibly improper) random variables $X_1, X_2, \ldots$ by
$$X_1 = \inf\{n > 0 : E_n \text{ occurs}\}, \qquad X_2 = \inf\{n > X_1 : E_n \text{ occurs}\} - X_1$$
and so on, then $X_1, X_2, \ldots$ are independent, identically distributed (i.i.d.) random variables which may be interpreted as the lengths of time between successive renewals; conversely, if we start off with any sequence of positive integer valued, possibly improper, i.i.d. random variables $X_1, X_2, \ldots$, then $0, X_1, X_1 + X_2, \ldots$ constitute the times of occurrence of renewals in a renewal process. So we effectively have two different definitions of the same process, but we shall later see that they extend to continuous time in two different ways. Of particular importance in renewal theory are the two sequences $(u_n)$ and $(f_n)$ defined by
$$u_n = P(E_n) \qquad \text{and} \qquad f_n = P(E_1^c \cap E_2^c \cap \cdots \cap E_{n-1}^c \cap E_n) = P(X_1 = n) .$$
Either of these two sequences determines the behaviour of the process; in particular, $(f_n)$ gives the probability function of the common distribution of $X_1, X_2, \ldots$, and this will be proper if and only if $\sum_{n=1}^\infty f_n = 1$. The characteristics of $(u_n)$, which is called a renewal sequence, are less transparent and have been the subject of much study in their own right. By conditioning on the time of the first renewal (or, equivalently, the last renewal before time $n$), these two sequences may be related by the equation
$$u_n = f_1 u_{n-1} + f_2 u_{n-2} + \cdots + f_{n-1} u_1 + f_n \quad \text{for } n \ge 1 .$$

Expressed in terms of generating functions $U(s) = \sum_{n=0}^\infty u_n s^n$ and $F(s) = \sum_{n=1}^\infty f_n s^n$ (which will certainly exist for $|s| < 1$), this relationship may be written
$$U(s) = \frac{1}{1 - F(s)} .$$

EXAMPLE. Let $f_1 = p > 0$, $f_2 = q = 1 - p > 0$ and $f_n = 0$ for $n \ge 3$. Then $F(s) = ps + qs^2$ and so $U(s) = (1 - ps - qs^2)^{-1}$, which after factorization and expression in partial fractions becomes
$$U(s) = \frac{1}{1+q} \cdot \frac{1}{1-s} + \frac{q}{1+q} \cdot \frac{1}{1+qs} .$$
Expanding as a power series,
$$u_n = \frac{1 + q(-q)^n}{1+q} = \frac{1 - (-q)^{n+1}}{1+q} .$$
Note in particular that $u_n \to (1+q)^{-1}$ as $n \to \infty$, which is a special case of the following. A fundamental result in this area, with profound consequences for the stability of a wide variety of processes, is the renewal theorem. Suppose that $\sum f_n = 1$, let $d = \text{h.c.f.}\{n : f_n > 0\}$ and let
$$\mu = \sum_{n=1}^\infty n f_n \le \infty$$
be the mean recurrence time. Then the renewal theorem is as follows.

Renewal theorem.
$$u_{nd} \to \frac{d}{\mu} \quad \text{as } n \to \infty .$$
The analytical proof of this theorem, discovered by Erdős et al. (1949), is unmemorable; there is also a rather less direct Fourier analytic proof given by Kingman (1972). These have been superseded in fashion by an alternative, more probabilistic proof involving a coupling argument. The idea of coupling is generally credited to Doeblin (1938); its application to the proof of the discrete renewal theorem was apparently first suggested by Breiman (1969) and fleshed out in the context of Markov chains by Pitman (1974) in the case $\mu < \infty$, and by Billingsley (1979) in the case $\mu = \infty$. We shall sketch this proof in the case $\mu < \infty$. Firstly, it is clear that there is no loss of generality in assuming $d = 1$. We need the notion of a delayed renewal process where, instead of behaving as if a renewal occurred at time 0 (implicit in the foregoing definition), the process starts with a renewal at a random time, the probability that it starts at $n$ being denoted by $b_n$. If in this process we denote by $v_n$ the probability of a renewal at time $n$, then in terms of generating functions $B(s)$ and $V(s)$ we get the obvious relationship
$$V(s) = B(s)U(s) = \frac{B(s)}{1 - F(s)} .$$

In particular, if we make the choice
$$B(s) = \frac{1 - F(s)}{\mu(1 - s)}$$
(which may easily be seen to be the probability generating function of a proper probability distribution) then we get
$$V(s) = \frac{1}{\mu(1 - s)}$$
and so $v_n = \mu^{-1}$ for all $n$. It follows easily that this particular delayed renewal process is strictly stationary in the usual sense, and for this reason is called an equilibrium renewal process.
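This stationarity can be verified directly: the coefficients of the $B(s)$ above are $b_n = P(X_1 > n)/\mu$, and the delayed renewal probabilities $v_n = \sum_m b_m u_{n-m}$ then equal $1/\mu$ at every time. A small numerical sketch, using an arbitrary three-point inter-renewal distribution chosen for illustration:

```python
from math import isclose

f = {1: 0.2, 2: 0.3, 3: 0.5}             # inter-renewal distribution (arbitrary)
mu = sum(k * fk for k, fk in f.items())  # mean recurrence time, here mu = 2.3

N = 50
# Non-delayed renewal sequence via u_n = sum_k f_k u_{n-k}, u_0 = 1.
u = [1.0]
for n in range(1, N + 1):
    u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))

# Equilibrium delay distribution: b_n = P(X_1 > n) / mu.
def tail(n):
    return 1.0 - sum(fk for k, fk in f.items() if k <= n)

b = [tail(n) / mu for n in range(N + 1)]
assert isclose(sum(b), 1.0)              # b is a proper distribution

# Delayed renewal probabilities v_n = sum_{m<=n} b_m u_{n-m}.
for n in range(N + 1):
    v_n = sum(b[m] * u[n - m] for m in range(n + 1))
    assert isclose(v_n, 1.0 / mu)        # v_n = 1/mu for every n
```

The identity holds exactly (up to rounding), not merely in the limit, which is what makes the equilibrium process useful as a coupling partner in the proof below.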


Suppose we now take a (non-delayed) renewal process and an independent equilibrium renewal process, and denote events of renewals in them by $(E_n)$ and $(E_n^*)$ respectively. Let $N$ be the (possibly improper) random variable denoting the first time $n$ at which both $E_n$ and $E_n^*$ occur (a "coupling"). Then
$$u_n - v_n = P(E_n) - P(E_n^*) = \sum_{m=0}^{\infty} P(N = m)\,[P(E_n \mid N = m) - P(E_n^* \mid N = m)] = \sum_{m=n+1}^{\infty} P(N = m)\,[P(E_n \mid N = m) - P(E_n^* \mid N = m)] ,$$
using the fact that if a coupling has already occurred, the two processes are behaving probabilistically identically. Hence
$$|u_n - \mu^{-1}| = |u_n - v_n| \le P(N > n) .$$
It suffices to show that $P(N > n) \to 0$ as $n \to \infty$; in other words, that $N$ is a proper random variable. This is not difficult but does require a digression into the theory of Markov chains. Briefly, if $Z_n$ and $Z_n^*$ denote the forward recurrence times at time $n$ of the two processes, namely the lengths of time until the next renewals, then $(Z_n, Z_n^*)$ performs a bivariate Markov chain which is irreducible and has equilibrium distribution $\hat{b} \times \hat{b}$, where $\hat{b}$ is the equilibrium delay distribution derived above; it follows that it is positive recurrent and so eventually visits the state $(0,0)$, which constitutes a coupling. As a consequence of the renewal theorem, it is straightforward to show that many other quantities converge. For instance, if $Z_n$ again denotes the forward recurrence time at time $n$, then by partitioning according to the time of the last renewal before $n$ we get
$$P(Z_n = k) = \sum_{m=0}^{n-1} u_m f_{n+k-m} = \sum_{l=k+1}^{k+n} f_l u_{n+k-l} \to \frac{1}{\mu} \sum_{l=k+1}^{\infty} f_l \quad \text{as } n \to \infty \text{, by dominated convergence,}$$

and so the distribution of $Z_n$ converges, not surprisingly, to the equilibrium delay distribution. In the case $\mu = \infty$, there is no such equilibrium process, since if we try to solve
$$B(s) = \frac{\alpha(1 - F(s))}{1 - s}$$
for some $\alpha > 0$, letting $s \to 1$ gives $B(1) = \infty$, which will not do. So in this case the renewal theorem may be said to be less informative. Nevertheless, limit theorems of a different kind exist, of which the following is possibly the most interesting. If we take the forward recurrence time $Z_n$ defined above, and the analogous backward recurrence time $Y_n$, namely the length of time since the last renewal at time $n$, then the result concerns the convergence in distribution of the bivariate random variable $(Y_n/n, Z_n/n)$. (This theorem is stated in discrete time for convenience only; there is a corresponding continuous time version: see, for instance, Bingham et al. (1987).) Note that the non-degeneracy of the limit distribution means that in the limit the backward and forward recurrence times are comparable in magnitude with the length of time which has elapsed.

Dynkin-Lamperti theorem. $(Y_n/n, Z_n/n)$ converges in distribution to the limit with bivariate density
$$g_\alpha(y, z) = \frac{\alpha \sin \pi\alpha}{\pi} \cdot \frac{(1-y)^{\alpha-1}}{(y+z)^{1+\alpha}} \qquad (0 < y < 1,\ z > 0)$$
for some $\alpha \in \,]0, 1[$, if and only if
$$P(X_1 > n) \sim n^{-\alpha} L(n) \quad \text{as } n \to \infty ,$$
where $L$ is a function slowly varying at infinity. This regular variation of the tail of the inter-renewal time distribution is a condition which arises naturally in other contexts when considering distributions with heavy tails: for instance, it characterises the distributions in the domain of attraction of a stable law.
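Returning to the case $\mu < \infty$, the convergence of $P(Z_n = k)$ to the equilibrium delay distribution derived earlier can be checked numerically. The sketch below (illustrative; the inter-renewal distribution is arbitrary) uses the exact formula $P(Z_n = k) = \sum_{m=0}^{n-1} u_m f_{n+k-m}$ from the text:

```python
f = {1: 0.2, 2: 0.3, 3: 0.5}             # arbitrary aperiodic inter-renewal law
mu = sum(k * fk for k, fk in f.items())  # mean recurrence time mu = 2.3

N = 200
# Renewal sequence u_n via the recursion u_n = sum_k f_k u_{n-k}.
u = [1.0]
for n in range(1, N + 1):
    u.append(sum(f.get(k, 0.0) * u[n - k] for k in range(1, n + 1)))

def p_forward(n, k):
    """P(Z_n = k) = sum_{m=0}^{n-1} u_m f_{n+k-m}, as in the text."""
    return sum(u[m] * f.get(n + k - m, 0.0) for m in range(n))

# The limit is the equilibrium delay probability (1/mu) sum_{l > k} f_l.
for k in range(3):
    limit = sum(fk for l, fk in f.items() if l >= k + 1) / mu
    assert abs(p_forward(N, k) - limit) < 1e-9
```

Since $u_n$ converges geometrically for this distribution, the agreement at $n = 200$ is already far tighter than the tolerance used.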

3. Theorems of Blackwell, Feller and Orey


It was suggested in the previous section that the concept of a renewal process may be extended to continuous time in two different ways. The first way we shall consider in this section; the other will form the basis of Section 6. We shall take the behaviour of the process to be determined by a sequence of (possibly improper) i.i.d. inter-renewal times $X_1, X_2, \ldots$, but now assume that their common distribution is non-lattice in the sense that there does not exist $\lambda > 0$ such


that P(X1 E 2 ' ) = 1 where S = {0, 2, 22, 3)~,...}. (It is clear that the lattice case essentially reduces to the discrete case.) We shall concentrate on the case of c o m m o n practical interest where the distribution of X1,X2,... is proper, and denote its distribution function by F and its mean by/~(<_ec). We also lose little by assuming that the inter-renewal times are strictly positive, or in other words that F(0) = 0. The renewal process can now be defined as the collection of r a n d o m variables (N(t) : t _> 0) where

N(t) --: sup{n : X1 +X2 + " - + A m _< t}


denotes the number of renewals up to and including time t. It is not hard to show that N(t) is almost surely finite and that its mean
OO

EN(t) = Z F * ~ ( t )
n~l

( ..... denoting convolution) is finite for all t >_ 0. It is usual to work with the closely related function
oo

M(t) = 1 + EN(t) = Z F*n(t)


n=0

where F * is defined as the distribution function of a random variable degenerate at the origin. Effectively this counts the renewal which has implicitly taken place at time 0. The function M(t) is of particular importance, and is known as the renewal function. F o r future reference we note that the renewal function
oo

Ct/=
n=0

may be defined for all real t and any distribution function F, not necessarily concentrated on the non-negative real numbers, although its value may or m a y not be finite. It may be interpreted as the expected number of visits of the sequence of partial sums So = 0, S1 = X1, $2 = X1 + X 2 , . . . to the half-line 1- o o , t], where X1,322,... are i.i.d, with distribution function F. (This sequence of partial sums is known as a random walk: see Section 8.) It is clear that, even if the distribution of inter-renewal times is discrete (which it could be: for instance concentrated on two positive numbers whose ratio is irrational) then the set of possible time points at which a renewal could occur if anything becomes more dense as time goes on, and yet we do not expect the frequency of renewals to increase; hence it is no longer appropriate to look for convergence of the probability of a renewal at a particular time to any limit other than zero. Rather we look at intervals of time, and it turns out that the behaviour of the expected number of renewals in an interval has far-reaching consequences. The following result is central.


Blackwell's theorem. For each $h > 0$,
$$M(t + h) - M(t) \to \frac{h}{\mu} \quad \text{as } t \to \infty .$$
This theorem may be seen as directly analogous to the discrete renewal theorem; indeed it is true in the lattice case provided only that $h$ is carefully chosen. Note that $M(t + h) - M(t)$ is just the expected number of renewals in the time interval $]t, t + h]$.
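Blackwell's theorem is easy to illustrate by simulation. The sketch below (Monte Carlo, with illustrative parameters) estimates the expected number of renewals in $]t, t+h]$ for uniform inter-renewal times and compares it with $h/\mu$; for exponential inter-renewal times the process is Poisson and $M(t+h) - M(t) = h/\mu$ holds exactly for all $t$, so the uniform case is the more instructive check:

```python
import random

def renewals_in_window(sample_gap, t, h, rng):
    """Count the renewals of one simulated path in the interval ]t, t+h]."""
    s, count = 0.0, 0
    while True:
        s += sample_gap(rng)          # next renewal epoch
        if s > t + h:
            return count
        if s > t:
            count += 1

rng = random.Random(42)
t, h, n_paths = 50.0, 2.0, 40000

# Uniform(0, 2) inter-renewal times: mu = 1, so the limit h/mu equals 2.
est = sum(renewals_in_window(lambda r: r.uniform(0.0, 2.0), t, h, rng)
          for _ in range(n_paths)) / n_paths
assert abs(est - h / 1.0) < 0.05      # Monte Carlo estimate of M(t+h) - M(t)
```

At $t = 50$ the process started at the origin is already essentially stationary for this inter-renewal distribution, so the residual bias is negligible compared with the Monte Carlo error.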
As might be expected, this theorem has particular force when $\mu < \infty$ (which is not to suggest that there are not interesting things to be said about $M(t)$ when $\mu = \infty$). Blackwell's (1948) original proof has been superseded, in the case $\mu < \infty$, by one involving a coupling argument due to Lindvall (1977). A good account of this is given by Durrett (1996). The idea is the same as in discrete time except that now, typically, only approximate coupling can take place and a certain amount of technical trickery is needed in achieving the proof. It was noted earlier that the renewal function $M$ is well-defined even if $X_1, X_2, \ldots$ are not confined to be non-negative valued. The following theorem is an important generalisation, first proved by Feller and Orey (1961).

Blackwell–Feller–Orey theorem
Let X1, X2, ... be i.i.d. with common distribution non-lattice with mean 0 < μ ≤ ∞. Then the corresponding renewal function M satisfies

M(t + h) − M(t) → h/μ  as t → ∞;  M(t + h) − M(t) → 0  as t → −∞

for each fixed h > 0. More modern proofs of this theorem (as stated above, or equivalently stated in terms of solutions of renewal equations: see next section) may be found in Thorisson (1987), Alsmeyer (1991) or Kallenberg (1997). The next section will explore a little further the consequences of Blackwell's theorem for the asymptotic behaviour of a wide variety of processes.

4. Renewal equations
Consider any model in which, starting at time zero, after some random time X1 the process reaches a so-called point of regeneration at which its future behaviour is independent of its past and is probabilistically identical to its behaviour starting at time zero. Let A(t) = EY(t) be the expected value of some random variable determined by the behaviour of the process at, and possibly after, time t: for instance, the case where A(t) is a probability is included, because we can take Y(t) to be an indicator function. Then by conditioning on X1 we get

Renewal theory

421

A(t) = E(E(Y(t) | X1)) = ∫_0^t A(t − u) dF(u) + ∫_t^∞ E(Y(t) | X1 = u) dF(u)

where F is the distribution function of X1, and we have used the regeneration property in the first integral above. If we denote the second integral above by a(t), which may be easy to evaluate in particular cases, we get the renewal equation

A(t) = a(t) + ∫_0^t A(t − u) dF(u)

which is a functional equation for A in terms of the supposedly known F and a. If a is measurable and bounded on bounded intervals, then it may be shown that the unique solution bounded on bounded intervals is given by

A(t) = (a * M)(t) = ∫_0^t a(t − u) dM(u)

where M is the renewal function corresponding to F, and we adopt here and elsewhere the convention that the integral is over the closed interval [0, t], thereby including the jump in M which occurs at the origin. To see this briefly, note that, in terms of convolutions,

A = a + A * F

and by repeated substitution for A on the right hand side we may establish

A = a * Σ_{n=0}^N F^{*n} + A * F^{*(N+1)}

for any N, and then check that A * F^{*(N+1)}(t) → 0 as N → ∞ for each fixed t. The fact that the renewal function arises naturally in the solution of renewal equations is a major reason for its importance. The simplicity of form of the solution of the renewal equation disguises the fact that the renewal function may rarely be explicitly evaluated; however, Blackwell's theorem does at least enable us to look at the asymptotic behaviour of A(t). Roughly speaking, dM(u) ≈ μ^{−1} du for large u, and so provided a(t) is negligible for large t we might expect

A(t) → μ^{−1} ∫_0^∞ a(u) du  as t → ∞ .

This is true provided a satisfies a condition known as direct Riemann integrability, about which we shall say a little below. The truth of this statement is known as the key renewal theorem and is in fact equivalent to Blackwell's theorem. To show that the key renewal theorem implies Blackwell's theorem, simply take a(t) = 1_{[0,h[}(t), whence A(t) = M(t) − M(t − h) for t ≥ h. The proof of the


opposite implication gives us a clue to the condition of direct Riemann integrability. Call a an h-step function for h > 0 if it takes the form

a(t) = Σ_{n=0}^∞ a_n 1_{[nh,(n+1)h[}(t)  for t ≥ 0

for some constants a_0, a_1, ... . It is easy enough to show that Blackwell's theorem implies the key renewal theorem when a is an h-step function which is integrable (namely Σ_{n=0}^∞ |a_n| < ∞); a function is defined to be directly Riemann integrable if it can be sandwiched between two integrable h-step functions whose integrals are arbitrarily close, and a monotonicity argument suffices to extend the key renewal theorem to such functions. In practice, the condition of direct Riemann integrability has been found to be the minimal useful condition, and there are several alternative sufficient conditions for it to hold: see, for instance, Asmussen (1987). One particular sufficient condition is that a be monotonic and integrable.

EXAMPLE. Suppose that F possesses a density function f which is itself directly Riemann integrable. Then the renewal equation

m(t) = f(t) + ∫_0^t m(t − u) dF(u)

has a unique solution bounded on bounded intervals, given by

m(t) = ∫_0^t f(t − u) dM(u) .

Also for any t > 0

∫_0^t m(x) dx = ∫_0^t { ∫_0^x f(x − u) dM(u) } dx = ∫_0^t { ∫_u^t f(x − u) dx } dM(u) = ∫_0^t F(t − u) dM(u) .

But also by definition of M

M(t) = 1 + (F * M)(t) = 1 + ∫_0^t F(t − u) dM(u)

and so we have identified m as the density of M on the positive real line; it is known as the renewal density. In particular, by the key renewal theorem,

m(t) → μ^{−1}  as t → ∞,

and so we have a stronger statement of Blackwell's theorem in this special case.
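In practice renewal equations of this kind are usually solved numerically. The following sketch is our own illustration, not part of the text, and the simple left-endpoint discretization is an assumption of ours: it solves A(t) = a(t) + ∫_0^t A(t − u) f(u) du on a grid for the renewal density of the Gamma(2, 1) distribution, with density f(x) = x e^{−x} and mean μ = 2, and lets us watch m(t) settle towards 1/μ = 0.5.

```python
import math

def solve_renewal_equation(a, density, T, dt):
    """Solve A(t) = a(t) + int_0^t A(t-u) f(u) du on a grid,
    discretizing the convolution by a left-endpoint rule."""
    n = int(T / dt)
    A = []
    for i in range(n + 1):
        t = i * dt
        # Convolution over already-computed values A[0..i-1].
        conv = sum(A[i - j] * density(j * dt) * dt for j in range(1, i + 1))
        A.append(a(t) + conv)
    return A

f = lambda x: x * math.exp(-x)   # Gamma(2,1) density, mean mu = 2
m = solve_renewal_equation(f, f, T=20.0, dt=0.02)
print(round(m[-1], 3))           # renewal density should approach 1/mu = 0.5
```

For this F the renewal density is known in closed form, m(t) = (1 − e^{−2t})/2, which gives an easy check on the discretization error.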

Renewal theory

423

EXAMPLE. This example is the continuous time analogue of the one given in Section 2. In a renewal process, let Sn = X1 + ... + Xn, so that the forward recurrence time at time t may be written

Z(t) = S_{N(t)+1} − t .

For fixed z > 0, by conditioning on X1 we obtain

P(Z(t) ≤ z) = P(t < X1 ≤ t + z) + ∫_0^t P(Z(t − u) ≤ z) dF(u)
            = F(t + z) − F(t) + ∫_0^t P(Z(t − u) ≤ z) dF(u) .

For each fixed z, this is a renewal equation with solution

P(Z(t) ≤ z) = ∫_0^t [F(t + z − u) − F(t − u)] dM(u) .

Moreover if μ < ∞ then

F(t + z) − F(t) = [1 − F(t)] − [1 − F(t + z)]

which is directly Riemann integrable, since both terms on the right hand side are monotonic and integrable; hence by the key renewal theorem, as t → ∞,

P(Z(t) ≤ z) → μ^{−1} ∫_0^∞ [F(u + z) − F(u)] du = μ^{−1} ∫_0^z [1 − F(u)] du .
This, now regarded as a function of z, gives the limiting distribution function of the forward recurrence time. It is of interest that this distribution always possesses a density function, whereas all that was assumed about F was that it is non-lattice. Again, if we allow X1, X2, ... to take negative values but have positive mean, so that {Sn} is a random walk with positive drift, the Blackwell–Feller–Orey theorem allows us to deduce the existence of a limiting distribution of the overshoot of the random walk over a positive barrier b as b → ∞; see, for example, Asmussen (1987). Variants on Blackwell's theorem and the key renewal theorem are possible by strengthening or weakening assumptions and conclusions. Perhaps the most substantial of these arises from Stone's decomposition of the renewal measure (Stone, 1966). This requires F to be spread out, namely having F^{*n} with an absolutely continuous component for some n (a condition theoretically stronger than non-lattice but in practical terms equivalent). The decomposition is M = M1 + M2, where M1 and M2 are non-negative measures with M2 having finite total mass and M1 having a bounded continuous density which tends to μ^{−1} at infinity. (Essentially, the absolutely continuous component becomes dominant.) If F is spread out, it is an easy consequence of Stone's decomposition [for details, see


Asmussen (1987)] that a only needs to be bounded, integrable and tending to zero at infinity for the key renewal theorem to hold. A further strengthening under these conditions is given by Arjas et al. (1978), who prove that if g is a non-negative function which is bounded, Lebesgue integrable and tends to zero at infinity then

lim_{t→∞} sup_{a ≤ g} | (G * M * a)(t) − μ^{−1} ∫_0^∞ a(u) du | = 0

for any distribution function G (which represents a delay distribution in the underlying renewal process).
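The limiting forward recurrence time distribution derived in the example above is easy to see in simulation. The sketch below is ours, not the author's; the choice of Uniform[0, 2] inter-renewal times is an invented illustration. Here μ = 1 and F(u) = u/2 on [0, 2], so the limit is μ^{−1} ∫_0^z (1 − F(u)) du = z − z²/4 for 0 ≤ z ≤ 2.

```python
import random

def forward_recurrence(t, rng):
    """Simulate Z(t) = S_{N(t)+1} - t for Uniform[0,2] inter-renewal times."""
    s = 0.0
    while s <= t:
        s += rng.uniform(0.0, 2.0)
    return s - t

def prob_leq(t, z, n=20000, seed=2):
    """Monte Carlo estimate of P(Z(t) <= z)."""
    rng = random.Random(seed)
    return sum(forward_recurrence(t, rng) <= z for _ in range(n)) / n

# Limiting value at z = 1 is 1 - 1/4 = 0.75; t is large and otherwise arbitrary.
p_hat = prob_leq(t=37.3, z=1.0)
print(round(p_hat, 2))
```

Note that the limit distribution has density (1 − F(z))/μ, a density even though nothing beyond non-lattice was assumed of F, exactly as remarked in the text.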

5. Markov renewal and semi-regenerative processes

To illustrate the far-reaching consequences of the renewal theorem, we describe here a much more general model for which it is now not too difficult to demonstrate stable long-term behaviour. A standard reference is the book of Çinlar (1975); for more detail than we include here, see also Asmussen (1987). Let (Jn : n = 0, 1, 2, ...) be a positive recurrent irreducible Markov chain with countable state space S, and conditional on J0, J1, J2, ... let X0, X1, X2, ... be independent non-negative random variables such that the distribution of Xn depends only on the two states Jn and Jn+1. Then Xn is intended to represent the length of time the process stays in state Jn before jumping to state Jn+1. Such a process is called a Markov renewal process or semi-Markov process and is already seen to admit several special cases. The key to the analysis of Markov renewal processes is that the times of entry into a fixed state i ∈ S form a (possibly delayed) ordinary renewal process. Assume for simplicity that the inter-renewal times in this process have a non-lattice distribution (in which case the same will be true whichever i ∈ S we choose). Then by developing an appropriate renewal equation it is possible to prove that many quantities converge as t → ∞; for instance, the probability of being in state i at time t converges to

v_i Σ_{j∈S} q_ij μ_ij / Σ_{k∈S} v_k Σ_{j∈S} q_kj μ_kj

(assuming that the sums converge), where (q_ij) and (v_i) are respectively the transition probabilities and equilibrium probabilities of the Markov chain (Jn) and μ_ij is the expected value of Xn conditional on Jn = i and Jn+1 = j. We can be more general still than this. Let (Y(t) : t ≥ 0) be a continuous time process with the following structure: there exists a Markov renewal process such that conditional on X0, ..., Xn−1 and J0, ..., Jn−1 and Jn = i, the distribution of the process

(Y(X0 + ... + Xn−1 + t) : t ≥ 0)


is the same as that of the process (Y(t) : t ≥ 0) conditional on J0 = i, for all i ∈ S. Such a process is called semi-regenerative (it being merely regenerative if the Markov chain has only one state). Again it is possible to develop renewal equations based on the embedded renewal process of returns to a particular fixed state, and to deduce corresponding limiting results. One such (again assuming the non-lattice case) is that Y(t) converges in distribution as t → ∞ in the sense that

E f(Y(t)) → μ^{−1} Σ_{j∈S} v_j E_j ∫_0^{X0} f(Y(s)) ds

for all bounded continuous f. Here, μ = Σ_{k∈S} v_k Σ_{j∈S} q_kj μ_kj and E_j denotes expectation conditional on J0 = j.
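The state-probability limit above can be checked on the smallest non-trivial example. The sketch below is our invention, not from the text: a two-state alternating semi-Markov process (q01 = q10 = 1, so v0 = v1 = 1/2) with exponential holding times of mean 1 in state 0 and mean 2 in state 1. The limit formula reduces to μ01/(μ01 + μ10) = 1/(1 + 2) = 1/3 for state 0.

```python
import random

def state_at(t, rng):
    """Two-state alternating semi-Markov process 0 -> 1 -> 0 -> ...
    Holding times: Exp(mean 1) in state 0, Exp(mean 2) in state 1."""
    state, clock = 0, 0.0
    while True:
        hold = rng.expovariate(1.0) if state == 0 else rng.expovariate(0.5)
        if clock + hold > t:
            return state          # this holding interval covers time t
        clock += hold
        state = 1 - state

def prob_state0(t, n=20000, seed=3):
    rng = random.Random(seed)
    return sum(state_at(t, rng) == 0 for _ in range(n)) / n

# Limit = v0*mu01 / (v0*mu01 + v1*mu10) = 1/3.
p0 = prob_state0(t=25.0)
print(round(p0, 2))
```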

6. Kingman's regenerative phenomena


The continuous time renewal process studied in Section 3 takes renewals as instantaneous and assumes that the lengths of time between successive renewals are independent and identically distributed. An obvious alternative is to regard a process as being either in a state of renewal or not at each time point t, and to translate Feller's definition of a recurrent event (Section 2) directly into continuous time. So, if we let E_t be the event that the process is in a state of renewal at time t, we have the axioms

1. P(E_0) = 1.
2. For any k ≥ 2 and 0 ≤ t_1 < t_2 < ... < t_k,

P(E_{t_1} ∩ E_{t_2} ∩ ... ∩ E_{t_k}) = P(E_{t_1}) P(E_{t_2 − t_1} ∩ ... ∩ E_{t_k − t_1}) .

If Φ is the state of renewal whose occurrence at time t is denoted by E_t satisfying the above axioms, Kingman (1972) calls Φ a regenerative phenomenon. A good example of a regenerative phenomenon is the emptiness of a queue in any model with Poisson arrivals. The analogue of the renewal sequence (u_n) in discrete time, namely the function

p(t) = P(E_t)  for t ≥ 0 ,

is called a p-function, and it is not hard to see that this function determines the behaviour of the regenerative phenomenon. Much of the interest in p-functions has been in looking at their mathematical properties. Typical curiosities are as follows.

p(t) = 1 if t is rational, p(t) = 0 if t is irrational;
p(t) = e^{−λt} for t ≤ h, p(t) = e^{−λh} for t > h, for some fixed λ, h > 0.

The latter arises from a simple model: take a Poisson process with rate λ starting at time 0, and say that E_t occurs if there are no occurrences in the time interval


]t − h, t]. It is easy to check that this defines a regenerative phenomenon, and that the p-function is as given. In discrete time it is possible, at least via generating functions, to identify the most general renewal sequence relatively easily, but in continuous time life is more complicated. The first (pathological) example above suggests that a p-function cannot be characterized by an integral transform, because for this p-function it would be identically zero. The usual procedure is to define a standard p-function as one for which p(t) → 1 as t → 0. Under this condition, using some simple inequalities based on the axioms, it follows that p(t) is always positive and is uniformly continuous in t. Then Kingman's formula gives the general standard p-function as having Laplace transform given by

r(s) = ∫_0^∞ e^{−st} p(t) dt = { s + ∫_{]0,∞]} (1 − e^{−sx}) μ(dx) }^{−1}

where μ is a measure on ]0, ∞] satisfying

∫_{]0,∞]} (1 − e^{−x}) μ(dx) < ∞ .

The most obvious example of a regenerative phenomenon (of which the above queueing model is one case) is a process in which periods of renewal alternate with periods of non-renewal, these periods all being independent, with the renewal periods having exponential distribution with parameter q say, and the non-renewal periods having some arbitrary, possibly improper, distribution function F. In this case, the measure μ can be identified by μ(]0, ∞]) = q and

μ(]0, x]) = q F(x)  for x > 0 .

EXAMPLE. Suppose that the non-renewal periods also have an exponential distribution, with parameter c say. Then μ(dx) = qc e^{−cx} dx and μ({∞}) = 0. Hence

r(s)^{−1} = s + ∫_0^∞ (1 − e^{−sx}) qc e^{−cx} dx

which after integration, rearrangement and expression in partial fractions gives

r(s) = (1/(q + c)) { c/s + q/(q + c + s) }

whence by inverting the Laplace transform

p(t) = (1/(q + c)) (c + q e^{−(q+c)t})  for t ≥ 0 .
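The worked example above can be verified numerically: the Laplace transform of the stated p-function should agree with r(s) = (c + s)/(s(s + q + c)), the partial-fraction form from Kingman's formula. The rates q = 2, c = 3 and the transform argument s = 1.5 below are arbitrary illustrative choices of ours.

```python
import math

q, c = 2.0, 3.0   # illustrative rates of renewal / non-renewal periods

def p(t):
    # p-function from the worked example above
    return (c + q * math.exp(-(q + c) * t)) / (q + c)

def laplace(fn, s, T=40.0, n=200000):
    # Trapezoidal approximation of int_0^T e^{-st} fn(t) dt;
    # the tail beyond T is negligible for the s used here.
    dt = T / n
    total = 0.5 * (fn(0.0) + fn(T) * math.exp(-s * T))
    total += sum(fn(k * dt) * math.exp(-s * k * dt) for k in range(1, n))
    return total * dt

s = 1.5
numeric = laplace(p, s)
closed_form = (c + s) / (s * (s + q + c))
print(round(numeric, 4), round(closed_form, 4))
```

Both numbers agree to the displayed precision, confirming the inversion carried out in the example.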


[This result could also have been derived by considering the process as a two-state Markov chain.] The pathological example given earlier indicates that we cannot state that p(t) converges to a limit as t → ∞ without imposing conditions under which we would expect a renewal theorem to hold. Again if we restrict attention to standard p-functions, for each h > 0 the skeleton (p(nh) : n = 0, 1, 2, ...) is a positive renewal sequence, and so by the discrete renewal theorem converges to a limit as n → ∞; uniform continuity now ensures that this limit does not depend upon h and is the limit of p(t) as t → ∞. Depending on the model, it may not be easy to identify this limit; however, in the case of alternating renewal and non-renewal periods, it is not hard using Kingman's formula to show that

p(t) → EX / (EX + EY)  as t → ∞

where X denotes a renewal period and Y denotes a non-renewal period. [This also follows from a more general result in which X is not necessarily exponentially distributed nor independent of the following Y, using the starts of renewal periods as renewals in the sense of Section 4 and developing a renewal equation for p(t).]

7. Superposition of renewal processes


The homogeneous Poisson process with rate λ may be regarded as a renewal process in which the inter-renewal times have exponential distribution with parameter λ. Because of the "lack of memory" property of the exponential distribution, this means that the Poisson process enjoys many special properties not enjoyed by other renewal processes: for instance, that it is already in equilibrium, and that the numbers of renewals in disjoint time intervals are independent random variables. A rather deeper characterisation of the Poisson process arises from taking a large number of independent sparse renewal processes and combining all their renewals together. The following limit theorem accords much the same status to the Poisson process as the central limit theorem does to the normal distribution. We start with some preliminaries. For each n = 1, 2, ... let (Nn1), (Nn2), ..., (Nnn) be independent renewal processes, the inter-renewal times of (Nni) having distribution function Fni for i = 1, 2, ..., n. For t ≥ 0 let

Nn(t) = Σ_{i=1}^n Nni(t) .

Of course (Nn(t)) will not in general be a renewal process.

THEOREM. Under the above conditions, if for some λ > 0 and all t > 0

max_{1≤i≤n} Fni(t) → 0  and  Σ_{i=1}^n Fni(t) → λt

as n → ∞, then for any 0 = t0 < t1 < t2 < ... < tk the joint distribution of

Nn(t1), Nn(t2) − Nn(t1), ..., Nn(tk) − Nn(tk−1)

converges to that of independent Poisson random variables with parameters λt1, λ(t2 − t1), ..., λ(tk − tk−1) respectively.

PROOF. Regard Nn(t1), Nn(t2) − Nn(t1), ..., Nn(tk) − Nn(tk−1) as a random k-vector and write it as Σ_{i=1}^n Wni, where Wni is the vector contribution from the process (Nni(t)). Denoting the zero vector by 0 and the vector with 1 in position j and zeroes everywhere else by ej, let

pni = P(Wni = 0) = 1 − Fni(tk)

and for j = 1, 2, ..., k let

qnij = P(Wni = ej) .

Also let

rni = 1 − pni − Σ_{j=1}^k qnij .

Then from the estimation

[Fni(tj) − Fni(tj−1)][1 − Fni(tk)] ≤ qnij ≤ Fni(tj) − Fni(tj−1)

and the conditions of the theorem we get that

Σ_{i=1}^n qnij → λ(tj − tj−1)  as n → ∞ .

Also

rni = Fni^{*2}(tk) ≤ Fni(tk)²

and so the conditions of the theorem ensure that

Σ_{i=1}^n rni → 0  as n → ∞ .

Hence Wn := Σ_{i=1}^n Wni has k-variate probability generating function estimated by

Π_{i=1}^n ( pni + Σ_{j=1}^k qnij s_j ) ≤ E(s^{Wn}) ≤ Π_{i=1}^n ( pni + Σ_{j=1}^k qnij s_j + rni )

for 0 ≤ s_j ≤ 1, j = 1, 2, ..., k. The above results and elementary analysis allow us to conclude that
E(s^{Wn}) → exp { Σ_{j=1}^k λ(tj − tj−1)(s_j − 1) }

as n → ∞, which is the desired result.

An obvious choice in the above theorem is to take Fni(t) = G(n^{−1} t) for some fixed distribution function G; then the conditions are satisfied provided only that G(0) = 0 and G′(0) exists and is equal to λ.
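The obvious choice just mentioned lends itself to a direct experiment. The sketch below is our illustration, not from the text: it takes G = Uniform[0, 1] (so G(0) = 0, G′(0) = λ = 1, and each inter-renewal time is Uniform[0, n]), superposes n = 200 processes, and compares the empirical distribution of the total count in ]0, 2] with the Poisson(2) probabilities.

```python
import random, math

def superposed_count(n, t, rng):
    """Total renewals in ]0, t] over n independent renewal processes
    with Uniform[0, n] inter-renewal times (F_ni(t) = G(t/n), G = U[0,1])."""
    total = 0
    for _ in range(n):
        s = rng.uniform(0.0, n)
        while s <= t:
            total += 1
            s += rng.uniform(0.0, n)
    return total

def empirical_pmf(n, t, reps=10000, seed=4):
    rng = random.Random(seed)
    counts = {}
    for _ in range(reps):
        k = superposed_count(n, t, rng)
        counts[k] = counts.get(k, 0) + 1
    return {k: v / reps for k, v in sorted(counts.items())}

pmf = empirical_pmf(n=200, t=2.0)
mean = sum(k * v for k, v in pmf.items())
poisson = lambda k, lam=2.0: math.exp(-lam) * lam**k / math.factorial(k)
for k in range(5):
    print(k, round(pmf.get(k, 0.0), 3), round(poisson(k), 3))
```

The empirical and Poisson columns track each other closely already at n = 200, in the spirit of the central-limit analogy drawn above.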

8. Applications to random walks and branching processes


A random walk is defined most simply as a process {S0, S1, S2, ...} where S0 = 0 and

Sn = X1 + X2 + ... + Xn  for n = 1, 2, ...

with X1, X2, ... a sequence of independent, identically distributed random variables. Note that if these random variables (called increments) are positive then the values of S1, S2, ... can be thought of as the times of successive renewals in a renewal process. Usually, however, a random walk is envisaged as having increments which may be either positive or negative. There is a vast literature on the behaviour of random walks. One of the most fundamental results is that, excluding the trivial case of increments which are identically zero, one of the following three events occurs with probability one as n → ∞:

1. Sn → +∞;
2. Sn → −∞;
3. lim sup Sn = +∞, lim inf Sn = −∞.

If the increments have finite mean μ, then the above three possibilities correspond to the cases μ > 0, μ < 0 and μ = 0 respectively. Suppose we look at the first of these cases. Then the first ascending ladder epoch and first ascending ladder height, defined respectively by

τ1 = inf{n : Sn > 0}  and  H1 = S_{τ1}


will be well-defined random variables. H1 is positive by definition; denote its distribution function by F, and assume that its distribution is non-lattice (which will be so if the distribution of the increments is non-lattice). For some fixed h > 0, let P(t) denote the probability that the random walk ever enters the interval ]t, t + h], namely

P(t) = P(Sn ∈ ]t, t + h] for some n) = P(E(t)) say .

Then

P(t) = P(E(t) ∩ {H1 > t + h}) + P(H1 ∈ ]t, t + h]) + P(E(t) ∩ {H1 ≤ t}) .


Now since the increments are i.i.d., the random walk starts as if from scratch at level H1 at time τ1, and has never reached this level or above before; hence if H1 = u ≤ t say, then the probability of E(t) occurring is the unconditional probability of E(t − u) occurring. So we get the renewal equation

P(t) = a(t) + ∫_0^t P(t − u) dF(u)

where

a(t) = P(E(t) ∩ {H1 > t + h}) + P(H1 ∈ ]t, t + h]) .


The key renewal theorem now suggests that P(t) converges to a limit as t → ∞, and a few more details, namely showing that H1 has finite mean and a(t) is directly Riemann integrable, confirm this. Suppose we now consider the second case (μ < 0), where the first ascending ladder height may not be well-defined. In fact, it must have a defective distribution, since if there were probability one that the random walk reached a positive level, then with probability one it would later reach a higher level, and so on, contradicting Sn → −∞. Nevertheless, for t ≥ 0 we still have the equation

P(t) = a(t) + ∫_0^t P(t − u) dF(u)

where F is now a defective distribution function. Such a defective renewal equation may often be transformed into an ordinary renewal equation in the following way. Suppose that there exists positive α such that

∫_0^∞ e^{αu} dF(u) = 1 .

Then, writing dF*(u) = e^{αu} dF(u) and

P*(t) = e^{αt} P(t)

the above equation, after multiplying through by e^{αt}, becomes

P*(t) = e^{αt} a(t) + ∫_0^t P*(t − u) dF*(u)

which is now an ordinary renewal equation. If it can now be established that the distribution represented by F* has finite mean and that e^{αt} a(t) is directly Riemann integrable, then it follows from the key renewal theorem that P*(t) converges to a positive limit c as t → ∞, or in other words that

P(t) ~ c e^{−αt}  as t → ∞ .

Thus we have a precise estimate of the rate of decay of the probability that the random walk ever reaches a high level. A similar technique to the above occurs in the context of branching processes, of which we take the general Crump–Mode–Jagers process as an example: this process is discussed in Jagers (1975). In this case, we convert a renewal equation involving an excessive distribution function (one whose total mass is greater than one) into a proper renewal equation. Suppose that starting at time t = 0 a single ancestor is born and then from time to time during its life (of length L) it gives birth to offspring according to some given random point process on the continuous time scale. Each of these offspring, starting at its birth time, then behaves independently in a stochastically identical manner to the parent, and so on. Let m(t) be the expected number of offspring born to an individual up to and including age t, and suppose that the process is supercritical in that the expected total number of offspring m(∞) is greater than one. Let M(t) be the expected total number of individuals alive at time t. Then it is easy to see that the equation

M(t) = P(L > t) + ∫_0^t M(t − u) dm(u)

holds, and that this is an excessive renewal equation. Suppose that there exists α > 0 such that

∫_0^∞ e^{−αu} dm(u) = 1 .

Then multiplying the above equation through by e^{−αt} gives an ordinary renewal equation in which the driving function e^{−αt} P(L > t) is certainly directly Riemann integrable, and therefore provided only that ∫_0^∞ u e^{−αu} dm(u) < ∞ we have, using the key renewal theorem as in the previous example, that for some constant c > 0,

M(t) ~ c e^{αt}  as t → ∞ .


This shows that, in mean, the population grows exponentially, and the important value α is called the Malthusian parameter of the process. Many other limiting results in the theory of branching processes follow from a similar technique.
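The Malthusian parameter rarely has a closed form, but since the transform ∫_0^∞ e^{−αu} dm(u) is continuous and strictly decreasing in α, it can be found by bisection. The sketch below is our own; the reproduction intensity dm(u) = b e^{−du} du is an invented test case, chosen because its transform is b/(α + d), giving the exact answer α = b − d to check against.

```python
def malthusian(reproduction_lt, lo=1e-9, hi=100.0):
    """Bisection for alpha with reproduction_lt(alpha) = 1, assuming the
    Laplace transform of the reproduction measure is continuous and
    strictly decreasing on [lo, hi] (supercritical case)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if reproduction_lt(mid) > 1.0:
            lo = mid    # transform still above 1, so alpha lies higher
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed reproduction intensity dm(u) = b*exp(-d*u) du, with Laplace
# transform b/(a + d); the Malthusian parameter is exactly alpha = b - d.
b, d = 3.0, 1.0
alpha = malthusian(lambda a: b / (a + d))
print(round(alpha, 6))   # prints 2.0
```

The same solver applies unchanged when the transform has to be computed by numerical integration of an empirical reproduction measure.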

9. An inventory model
The following model is perhaps the simplest one for inventory control over a period of time; it was apparently first analysed in Arrow et al. (1958). Suppose that the stock level of a single commodity is monitored at regular intervals, and let Yn be the level of stock at time n. This is treated as a real variable, a negative value being interpreted as a backlogged demand which will be fulfilled as soon as new stock is available. For S > s we define the (s, S) policy as that whereby if Yn < s then stock is re-ordered to bring the stock level up to S, the new stock being assumed to arrive immediately; whereas if Yn ≥ s then no new stock is re-ordered. The demands for stock between different stocktakings are assumed to be i.i.d. non-negative random variables X1, X2, ... with known distribution function F. Thus, assuming some starting stock level Y0, we have the recursion

Yn+1 = Yn − Xn+1   if Yn ≥ s;
Yn+1 = S − Xn+1   if Yn < s .

It may be shown that under certain reasonable assumptions and cost criteria, there exist S and s such that the (s, S) policy is optimal: see, for example, Whittle (1982). It will be more convenient to work with the non-negative random variable Zn := S − Yn, interpreted as the deficit of the stock level below S. We refer to Zn as the state of the system at time n. It is easy to see how the process (Zn) behaves: it performs a non-decreasing random walk with increments Xn until it exceeds the value q := S − s, after which it is instantaneously re-started at zero, and so on. The times at which such re-starts occur constitute a (possibly delayed) discrete renewal process; and in addition, if we replace "time" by "state", then an excursion between two re-starts is a (generally) continuous renewal process stopped at the first occasion on which it exceeds q. We shall show using these two renewal facets that Zn converges in distribution as n → ∞ and that the limit distribution may be identified reasonably explicitly. We shall assume that Z0 = 0, so that re-starts constitute a non-delayed renewal process. Once we have established our results, it should be obvious that the limiting behaviour does not depend upon the value of Z0. For n = 1, 2, ... and x ≥ 0, define

Gn(x) = P(Zn ≤ x);
fn = P(Z1 ≤ q, ..., Zn−1 ≤ q, Zn > q);

and

gn(x) = P(Z1 ≤ q, ..., Zn−1 ≤ q, Zn ≤ x) .


First fix x ≥ 0, and for ease of notation suppress the dependence on x. Then by conditioning on the time of the first re-start we get

Gn = gn + Σ_{m=1}^{n−1} fm Gn−m .

Hence Gn satisfies a discrete renewal equation, and from a discrete version of the key renewal theorem we conclude that

Gn → μ^{−1} Σ_{m=1}^∞ gm  as n → ∞

where μ = Σ_{m=1}^∞ m fm. Now if 0 ≤ x ≤ q, gm(x) = F^{*m}(x) and so

Σ_{m=1}^∞ gm(x) = M(x) − 1

where M is the renewal function corresponding to F. Also,

μ = E(N(q) + 1) = M(q)

where N(q) + 1 is the number of the first renewal whose "state" exceeds q in a renewal process with distribution function F; in particular, μ is finite as noted earlier. So we may rewrite the above result as

Gn(x) → (M(x) − 1) / M(q)  as n → ∞ .

Alternatively if x > q,

gm(x) − gm(q) = P(Z1 ≤ q, ..., Zm−1 ≤ q, q < Zm ≤ x)

and so

Σ_{m=1}^∞ (gm(x) − gm(q)) = Σ_{m=1}^∞ gm(x) − M(q) + 1 = H(x − q)

where H is the distribution function of the forward recurrence "state" at q of the same renewal process. Hence in this case we have

Gn(x) → (M(q) − 1 + H(x − q)) / M(q)  as n → ∞ .

Putting these results together, we have shown that Zn converges in distribution as n → ∞ and the limit distribution function is given by

G(x) = (M(q))^{−1} (M(x) − 1)   for 0 ≤ x ≤ q;
G(x) = (M(q))^{−1} (M(q) − 1 + H(x − q))   for x > q .


It should be noted that although the form of H is not obvious, it too was found in terms of F and M by solving an appropriate renewal equation in the second example of Section 4.
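The limit just derived can be checked by simulating the deficit chain directly. The sketch below is our own illustration with invented parameters: Uniform[0, 1] demands and q = 1. For this F the renewal function is M(x) = e^x on [0, 1], so the limit for x ≤ q is G(x) = (e^x − 1)/e.

```python
import random, math

def z_after(n_steps, q, rng):
    """Deficit chain: Z grows by a U[0,1] demand each step; once it has
    exceeded q, the next state is a fresh demand from a full shelf."""
    z = 0.0
    for _ in range(n_steps):
        z = rng.uniform(0.0, 1.0) if z > q else z + rng.uniform(0.0, 1.0)
    return z

def prob_leq(x, q=1.0, n_steps=100, reps=10000, seed=5):
    rng = random.Random(seed)
    return sum(z_after(n_steps, q, rng) <= x for _ in range(reps)) / reps

x = 0.5
est = prob_leq(x)
exact = (math.exp(x) - 1) / math.e   # (M(x) - 1)/M(q) with M(x) = e^x
print(round(est, 3), round(exact, 3))
```

Re-starts occur about every M(q) = e ≈ 2.7 steps here, so 100 steps is ample for the chain to forget Z0 = 0, as the text promises.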

10. Stationary increments


So far we have made strong independence assumptions about the times between renewals in order to develop the theory surrounding the renewal theorem. It is natural, however, to ask to what extent the limiting behaviour persists when we weaken these assumptions. What can we say, for instance, about the case where X1, X2, ... form a stationary, ergodic sequence? In this section we approach a renewal process from another perspective, namely that of point processes. If for any Borel subset B of the non-negative real line we let N(B) denote the number of renewals occurring in B (so that what we previously called N(t) becomes N(]0, t])), then N is a random element of the set of all non-negative integer valued measures on ℝ+ which are finite on finite intervals. Denote this set by 𝒳, and equip it with the sigma-field 𝒟 generated by all sets of the form {N ∈ 𝒳 : N(B) = i} for Borel B and non-negative integer i. So the renewal process becomes a random variable in (𝒳, 𝒟). In the case of i.i.d. inter-renewal times studied so far, if their common distribution has finite mean it is not hard to see from the renewal theorem that as we shift the time origin indefinitely into the future, the whole process converges in some sense to the stationary delayed renewal process; this is because the forward recurrence time converges in distribution, and the behaviour of the process after that is independent of what went before. This convergence is conveniently expressed in terms of point processes. For any point process M and real t ≥ 0 define the shift operator Tt on 𝒳 by

TtM(B) = M(B + t)

for all Borel B .

If we now denote a renewal process started at time zero by N0 and we let N be a corresponding stationary delayed renewal process, then the above result may be written

Tt N0 → N  as t → ∞ .

The mode of convergence (among several possible ones) which we shall consider here is that the corresponding probability measures P_{N0} and P satisfy P_{N0} → P in the topology determined by the following metric. For any probability measure Q on (𝒳, 𝒟) define TtQ by TtQ(D) = Q(Tt^{−1} D) for D ∈ 𝒟; for ε > 0 define the ε-smoothing of Q by

SεQ(D) = ε^{−1} ∫_0^ε TtQ(D) dt  for D ∈ 𝒟

and then define the distance between any two probability measures Q1 and Q2 as

d(Q1, Q2) = Σ_{n=1}^∞ 2^{−n} ||S_{1/n}(Q1) − S_{1/n}(Q2)||

where ||·|| denotes total variation norm. It can then be said that the above convergence takes place provided the distribution of inter-renewal times is non-lattice; the lattice case is simpler. The renewal theorem in its above form expresses the idea that as time goes on, the process forgets that it started with a renewal at time zero. It is not necessary for the inter-renewal times to be i.i.d. for this to happen, however. Suppose that we merely assume that X1, X2, ... is a stationary ergodic sequence. This condition is in itself sufficient for the renewal theorem to hold, but only in the following weak Cesàro sense: for arbitrary D ∈ 𝒟,

t^{−1} ∫_0^t 1_D(Ts N0) ds → Q(D) := (E X1)^{−1} E ∫_0^{X1} 1_D(Ts N0) ds

where Q is the probability measure of a stationary ergodic point process N, say. This result follows relatively straightforwardly from the ergodic theorem (Berbee, 1979; Proposition 3.1.1). It should be noted that if any stronger form of convergence is to take place, then the limit process must have distribution Q, and it is good to have an explicit expression for this. The additional condition required is a form of asymptotic independence in the sequence X1, X2, ... usually known as the weak Bernoulli property. This may be written

||P_{X_{K ∪ (L+n)}} − P_{X_K} × P_{X_L}|| → 0  as n → ∞

for all sets K and L of positive integers with K finite, where P_{X_K} denotes the joint distribution of the Xk for k ∈ K, and similarly for the other terms. This property seems to be the natural one because it is just sufficient to ensure the successful coupling of two processes distributed as N0 and N respectively, and so enables a proof (Berbee, 1979; Section 6.3) of the renewal theorem to follow the same lines as the one already mentioned in the case of i.i.d. inter-renewal times. An obvious example of a stationary ergodic process with the weak Bernoulli property is a positive recurrent, irreducible, aperiodic Markov chain in equilibrium. A rather different approach is taken by Lalley (1986), who proves that the overshoot process of a random walk with stationary increments and positive mean increment over a barrier b converges in distribution as b → ∞ and is asymptotically independent of the initial conditions. In order to do this, Lalley assumes that the increments {ξn} have the special structure (which includes many special cases)

ξn = φ(..., Xn−1, Xn, Xn+1, ...)

where ..., Xn−1, Xn, Xn+1, ... are i.i.d. and φ is a fixed function of double-ended sequences; also φ is such that a kind of asymptotic independence property known as fading memory holds for the {ξn}. The proof uses a coupling method which exploits the i.i.d. nature of the sequence {Xn}.


D. R. Grey

The difficulty with stationary processes is choosing the level of generality at which to work, balancing the restrictiveness of the assumptions against the strength of the conclusions.

11. Goldie's implicit renewal theory


Let {S_n} be a random walk whose increments have distribution function F and finite mean μ < 0. For each n let M_n = max(S₀, S₁, …, S_n). Then, since the random walk drifts to −∞ almost surely, there will be a limit random variable M ≥ 0 such that M_n is eventually equal to M, almost surely. Using the technique of Section 8 of finding a defective renewal equation for P(M > t), it may be shown that

P(M > t) ~ c e^{−θt}  as t → ∞

provided θ > 0 exists such that

∫_{−∞}^{∞} e^{θx} dF(x) = 1  and  ∫_{−∞}^{∞} x e^{θx} dF(x) < ∞ .

Suppose we now look at the corresponding random walk with reflecting barrier at zero, {R_n}, defined inductively by R₀ = 0 and

R_n = max(0, R_{n−1} + X_n)  for n = 1, 2, …

with the increments X_n as before. Lindley's duality principle observes that

M_n = max(0, X₁, X₁ + X₂, …, X₁ + ⋯ + X_n)

whereas (as it is not hard to show)

R_n = max(0, X_n, X_{n−1} + X_n, …, X₁ + ⋯ + X_n)

so that, since X₁, X₂, … are i.i.d., M_n and R_n have the same distribution, for each n. So, since M_n converges to M almost surely, R_n converges in distribution to M. Typically, then, a random walk with negative mean increment and a reflecting barrier at zero has a limiting and equilibrium distribution whose upper tail is asymptotically exponential. It is this aspect of renewal theory and its ramifications which Goldie (1991) develops in his implicit renewal theory. He expresses things in terms of multiplicative rather than additive increments, allowing the possibility of changes of sign and two-tailed behaviour, but we shall forfeit this generality in favour of continuity with what has gone before. The idea here is that the process {R_n} need only behave asymptotically like a random walk while it is among large values, in order to have a limiting/equilibrium distribution which is asymptotically exponential. More specifically, we look at a process defined recursively by

R_n = Ψ_n(R_{n−1})  for n = 1, 2, …

where R₀ is given some arbitrary distribution and Ψ₁, Ψ₂, … are i.i.d. non-negative random functions independent of R₀, the characteristic of their behaviour being that

Ψ(t) ≈ t + X  for large t

where X is a random variable with negative mean. Under these conditions there exists a limit distribution for R_n such that, if R is a random variable with this distribution, then the random functional equation

R =_d Ψ(R)

is satisfied, where on the right hand side, Ψ is independent of R.

Goldie's theorem
Suppose there exist R, Ψ, X such that R and (Ψ, X) are independent, the above equation holds, and X satisfies E(e^{θX}) = 1 and m := E(X e^{θX}) < ∞ for some θ > 0. Suppose also that E|e^{θΨ(R)} − e^{θ(R+X)}| < ∞. Then

P(R > t) ~ c e^{−θt}  as t → ∞

where

c = (1/θm) E(e^{θΨ(R)} − e^{θ(R+X)}) .

Note that the moment generating function M_X(λ) = E(e^{λX}) is convex, and so M_X(0) = 1 and the above condition M_X(θ) = 1 imply that EX = M′_X(0) < 0 and that m = M′_X(θ) > 0. The unusual thing about this theorem is that its substantial condition, relating the behaviour of Ψ(R) to that of R + X, appears to be assuming something about the tail behaviour of the distribution of R, the very content of the conclusion of the theorem; it turns out in applications, however, that this condition may be checked for some obvious choice of X without already knowing that the theorem is true.
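As a quick numerical sanity check, the exponential tail can be estimated by simulation. The sketch below is an illustration added to this survey, not from Goldie (1991); it takes the simplest additive case with Gaussian increments X_n ~ N(−1/2, 1), for which the condition E(e^{θX}) = 1 gives θ = 1, and it uses the closed form of Lindley duality to compute the reflected walk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Increments X_n ~ N(-1/2, 1) (an assumed toy case): E exp(theta*X) =
# exp(-theta/2 + theta^2/2) = 1 gives theta = 1, so the equilibrium tail
# of the reflected walk should decay like exp(-t).
n = 2_000_000
X = rng.normal(-0.5, 1.0, size=n)

# Lindley duality in closed form: the reflected walk started at 0 is
# R_n = S_n - min_{0<=k<=n} S_k, which equals
# max(0, X_n, X_{n-1}+X_n, ..., X_1+...+X_n).
S = np.concatenate(([0.0], np.cumsum(X)))
R = S[1:] - np.minimum.accumulate(S)[1:]
R = R[10_000:]                          # discard burn-in

t1, t2 = 2.0, 4.0
theta_hat = -np.log(np.mean(R > t2) / np.mean(R > t1)) / (t2 - t1)
print(theta_hat)
```

The estimated decay rate of log P(R > t) between t = 2 and t = 4 should sit close to the Cramér root θ = 1, up to Monte Carlo noise and pre-asymptotic corrections.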


EXAMPLE. (Grey and Lu, 1993) In the study of the Smith–Wilkinson random environment branching process, in which environments for different generations are i.i.d. and the environment for each generation determines its family size distribution, we find that q(ζ̄), the probability of ultimate extinction starting with a single individual and conditioned on a whole environment sequence ζ̄ = (ζ₀, ζ₁, …), has a distribution which satisfies the random functional equation

q(ζ̄) =_d φ_ζ(q(ζ̄))

where φ_ζ is a typical offspring p.g.f., independent of ζ̄ on the right hand side. If we make the transformation R = −log_e(1 − q(ζ̄)) then we get the equivalent formulation

R =_d Ψ(R)

where

Ψ(t) = −log_e(1 − φ_ζ(1 − e^{−t})) .

The point of doing this is that, for large t,

Ψ(t) = −log_e(1 − φ_ζ(1 − e^{−t})) ≈ t − log_e φ′_ζ(1) ,

since 1 − φ_ζ(1 − e^{−t}) ≈ φ′_ζ(1) e^{−t}, so that Ψ behaves in the right sort of fashion, with X = −log_e φ′_ζ(1). Of course we are only interested in the supercritical case, where P(q(ζ̄) = 1) < 1; the usual sufficient conditions for this are that E(log_e φ′_ζ(1)) > 0 and that E(−log_e(1 − φ_ζ(0))) < ∞. Grey and Lu (1993) show that if we impose the following additional regularity conditions, that there exists θ > 0 such that E(φ′_ζ(1)^{−θ}) = 1, E((X e^{θX})⁺) < ∞ (so that m < ∞) and E(φ′_ζ(1)^{−θ}(1 − φ_ζ(0))^{−1}) < ∞, then all the conditions of Goldie's theorem are satisfied, and so we may conclude that

P(R > t) ~ c e^{−θt}  as t → ∞

for some c > 0. This result may now be transformed back into the language of the original problem; in particular, if we denote the unconditional probability of ultimate extinction starting with k ancestors by

q_k = E(q(ζ̄)^k) ,

then the behaviour of q_k for large k is governed by that of the distribution of q(ζ̄) close to 1; the precise result is that

q_k ~ c Γ(θ + 1) k^{−θ}  as k → ∞ .

Goldie (1991) gives some more examples.


12. Schmidli's extension

The following result may be regarded as an extension of those in Section 3 to the solution of a perturbed renewal equation. It is given, and rigorously proved, by Schmidli (1997); we use his notation. In particular, cadlag means "right continuous with left hand limits". We shall state the theorem in the non-lattice case only.

Schmidli's theorem
Consider the equation

Z(u) = z(u) + ∫₀ᵘ Z(u − y)(1 − p(u, y)) dB(y)

where B is a proper non-lattice distribution function with B(0) = 0, z is bounded, 0 ≤ p(u, y) ≤ 1 and p(u, y) is continuous in u. Then there exists a unique solution Z which is bounded on bounded intervals. If z is non-negative then so is Z; if z is continuous then Z is cadlag. If in addition z and the function

u ↦ ∫₀ᵘ p(u, y) dB(y)

are both directly Riemann integrable, then Z(u) converges to a finite limit as u → ∞.
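To see the theorem in action, here is a small numerical sketch, constructed for this survey rather than taken from Schmidli (1997). It assumes B = Exp(1), z(u) = e^{−u} and the perturbation p(u, y) = e^{−u}; for these choices the unperturbed equation Z = z + Z∗B has exact solution Z ≡ 1 (easily checked by Laplace transforms), and the perturbed solution still settles down to a finite limit:

```python
import numpy as np

# Grid for the perturbed renewal equation of Schmidli's theorem, with
# (assumed for illustration) B = Exp(1), z(u) = exp(-u) and
# p(u, y) = exp(-u); the unperturbed equation then has exact solution Z = 1.
h, N = 0.01, 2000
u = h * np.arange(N + 1)
z = np.exp(-u)                          # directly Riemann integrable forcing term
w = np.exp(-u[:-1]) - np.exp(-u[1:])    # exact cell masses of B on ((j-1)h, jh]

def solve(perturbed):
    Z = np.empty(N + 1)
    Z[0] = z[0]
    for i in range(1, N + 1):
        # (1 - p(u_i, y)) with p(u, y) = exp(-u) is constant in y here
        damp = 1.0 - np.exp(-u[i]) if perturbed else 1.0
        Z[i] = z[i] + damp * np.dot(Z[i - 1::-1], w[:i])
    return Z

Z0 = solve(False)   # classical renewal equation
Z1 = solve(True)    # perturbed equation: still converges to a finite limit
print(Z0[-1], Z1[-1])
```

The discretised unperturbed solution stays at 1 (the exact cell masses make Z ≡ 1 a fixed point of the recursion), while the perturbed solution flattens out to a constant below 1, as the theorem predicts.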

The message of this theorem is that if the perturbing function p is sufficiently well behaved then its presence makes little difference to the qualitative behaviour of the solution. It may be too early to suggest the possible applications which this result might have, but the following is given by Schmidli (1997) as a motivation for deriving it. The Björk–Grandell risk model is a generalization of the classical risk model

C_t = u + ct − Σ_{i=1}^{N_t} Y_i

where C_t represents the assets of an insurance company at time t, u is the initial assets, ct is the steady income from premiums, N_t is the number of claims up to time t, and Y₁, Y₂, … are the successive sizes of these claims. In the classical model, (N_t) is taken to be a homogeneous Poisson process and Y₁, Y₂, … are i.i.d. and independent of (N_t). In the Björk–Grandell model, the rate of the Poisson process varies randomly in the following way: there are i.i.d. random pairs ((L_i, σ_i); i = 1, 2, …) such that for consecutive periods of time of lengths σ₁, σ₂, … the rates of the Poisson process are L₁, L₂, … respectively. Interest focuses on the probability ψ(u) that the company will ultimately be ruined. If the process is observed only at time points 0, σ₁, σ₁ + σ₂, … then we have a random walk and so, in order that ψ(u) be not identically one, we need this random walk to have positive drift, which in this context amounts to the condition

c E(σ₁) > E(L₁σ₁) E(Y₁) .


Let

τ = inf{t : C_t < 0}  (≤ ∞)

and

τ₁ = inf{σ₁ + ⋯ + σ_j : C_{σ₁+⋯+σ_j} < u}  (≤ ∞)

so that τ is the time of ruin and τ₁ is the first time the random walk mentioned above falls below its initial level. Let B be the defective distribution function of u − C_{τ₁}. Then by conditioning on u − C_{τ₁} we get

ψ(u) = P(τ ≤ τ₁, τ < ∞) + ∫₀ᵘ ψ(u − x)(1 − p(u, x)) dB(x)

where p(u, x) is the probability that ruin has already occurred before τ₁, given that C_{τ₁} = u − x. This equation lends itself to Schmidli's theorem once the defective distribution has been converted into a proper one by multiplying through by e^{Ru} for some R > 0; the correct R is identified as satisfying

E(exp{R(u − C_{τ₁})}) = 1 .

After making some mild regularity assumptions and checking that the conditions of the theorem hold, Schmidli establishes that

ψ(u) ~ C e^{−Ru}  as u → ∞

for some C > 0, a result which generalizes one already well known in the classical risk model.
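In the classical model this asymptotic is explicit and easy to check by simulation. The sketch below is an illustration added here, not part of Schmidli's paper; it assumes Poisson claims at rate λ = 1, premium rate c = 2 and Exp(1) claim sizes, for which the classical formula gives ψ(u) = (λ/c) e^{−(1−λ/c)u} exactly:

```python
import numpy as np

rng = np.random.default_rng(2)

lam, c, u = 1.0, 2.0, 2.0        # claim rate, premium rate, initial capital
n_paths, n_claims = 20_000, 200

# Ruin occurs iff the random walk sum(Y_i - c*A_i) of claim sizes minus
# premium earned between claims ever exceeds the initial capital u.
A = rng.exponential(1.0 / lam, size=(n_paths, n_claims))  # inter-claim times
Y = rng.exponential(1.0, size=(n_paths, n_claims))        # Exp(1) claim sizes
walk = np.cumsum(Y - c * A, axis=1)
psi_mc = np.mean(walk.max(axis=1) > u)

# Classical closed form for Exp(1) claims: psi(u) = (lam/c)*exp(-(1-lam/c)*u)
psi_exact = (lam / c) * np.exp(-(1.0 - lam / c) * u)
print(psi_mc, psi_exact)
```

Truncating each path at 200 claims introduces negligible bias, since the walk drifts down at rate 1 per claim; the Monte Carlo estimate should agree with the exponential formula to within sampling error.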

References
Alsmeyer, G. (1991). Erneuerungstheorie. Teubner.
Arjas, E., E. Nummelin and R. L. Tweedie (1978). Uniform limit theorems for non-singular renewal and Markov renewal processes. J. Appl. Prob. 15, 112-125.
Arrow, K. J., S. Karlin and H. Scarf (1958). Mathematical Studies in Inventory Production Processes. Stanford University Press, Stanford.
Asmussen, S. (1987). Applied Probability and Queues. Wiley, New York.
Berbee, H. C. P. (1979). Random Walks with Stationary Increments and Renewal Theory. Mathematical Centre Tracts 112, Amsterdam.
Billingsley, P. (1979). Probability and Measure. Wiley, New York.
Bingham, N. H., C. M. Goldie and J. L. Teugels (1987). Regular Variation. Cambridge University Press, Cambridge.
Blackwell, D. (1948). A renewal theorem. Duke Math. J. 15, 145-150.
Breiman, L. (1969). Probability and Stochastic Processes: With a View towards Applications. Houghton Mifflin, Boston.
Çinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs.
Doeblin, W. (1938). Exposé de la théorie des chaînes simples constantes de Markov à un nombre fini d'états. Rev. Math. Union Interbalkanique 2, 77-105.
Durrett, R. (1996). Probability: Theory and Examples. Duxbury Press, Belmont, CA.
Erdős, P., W. Feller and H. Pollard (1949). A property of power series with positive coefficients. Bull. Amer. Math. Soc. 55, 201-204.
Feller, W. (1968). An Introduction to Probability Theory and its Applications, Vol. I, 3rd edn. Wiley, New York.
Feller, W. and S. Orey (1961). A renewal theorem. J. Math. Mech. 10, 619-624.
Goldie, C. M. (1991). Implicit renewal theory and tails of solutions of random equations. Ann. Appl. Prob. 1, 126-166.
Grey, D. R. and Z. Lu (1993). The asymptotic behaviour of extinction probability in the Smith-Wilkinson branching process. Adv. Appl. Prob. 25, 263-289.
Jagers, P. (1975). Branching Processes with Biological Applications. Wiley, New York.
Kallenberg, O. (1997). Foundations of Modern Probability. Springer, New York.
Karlin, S. and H. M. Taylor (1975). A First Course in Stochastic Processes. Academic Press, New York.
Kingman, J. F. C. (1972). Regenerative Phenomena. Wiley.
Lalley, S. P. (1986). Renewal theorem for a class of stationary sequences. Prob. Th. Rel. Fields 72, 195-213.
Lindvall, T. (1977). A probabilistic proof of Blackwell's renewal theorem. Ann. Prob. 5, 482-485.
Palm, C. (1943). Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics 44, 1-189.
Pitman, J. W. (1974). Uniform rates of convergence for Markov chain transition probabilities. Z. Wahrscheinlichkeitstheorie verw. Gebiete 29, 193-227.
Pollard, J. H. (1973). Mathematical Models for the Growth of Human Populations. Cambridge University Press, Cambridge.
Schmidli, H. (1997). An extension to the renewal theorem and an application to risk theory. Ann. Appl. Prob. 7, 121-133.
Smith, W. L. (1958). Renewal theory and its ramifications. J. Royal Statist. Soc. Series B 20, 243-302.
Stone, C. (1966). On absolutely continuous distributions and renewal theory. Ann. Math. Statist. 37, 271-275.
Thorisson, H. (1987). A complete coupling proof of Blackwell's renewal theorem. Stoch. Proc. Appl. 26, 87-97.
Whittle, P. (1982). Optimization over Time: Dynamic Programming and Stochastic Control, Vol. 1. Wiley, New York.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.

"],~

The Kolmogorov Isomorphism Theorem and Extensions to Some Nonstationary Processes

Yûichirô Kakihara

1. Introduction

In this article we consider second order stochastic processes on the real line. For such a process the time domain is always defined, while the spectral domain is defined only for processes of particular classes such as (weakly) stationary processes. Here is a brief historical note. Kolmogorov (1941) defined the spectral domain of a (one dimensional) stationary process and showed that it is isomorphic to the time domain of the process; this is now called the Kolmogorov Isomorphism Theorem and is the main concern of this chapter. This isomorphism was extended to finite dimensional stationary processes by Wiener and Masani (1957, 1958) under certain conditions. The full extension was done by Rosenberg (1964). The infinite dimensional stationary case was obtained by Mandrekar and Salehi (1970). The above mentioned extension is for stationary processes. If we relax the stationarity condition and consider nonstationary processes, then we may get another type of extension of the Kolmogorov Isomorphism Theorem. Among nonstationary processes the class of weakly harmonizable processes is of particular interest and importance. A difficulty here is to define the spectral domain for a weakly harmonizable process. Since the covariance function of such a process is a double Fourier transform of a (positive definite) bimeasure of bounded semivariation, we need to develop a bimeasure integration theory. The original work on bimeasure integration is due to Morse and Transue (1955, 1956) in the mid-1950s, which is a functional approach. A set theoretic bimeasure theory was initiated by Chang and Rao (1983). Rao (1984) applied it to define the spectral domain for a finite dimensional weakly harmonizable process and proved the isomorphism between the time and the spectral domains. Truong-van (1981) defined the spectral domain for an infinite dimensional weakly (operator) harmonizable process, but completeness was not proved there. Recently, Kakihara (2000) obtained a complete extension for infinite dimensional harmonizable processes.

The contents of this chapter are as follows. In Section 2, one dimensional stationary processes on the real line are considered to obtain their integral representations, where the integral is a stochastic integral with respect to (w.r.t.) an orthogonally scattered vector measure. For such a process the Kolmogorov Isomorphism Theorem (KIT) is stated, i.e., the time and spectral domains are defined and are shown to be isomorphic. In Section 3, some classes of nonstationary processes are introduced, namely harmonizable and Karhunen classes. To this end, we introduce bimeasure integration together with the Dunford-Schwartz integral w.r.t. a vector measure. Basic properties of bimeasure integrals are mentioned. An RKHS (reproducing kernel Hilbert space) theory is applied to obtain integral representations for weakly harmonizable processes as well as Karhunen processes. In Section 4, the spectral domain for a weakly harmonizable process is defined to be the set of all measurable functions that are strictly integrable w.r.t. the spectral bimeasure of the process. Stationary dilations of weakly harmonizable processes are considered, which are used to show the completeness of the spectral domain of a weakly harmonizable process and also the KIT for such a process. Section 5 is devoted to an application of the Kolmogorov Isomorphism Theorem to estimation and filtering problems. Finally, in Section 6, a multidimensional extension of the KIT is explored for both stationary and harmonizable processes.

2. Stationary processes and the Kolmogorov Isomorphism Theorem


To discuss second order stochastic processes, let (Ω, 𝔉, μ) be a probability measure space and ℝ be the real line. L²(Ω) = L²(Ω, 𝔉, μ) denotes the Hilbert space of all ℂ-valued random variables on Ω with finite second moment, where ℂ is the complex number field. The inner product and the norm in L²(Ω) are given respectively by

(x, y)_{2,μ} = ∫_Ω x(ω) ȳ(ω) μ(dω),  ‖x‖_{2,μ} = (x, x)^{1/2}_{2,μ},  x, y ∈ L²(Ω) .

A second order stochastic process, or an L²(Ω)-valued process, on ℝ is a mapping x(·) : ℝ → L²(Ω) and is denoted by {x(t)} or simply x(·). That is, for each t ∈ ℝ, x(t) = x(t)(·) is a function on Ω and is in L²(Ω). We confine our attention to centered processes. An L²(Ω)-valued process {x(t)} is said to be centered if E[x(t)] = ∫_Ω x(t) dμ = 0 for every t ∈ ℝ. Hence, if we let

L₀²(Ω) = {f ∈ L²(Ω) : E[f] = 0} ,

then centered processes are mappings from ℝ to L₀²(Ω). Now let X = L₀²(Ω). Weak stationarity for X-valued processes was introduced by Khintchine (1934). Hereafter, weak stationarity is referred to as stationarity since we are not considering strong stationarity. An X-valued process {x(t)} is


said to be stationary if its covariance function γ(s, t) = (x(s), x(t))_{2,μ}, s, t ∈ ℝ, depends only on the difference s − t and if, letting γ̃(s − t) = γ(s, t), γ̃ : ℝ → ℂ is continuous. Let us consider an X-valued stationary process {x(t)} on ℝ with the covariance function γ̃ of one variable. Since γ̃ is continuous and positive definite in the sense that

Σ_{j,k} α_j ᾱ_k γ̃(t_j − t_k) ≥ 0

for any finite α₁, …, α_n ∈ ℂ and t₁, …, t_n ∈ ℝ, we have by Bochner's theorem that

γ̃(t) = ∫_ℝ e^{itu} ν(du),  t ∈ ℝ  (2.1)

for a unique positive finite measure ν on (ℝ, 𝔅), 𝔅 being the Borel σ-algebra of ℝ. ν is called the spectral measure of the process {x(t)}. The time domain ℌ(x) of {x(t)} is defined by

ℌ(x) = 𝔖{x(t) : t ∈ ℝ} ,

the closed subspace of X = L₀²(Ω) spanned by the set {x(t) : t ∈ ℝ}. For each t ∈ ℝ define an operator U(t) on ℌ(x) by

U(t)x(s) = x(s + t),  s ∈ ℝ .

It is easily verified that U(t) is a linear isometry on a dense subset in ℌ(x) and hence can be extended to a unitary operator on ℌ(x). Moreover, {U(t)}_{t∈ℝ} forms a strongly continuous group of unitary operators on ℌ(x). Thus by the classical Stone theorem there is a spectral measure E(·) on (ℝ, 𝔅) such that

U(t) = ∫_ℝ e^{itu} E(du),  t ∈ ℝ .

Consequently, we have

x(t) = U(t)x(0) = ∫_ℝ e^{itu} E(du)x(0),  t ∈ ℝ .

If we let ξ(A) = E(A)x(0) for A ∈ 𝔅, then ξ is an X-valued bounded and countably additive (c.a.) measure and is orthogonally scattered (o.s.), i.e., (ξ(A), ξ(B))_{2,μ} = 0 if A ∩ B = ∅. In this case we write ξ ∈ caos(𝔅, X). Therefore we have obtained an integral representation of a stationary process:

THEOREM 1. Let {x(t)} be an X-valued stationary process on ℝ. Then, there exists a unique X-valued c.a.o.s. measure ξ ∈ caos(𝔅, X) such that

x(t) = ∫_ℝ e^{itu} ξ(du),  t ∈ ℝ .  (2.2)

ξ is called the representing measure of the process.
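A discrete sketch may make the representation concrete. The following example is added here and is not from the text; it assumes a three-atom spectral measure, with the o.s. measure realised through i.i.d. standard complex Gaussian coefficients, and checks that the simulated covariance of x(t) = ∫ e^{itu} ξ(du) matches Bochner's formula (2.1), depending on s and t only through s − t:

```python
import numpy as np

rng = np.random.default_rng(4)

# Spectral measure nu with three atoms; representing measure
# xi({u_k}) = sqrt(nu_k) * Z_k with i.i.d. standard complex normals Z_k,
# so xi is countably additive and orthogonally scattered.
u_atoms = np.array([-1.0, 0.3, 2.0])
nu = np.array([0.5, 1.5, 0.8])

m = 500_000
Z = (rng.standard_normal((m, 3)) + 1j * rng.standard_normal((m, 3))) / np.sqrt(2)
xi = Z * np.sqrt(nu)

def x(t):
    # x(t) = int e^{itu} xi(du), the representation (2.2)
    return xi @ np.exp(1j * t * u_atoms)

# Empirical covariance (x(s), x(t))_{2,mu} vs Bochner's formula (2.1):
s, t = 1.7, 0.4
emp = np.mean(x(s) * np.conj(x(t)))
boch = np.sum(np.exp(1j * (s - t) * u_atoms) * nu)
print(emp, boch)
```

Each column of the sample plays the role of one orthogonal increment; the sample covariance reproduces ∫ e^{i(s−t)u} ν(du) up to Monte Carlo error.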


EXAMPLE 2. Assuming that dim X = ∞, let {f_n}_{n=1}^∞ be an orthonormal set in X and define

ξ(A) = Σ_{n ∈ A∩ℕ} (1/n) f_n,  A ∈ 𝔅 ,

where ℕ is the set of natural numbers. Then, clearly ξ ∈ caos(𝔅, X). Hence the process {x(t)} defined by (2.2) is stationary.

Again assume that {x(t)} is stationary with the representing measure ξ ∈ caos(𝔅, X). Then, its spectral measure ν is given by ν(·) = (ξ(·), ξ(·))_{2,μ}, which follows from (2.1), (2.2) and the uniqueness of ν. Let

ℌ(ξ) = 𝔖{ξ(A) : A ∈ 𝔅} ,

the closed subspace of X spanned by {ξ(A) : A ∈ 𝔅}. Consider the Hilbert space L²(ν) = L²(ℝ, ν), which is called the spectral domain of {x(t)}. Then, for any f ∈ L²(ν) and A ∈ 𝔅, the stochastic integral ∫_A f dξ is defined as follows. For a simple function f = Σ_{j=1}^n α_j 1_{A_j}, denoted f ∈ L(ℝ), we define

∫_A f dξ = Σ_{j=1}^n α_j ξ(A_j ∩ A),  A ∈ 𝔅 ,  (2.3)

where α_j ∈ ℂ and 1_{A_j} is the indicator function of A_j for 1 ≤ j ≤ n. By a simple computation we see that

‖∫_A f dξ‖_{2,μ} = ‖1_A f‖_{2,ν},  f ∈ L(ℝ) ,  (2.4)

where ‖·‖_{2,ν} is the norm in L²(ν). For a general f ∈ L²(ν) choose a sequence {f_n}_{n=1}^∞ ⊆ L(ℝ) such that ‖f_n − f‖_{2,ν} → 0. Since

‖∫_A f_n dξ − ∫_A f_m dξ‖_{2,μ} = ‖1_A(f_n − f_m)‖_{2,ν} → 0

as n, m → ∞, we can define unambiguously

∫_A f dξ = lim_{n→∞} ∫_A f_n dξ ∈ L₀²(Ω) = X .

This and (2.4) show that the operator V : L²(ν) → ℌ(ξ) defined by

V f = ∫_ℝ f dξ,  f ∈ L²(ν)

is a unitary operator. Since it is easily seen that ℌ(x) = ℌ(ξ), we have established an isomorphism between the time domain ℌ(x) and the spectral domain L²(ν), denoted ℌ(x) ≅ L²(ν).

THEOREM 3. (Kolmogorov Isomorphism Theorem) Let {x(t)} be an X-valued stationary process on ℝ with the spectral measure ν. Then the time domain ℌ(x)


and the spectral domain L²(ν) are isomorphic by a unitary operator U : ℌ(x) → L²(ν) given by

U x(t) = e^{it·},  t ∈ ℝ .
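The isometry underlying Theorem 3 can be sketched numerically in the same discrete setting. This is again an added illustration, not from the text, with an assumed four-atom spectral measure: for a linear combination Σ_t c_t x(t) in the time domain, the L²(Ω)-norm agrees with the L²(ν)-norm of the corresponding spectral-domain function Σ_t c_t e^{itu}:

```python
import numpy as np

rng = np.random.default_rng(3)

# Discrete spectral measure nu with atoms u_k of mass sig_k^2, and
# orthogonally scattered increments xi_k = sig_k * Z_k realised by a
# Monte Carlo sample of i.i.d. standard complex Gaussians Z_k.
u_atoms = np.array([-2.0, -0.5, 1.0, 2.5])
sig = np.array([1.0, 0.7, 0.5, 1.2])

m = 400_000
Z = (rng.standard_normal((m, 4)) + 1j * rng.standard_normal((m, 4))) / np.sqrt(2)
xi = Z * sig

def x(t):
    # x(t) = int e^{itu} xi(du), the representation (2.2)
    return xi @ np.exp(1j * t * u_atoms)

# Time-domain side: ||sum_t c_t x(t)||_{2,mu}^2 ...
ts = np.array([0.0, 1.0, 3.0])
cs = np.array([1.0, -2.0, 0.5])
y = sum(c * x(t) for c, t in zip(cs, ts))
lhs = np.mean(np.abs(y) ** 2)

# ... spectral-domain side: ||f||_{2,nu}^2 with f(u) = sum_t c_t e^{itu},
# the image of the same element under the Kolmogorov isomorphism.
f_at_atoms = np.exp(1j * np.outer(u_atoms, ts)) @ cs
rhs = np.sum(sig ** 2 * np.abs(f_at_atoms) ** 2)
print(lhs, rhs)
```

The two sides agree up to Monte Carlo error, which is the unitarity of U on a dense set of linear combinations.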

3. Some classes of nonstationary processes

In many applications stationarity is too restrictive and somewhat weaker conditions are needed. In this section, we introduce harmonizable and Karhunen processes. Karhunen processes were defined by Karhunen (1947) as an extension of the stationary class, while harmonizability was first introduced by Loève (1948) in the mid-1940s. Later, weak harmonizability was considered by Rozanov (1959). These two notions were distinguished by calling them weak and strong harmonizability by Rao (1982), where he also classified other nonstationary processes such as weak and strong Cramér classes, class (KF) and the V-bounded class. In dealing with the weakly harmonizable class, we need to develop bimeasure integration, originally called Morse-Transue integration, and so a detailed treatment will be given.

As before let X = L₀²(Ω) and denote the norm and the inner product in X by ‖·‖_X and (·, ·)_X, respectively. In the stationary case, the process is expressed by an integral w.r.t. an X-valued c.a.o.s. measure, and the integral was called a "stochastic integral." However, in the harmonizable case, the process has an integral representation w.r.t. an X-valued c.a. (not necessarily o.s.) measure and the integral is in the sense of Dunford-Schwartz. This will also be explained in some detail.

Let (Θ, 𝔄) be an abstract measurable space. Denote by ca(𝔄, X) the set of all X-valued c.a. measures on 𝔄. The semivariation ‖ξ‖(·) of ξ ∈ ca(𝔄, X) is defined by

‖ξ‖(A) = sup ‖Σ_{j=1}^n α_j ξ(A_j)‖_X,  A ∈ 𝔄 ,

where the supremum is taken over all finite α₁, …, α_n ∈ ℂ with |α_j| ≤ 1, 1 ≤ j ≤ n, and all finite measurable partitions {A₁, …, A_n} of A. L(Θ) denotes the set of all ℂ-valued 𝔄-simple functions on Θ. For f = Σ_{j=1}^n α_j 1_{A_j} ∈ L(Θ) the integral w.r.t. ξ over A ∈ 𝔄 is defined by (2.3), and it holds that

‖∫_A f dξ‖_X ≤ ‖f‖_{∞,ξ} ‖ξ‖(A) ,

where ‖f‖_{∞,ξ} is the ξ-ess.sup norm of f defined by

‖f‖_{∞,ξ} = inf{α > 0 : [|f| ≥ α] is ξ-null} .

Here [|f| ≥ α] = {t ∈ Θ : |f(t)| ≥ α}, and a set B ∈ 𝔄 is said to be ξ-null if ‖ξ‖(B) = 0. ξ-a.e. refers to the complement of a ξ-null set.


A ℂ-valued 𝔄-measurable function f on Θ is said to be ξ-integrable if there exists a sequence {f_n}_{n=1}^∞ ⊆ L(Θ) of simple functions such that (a) f_n → f ξ-a.e.; (b) {∫_A f_n dξ}_{n=1}^∞ is a Cauchy sequence in X for every A ∈ 𝔄. In this case, the integral of f over A w.r.t. ξ is defined by

∫_A f dξ = lim_{n→∞} ∫_A f_n dξ ,

which is also termed the DS-integral [cf. Dunford and Schwartz (1958)]. L¹(ξ) = L¹(ξ; ℂ) denotes the set of all ℂ-valued 𝔄-measurable functions on Θ which are ξ-integrable. For f ∈ L¹(ξ) let

ξ_f(A) = ∫_A f dξ,  A ∈ 𝔄 .

Then, ξ_f ∈ ca(𝔄, X). Note that ca(𝔄, X) is a Banach space with the total semivariation norm ‖·‖(Θ). Also note that for f ∈ L¹(ξ), ‖ξ_f‖(Θ) = 0 iff f = 0 ξ-a.e. Hence, if we identify f, g ∈ L¹(ξ) when f = g ξ-a.e., then

‖f‖_{s,ξ} = ‖ξ_f‖(Θ)

defines a norm in L¹(ξ), which may be termed the "semivariation norm." Then we have:

THEOREM 1. For ξ ∈ ca(𝔄, X), (L¹(ξ), ‖·‖_{s,ξ}) is a Banach space and the set L(Θ) of all simple functions is dense in it. Moreover, the set {ξ_f : f ∈ L¹(ξ)} is a closed subspace of ca(𝔄, X).

For the proof see Abreu and Salehi (1984) or Kakihara (1997b, p. 76).

REMARK 2. (1) If ξ ∈ ca(𝔄, X) is o.s., then L¹(ξ) = L²(ν_ξ) and ‖·‖_{s,ξ} = ‖·‖_{2,ν_ξ}, where ν_ξ(·) = ‖ξ(·)‖²_X.
(2) Suppose that ξ ∈ ca(𝔄, X) is of bounded variation, i.e.,

|ξ|(Θ) = sup Σ_{k=1}^n ‖ξ(A_k)‖_X < ∞ ,

where the sup is taken over all finite measurable partitions {A₁, …, A_n} of Θ. Then |ξ|(·) is a finite positive measure and there is an X-valued function ξ′ = dξ/d|ξ| ∈ L¹(Θ, |ξ|; X) such that

ξ(A) = ∫_A ξ′ d|ξ|,  A ∈ 𝔄 ,

since X is reflexive and hence has the Radon-Nikodým property [cf. Diestel and Uhl (1977)]. Thus, the DS-integral reduces to the Bochner integral in this case.


Next we consider scalar bimeasures defined on the set 𝔄 × 𝔄 = {A × B : A, B ∈ 𝔄} of all rectangles. For any function m defined on 𝔄 × 𝔄 we denote the value of m at A × B interchangeably by m(A × B) or m(A, B). M denotes the set of all functions m : 𝔄 × 𝔄 → ℂ such that m(·, A), m(A, ·) ∈ ca(𝔄, ℂ) for every A ∈ 𝔄, i.e., m is separately c.a., where ca(𝔄, ℂ) is the set of all ℂ-valued c.a. measures on 𝔄. Each element in M is called a (scalar) bimeasure. For a bimeasure m ∈ M the variation (= Vitali variation) and the semivariation (= Fréchet variation) are defined respectively by

|m|(A, B) = sup Σ_{j=1}^ℓ Σ_{k=1}^n |m(A_j, B_k)|,  A, B ∈ 𝔄 ,

‖m‖(A, B) = sup |Σ_{j=1}^ℓ Σ_{k=1}^n α_j β̄_k m(A_j, B_k)|,  A, B ∈ 𝔄 ,

where {A₁, …, A_ℓ} and {B₁, …, B_n} are finite measurable partitions of A and B, respectively, and α_j, β_k ∈ ℂ with |α_j|, |β_k| ≤ 1 for 1 ≤ j ≤ ℓ, 1 ≤ k ≤ n. Clearly it holds that

‖m‖(A, B) ≤ |m|(A, B),  A, B ∈ 𝔄 .

If |m|(Θ, Θ) < ∞, then m can be extended to a bounded c.a. measure on the σ-algebra 𝔄 ⊗ 𝔄 generated by 𝔄 × 𝔄. In this case, the integration of functions on Θ × Θ w.r.t. m is in the sense of Lebesgue. If we only assume ‖m‖(Θ, Θ) < ∞, then we need to define a weaker or more restrictive integral for functions on Θ × Θ.

DEFINITION 3. Let m ∈ M and f, g be ℂ-valued 𝔄-measurable functions on Θ.
(1) The pair (f, g) is said to be m-integrable if the following three conditions (a), (b) and (c) hold:
(a) f is m(·, B)-integrable for B ∈ 𝔄 and g is m(A, ·)-integrable for A ∈ 𝔄;
(b) m₁(·) = ∫_Θ g(t) m(·, dt), m₂(·) = ∫_Θ f(s) m(ds, ·) ∈ ca(𝔄, ℂ);
(c) f is m₁-integrable and g is m₂-integrable, and

∫_Θ f(s) m₁(ds) = ∫_Θ g(t) m₂(dt) .  (3.1)

The common value in (3.1) is denoted by

∫_Θ ∫_Θ (f, g) dm  or  ∫_Θ ∫_Θ f(s) g(t) m(ds, dt)

and is called the integral of (f, g) w.r.t. m. Let 𝔏²(m) denote the set of all ℂ-valued 𝔄-measurable functions f on Θ such that (f, f̄) is m-integrable.
(2) The pair (f, g) is said to be strictly m-integrable if the condition (a) above and the conditions (b′), (c′) below hold:

(b′) m_D(·) = ∫_D g(t) m(·, dt), m^C(·) = ∫_C f(s) m(ds, ·) ∈ ca(𝔄, ℂ) for C, D ∈ 𝔄;
(c′) f is m_D-integrable, g is m^C-integrable and

∫_C f(s) m_D(ds) = ∫_D g(t) m^C(dt),  C, D ∈ 𝔄 .  (3.2)

The common value in (3.2) is denoted by

∫_C ∫_D* (f, g) dm  or  ∫_C ∫_D* f(s) g(t) m(ds, dt)

and is called the integral of (f, g) w.r.t. m on C × D. 𝔏²*(m) denotes the set of all ℂ-valued 𝔄-measurable functions f on Θ such that (f, f̄) is strictly m-integrable.

Clearly, 𝔏²*(m) ⊆ 𝔏²(m). If f, g ∈ 𝔏²(m) (resp. 𝔏²*(m)), then (f, ḡ) is (strictly) m-integrable. Moreover, if f ∈ 𝔏²*(m), then f̄, Re f, Im f and f⁺, f⁻ (when f is ℝ-valued), and hence |f|, are in 𝔏²*(m), where Re f and Im f are the real and imaginary parts of f, f⁺ = f ∨ 0 and f⁻ = −(f ∧ 0). This is not the case for 𝔏²(m), as an example is known [cf. Chang and Rao (1983)]. As is seen from the definition, we are considering integration only for functions of the form f(s)g(t), the product of functions of a single variable. This is a restriction of the bimeasure integral compared to the Lebesgue integral.

For ξ ∈ ca(𝔄, X) let

m_ξ(A, B) = (ξ(A), ξ(B))_X,  A, B ∈ 𝔄 .

Then it is easily seen that m_ξ ∈ M. Moreover, m_ξ is positive definite in the sense that

Σ_{j=1}^n Σ_{k=1}^n α_j ᾱ_k m_ξ(A_j, A_k) ≥ 0

for every finite set α₁, …, α_n ∈ ℂ and A₁, …, A_n ∈ 𝔄. Now let m ∈ M be a positive definite bimeasure of the form m = m_ξ for some ξ ∈ ca(𝔄, X). [In fact, every positive definite bimeasure appears in this form by the reproducing kernel Hilbert space (RKHS) theory.] The following are basic results of bimeasure integration (see Chang and Rao (1983, 1986) and Kakihara (1997b) for reference).

[1] Assume that m = m_ξ. If f, g ∈ L¹(ξ), then f, g ∈ 𝔏²*(m) and

∫_A ∫_B* (f, ḡ) dm = (∫_A f dξ, ∫_B g dξ)_X,  A, B ∈ 𝔄 .


[2] (Dominated Convergence Theorem) Assume that m = m_ξ. Let {f_n}, {g_ℓ} be sequences of ℂ-valued 𝔄-measurable functions on Θ. Suppose that there exists a pair (h, h′) of strictly m-integrable functions on Θ such that |f_n| ≤ h, |g_ℓ| ≤ h′ for each n, ℓ ≥ 1 and f_n → f, g_ℓ → g pointwise. Then, (f_n, g_ℓ) and (f, g) are strictly m-integrable and for A, B ∈ 𝔄 it holds that

∫_A ∫_B* (f, g) dm = lim_{n→∞} lim_{ℓ→∞} ∫_A ∫_B* (f_n, g_ℓ) dm
= lim_{ℓ→∞} lim_{n→∞} ∫_A ∫_B* (f_n, g_ℓ) dm .  (3.3)

[3] (Bounded Convergence Theorem) Assume that m = m_ξ. Let {f_n}, {g_n} be sequences of ℂ-valued 𝔄-measurable functions on Θ such that |f_n|, |g_n| ≤ α for some α > 0 and f_n → f, g_n → g pointwise. Then, f, g ∈ 𝔏²*(m) and (3.3) holds.

[4] If m = m_ξ, then L¹(ξ) = 𝔏²*(m). This means that the two sets are identical. L¹(ξ) is a Banach space with the norm ‖f‖_{s,ξ} = ‖ξ_f‖(Θ) for f ∈ L¹(ξ), while 𝔏²*(m) is a pre-Hilbert space with the inner product and the norm given by

(f, g)_m = ∫_Θ ∫_Θ* (f, ḡ) dm,  ‖f‖_m = (f, f)_m^{1/2}

for f, g ∈ 𝔏²*(m), where we identify f and g when ‖f − g‖_m = 0. The completeness w.r.t. ‖·‖_m will be given in a later section.

Now we are in a position to define harmonizable processes.

DEFINITION 4. An X-valued process {x(t)} on ℝ is said to be weakly harmonizable if its (scalar) covariance function γ(s, t) is expressible as

γ(s, t) = ∫_ℝ ∫_ℝ e^{i(su−tv)} m(du, dv),  s, t ∈ ℝ  (3.4)

for some positive definite bimeasure m ∈ M, where the integral is in the sense given in Definition 3. {x(t)} is said to be strongly harmonizable if the covariance function γ is representable as (3.4) for some positive definite bimeasure m ∈ M of bounded variation, so that the integral in (3.4) is in the sense of Lebesgue.

Let ξ ∈ ca(𝔅, X) and define a process {x(t)} by

x(t) = ∫_ℝ e^{itu} ξ(du),  t ∈ ℝ ,

where the integral is in the sense of Dunford-Schwartz. Then

γ(s, t) = (∫_ℝ e^{isu} ξ(du), ∫_ℝ e^{itv} ξ(dv))_X = ∫_ℝ ∫_ℝ* e^{i(su−tv)} m_ξ(du, dv) ,

by [1], and m_ξ is a positive definite bimeasure. Hence, {x(t)} is weakly harmonizable. Conversely, let {x(t)} be weakly harmonizable with a positive definite bimeasure m : 𝔅 × 𝔅 → ℂ. Then we can use Aronszajn's RKHS theory (cf. Aronszajn, 1950) to obtain a Hilbert space ℌ_m consisting of ℂ-valued functions on 𝔅, which belong to ca(𝔅, ℂ), such that


(i) m(A, ·) ∈ ℌ_m for every A ∈ 𝔅;
(ii) ν(A) = (ν(·), m(A, ·))_m for every ν ∈ ℌ_m and A ∈ 𝔅, where (·, ·)_m is the inner product in ℌ_m.

Let η(A) = m(A, ·) for A ∈ 𝔅. It holds that η is an ℌ_m-valued c.a. measure on 𝔅, i.e., η ∈ ca(𝔅, ℌ_m), and m(A, B) = (η(A), η(B))_m for A, B ∈ 𝔅 by the reproducing property (ii). Define an ℌ_m-valued process {y(t)} by

y(t) = ∫_ℝ e^{itu} η(du),  t ∈ ℝ

and an operator V : ℌ(y) → ℌ(x) by

V y(t) = x(t),  t ∈ ℝ .

Then we see that

(V y(s), V y(t))_X = (x(s), x(t))_X
= ∫_ℝ ∫_ℝ* e^{i(su−tv)} m(du, dv)
= ∫_ℝ ∫_ℝ* e^{i(su−tv)} (η(du), η(dv))_m
= (∫_ℝ e^{isu} η(du), ∫_ℝ e^{itv} η(dv))_m
= (y(s), y(t))_m,  s, t ∈ ℝ .

Thus V can be extended to a unitary operator from ℌ(y) onto ℌ(x). Let ξ = Vη. Then ξ ∈ ca(𝔅, X) and for t ∈ ℝ

x(t) = V y(t) = V ∫_ℝ e^{itu} η(du) = ∫_ℝ e^{itu} Vη(du) = ∫_ℝ e^{itu} ξ(du) ,

since V and the DS-integral commute (cf. Dunford and Schwartz [1958, IV.10]). Therefore we have:

THEOREM 5. An X-valued process {x(t)} on ℝ is weakly harmonizable iff there exists a unique X-valued measure ξ ∈ ca(𝔅, X) such that

x(t) = ∫_ℝ e^{itu} ξ(du),  t ∈ ℝ ,  (3.5)

where the integral is the DS-integral, and ξ is called the representing measure. One quick consequence of the above is that m = m_ξ, and that {x(t)} is stationary iff ξ is o.s. iff m is concentrated on the diagonal Δ = {(t, t) : t ∈ ℝ}.

EXAMPLE 6. Let {f_n}_{n=1}^∞ be an orthonormal set in X and define

ξ(A) = Σ_{n ∈ A∩ℕ} (1/n)(f_n + f_{n+1}),  A ∈ 𝔅 .

The Kolmogorov Isomorphism Theorem and extensions to some nonstationary processes

453

Then ξ ∈ ca(𝔅, X), and the process {x(t)} defined by (3.5) is weakly harmonizable. In this case {x(t)} is even strongly harmonizable.

DEFINITION 7. An X-valued process {x(t)} on ℝ is said to be of Karhunen class, or a Karhunen process, if its covariance function γ is expressed as

γ(s, t) = ∫_Θ φ(s, θ) \overline{φ(t, θ)} ν(dθ), s, t ∈ ℝ,

for some positive finite measure space (Θ, 𝔄, ν) and some family {φ(t, ·) : t ∈ ℝ} ⊆ L²(Θ, ν).

If (Θ, 𝔄) = (ℝ, 𝔅) and φ(t, ·) = e^{it·} for t ∈ ℝ, then the above definition reduces to the stationary case. Every Karhunen process has an integral representation, as stated below.

THEOREM 8. Let {x(t)} be an X-valued Karhunen process on ℝ with a measure ν ∈ ca(𝔄, ℝ⁺) and a family {φ(t, ·) : t ∈ ℝ} ⊆ L²(Θ, ν) on some measurable space (Θ, 𝔄). Then there exists an X-valued c.a.o.s. measure ξ ∈ caos(𝔄, X) such that ν(·) = ‖ξ(·)‖²_X and

x(t) = ∫_Θ φ(t, θ) ξ(dθ), t ∈ ℝ.

The proof is again based on the RKHS theory. As is well known, the following inclusion relations hold:

{stationary class} ⊂ {strongly harmonizable class} ⊂ {weakly harmonizable class} ⊂ {Karhunen class},

where the last inclusion is proved by stationary dilation (cf. Rao, 1985).

4. Weakly harmonizable processes and the Kolmogorov Isomorphism Theorem


As before we consider X = L2(~2)-valued processes on . In this section, we further study weakly harmonizable processes, especially their stationary dilations, and the Kolmogorov Isomorphism Theorem (KIT). The spectral domain is defined for a weakly harmonizable process and its completeness is given. First note that if {y(t)} is an X-valued stationary process and T : X --+ X is a bounded linear operator, then {Ty(t)} is no longer stationary, but weakly harmonizable. If, in particular, T is an orthogonal projection, then the resulting process {x(t)} = {Ty(t)} is a projection of a stationary process {y(t)} or {y(t)} is a stationary dilation of {x(t)}. A more general definition is as follows. DEFINITION 1. An X-valued process {x(t)} is said to have a stationary dilation if there exist a probability measure space (', ~', #') such that Y = Lz(f2') contains X = L2(~2) as a closed subspace and a Y-valued stationary process {y(t)} such that

454
x(t)
-

Y.Kakihara
Jy(t), t ~ e ,

where J : Y ---, X is the orthogonal projection. If {x(t)} has a stationary dilation, then necessarily it is weakly harmonizable. The converse is also true, which is proved by Niemi (1975) and is based on the orthogonally scattered dilation of an X-valued measure ~ E ca(fB,X). That is, if ~ ca(fB,X), then there exists a triple {~/, Y,J} for which Y is a Hilbert space containing X as a closed subspace and t/E caos(f8, Y) is o.s. such that 8 = Jr/, J : Y -+ X being the orthogonal projection. Thus we have:
THEOREM 2. An X-valued process on ℝ is weakly harmonizable iff it has a stationary dilation.

Now we define spectral domains for weakly harmonizable processes.

DEFINITION 3. Let {x(t)} be an X-valued weakly harmonizable process on ℝ with the representing measure ξ ∈ ca(𝔅, X). Then the bimeasure m = m_ξ ∈ M is called the spectral bimeasure of the process, while the space L²_*(m) (cf. Definition 3.3) is called the spectral domain of the process.

As was mentioned in the last section, L²_*(m) is a pre-Hilbert space with the inner product and the norm given respectively by

(f, g)_m = ∫∫*_{ℝ²} f(u) \overline{g(v)} m(du, dv), ‖f‖_m = (f, f)_m^{1/2}
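The remark above, that an orthogonal projection of a stationary process is weakly harmonizable but in general no longer stationary, can be checked in a finite toy model. Everything below (the dimension, the frequencies u_k, the projection P) is illustrative and not from the text:

```python
import numpy as np

# Toy model in C^3: y(t) = sum_k e^{i t u_k} f_k with f_k the standard
# orthonormal basis, so {y(t)} is stationary (o.s. representing measure).
u = np.array([0.7, 1.9, 3.1])                 # spectral atoms (made up)

def ip(a, b):                                 # inner product <a, b>
    return (a * b.conj()).sum()

def y(t):
    return np.exp(1j * t * u)                 # coordinates w.r.t. f_k

# Orthogonal projection P onto span{v}; x(t) = P y(t) is a projection of a
# stationary process, hence weakly harmonizable, but not stationary.
v = np.ones(3) / np.sqrt(3.0)
P = np.outer(v, v.conj())

def x(t):
    return P @ y(t)

print(abs(ip(y(1), y(0)) - ip(y(2), y(1))))   # y: covariance depends on s - t only
print(abs(ip(x(1), x(0)) - ip(x(2), x(1))))   # x: it does not
```

The first difference vanishes (up to rounding), while the second is clearly nonzero, so the projected process has a genuinely nonstationary covariance.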

for f, g ∈ L²_*(m). Now we want to show that L²_*(m) is actually a Hilbert space; the idea of the proof is due to Rao (1984). Let {η, Y, J}, or simply η, be an o.s. dilation of ξ. Note that Y need not contain the whole space X, but it should contain 𝔖_ξ = span{ξ(A) : A ∈ 𝔅} = ℋ(x), the time domain of {x(t)}. Suppose that {η, Y, J} is minimal in the sense that Y = 𝔖_η = span{η(A) : A ∈ 𝔅} and that, if {η′, Y′, J′} is another o.s. dilation of ξ, then Y can be regarded as a closed subspace of Y′. Now let {y(t)} be the Y-valued stationary process with the representing measure η, which is called a minimal stationary dilation of {x(t)}. Let ν_η(·) = ‖η(·)‖²_Y. Then we have the Kolmogorov Isomorphism Theorem: ℋ(y) ≅ L²(ν_η), where the unitary operator U : L²(ν_η) → ℋ(y) is given by

U(f) = ∫_ℝ f dη, f ∈ L²(ν_η).

Observe the following diagram:

Y = 𝔖_η = ℋ(y)  --U⁻¹-->  L²(ν_η)
   J ↓                       ↓ J₁
X ⊇ 𝔖_ξ = ℋ(x)  <------  J₁L²(ν_η)


where J₁ is the orthogonal projection corresponding to J, which is clearly given by J₁ = U⁻¹JU. We also define 𝔏 = J₁L²(ν_η), which is a closed subspace of L²(ν_η).

In order to show that L²_*(m) is complete, we embed L²_*(m) into L²(ν_η) isomorphically and isometrically and establish L²_*(m) ≅ 𝔏, from which we conclude that L²_*(m) ≅ J₁L²(ν_η). The outline is as follows. [The details are in Chang and Rao (1986) and Kakihara (1997b).]
(1) f_t ∈ L²_*(m) for t ∈ ℝ, where f_t(u) = (J₁e^{it·})(u), u ∈ ℝ.
(2) V : ℋ(x) → L²_*(m) is an isometry, where V x(t) = e^{it·}, t ∈ ℝ.
(3) V₁ : L²_*(m) → L²(ν_η) is an isometry, where V₁ is the extension of V₀(f) = J₁(f), f ∈ C₀₀(ℝ) ⊆ L²_*(m), C₀₀(ℝ) being the space of continuous functions on ℝ with compact supports.
(4) V₁ : L²_*(m) → L²(ν_η) is actually onto 𝔏.

Thus we get:

THEOREM 4. (KIT) Let {x(t)} be an X-valued weakly harmonizable process on ℝ with the spectral bimeasure m. Then the time domain ℋ(x) and the spectral domain L²_*(m) are isomorphic, ℋ(x) ≅ L²_*(m), where the isomorphism U is given by

U x(t) = e^{it·}, t ∈ ℝ.
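The Kolmogorov isometry behind the KIT, that the norm of a linear combination of process values equals the norm of the corresponding exponential polynomial in the spectral domain, can be verified directly for a stationary process with an atomic spectral measure. All frequencies, weights and coefficients below are made up for illustration:

```python
import numpy as np

# Atomic stationary model: x(t) = sum_k e^{i t u_k} sqrt(w_k) e_k with
# orthonormal e_k, spectral measure nu = sum_k w_k delta_{u_k}.
u = np.array([0.3, 1.1, 2.5])          # frequencies (illustrative)
w = np.array([1.0, 0.5, 2.0])          # spectral weights nu({u_k})

def x(t):
    return np.exp(1j * t * u) * np.sqrt(w)   # coordinates w.r.t. e_k

c = np.array([1.0, -2.0, 0.5 + 1j])    # coefficients
ts = np.array([0.0, 1.0, 2.7])         # time points

# time-domain norm of sum_j c_j x(t_j)
lhs = np.linalg.norm(sum(cj * x(tj) for cj, tj in zip(c, ts)))

# spectral-domain norm of f(u) = sum_j c_j e^{i t_j u} in L^2(nu)
f = sum(cj * np.exp(1j * tj * u) for cj, tj in zip(c, ts))
rhs = np.sqrt((np.abs(f) ** 2 * w).sum())
```

Both norms agree up to rounding, which is exactly the correspondence x(t) ↔ e^{it·} underlying Theorem 4.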

5. Application to estimation and filtering


Let X = L~(Q) throughout this section. First we consider an estimation problem for an additive noise model. Let {x(t)}, {y(t)} and {n(t)} be X-valued processes on N such that
40 = y(t) + n(t), t C R ,

where {x(t)} is the observation, {y(t)} the signal, and {n(t)} the noise process. Let a E R be any number and we want to estimate the signal y(a). We take the least squares estimator ~(a) of y(a) in the sense that

Ily(a)

- )3(a)llx = min{lly(a) - z l [ x : z E itS(x)} .

If P : X -+ J/t~(x) is the orthogonal projection, then we see that 29(a) = Py(a) or, equivalently,
(y(a) - fi(a),x(t)) x = O, t E N

(5.1)

by the projection lemma.


When the processes are stationary, Grenander (1951) examined this type of problem using the spectral domain; when the processes are weakly harmonizable, Chang and Rao (1986) treated and generalized those results [see also Rao (1985, 1989)]. We follow Chang and Rao (1986) hereafter. Assume that {x(t)}, {y(t)} and {n(t)} are weakly harmonizable with the representing measures ξ_x, ξ_y and ξ_n, respectively. Note that {y(t)} and {n(t)} are weakly harmonizably correlated, i.e., the cross covariance function γ_{yn}(s, t) = (y(s), n(t))_X can be written as

γ_{yn}(s, t) = ∫∫*_{ℝ²} e^{i(su−tv)} m_{yn}(du, dv), s, t ∈ ℝ,

where m_{yn}(A, B) = (ξ_y(A), ξ_n(B))_X for A, B ∈ 𝔅. Since ℋ(x) ≅ L²_*(m_x) by the KIT and ŷ(a) ∈ ℋ(x), there is a unique h_a ∈ L²_*(m_x) such that

ŷ(a) = ∫_ℝ h_a(u) ξ_x(du). (5.2)

Thus the problem here is to find the function h_a in L²_*(m_x), and we shall derive some conditions characterizing h_a. It follows from (5.1) and (5.2) that

( ∫_ℝ e^{iau} ξ_y(du), ∫_ℝ e^{itv} ξ_x(dv) )_X = ( ∫_ℝ h_a(u) ξ_x(du), ∫_ℝ e^{itv} ξ_x(dv) )_X

and hence

∫∫*_{ℝ²} e^{i(au−tv)} m_{yx}(du, dv) = ∫∫*_{ℝ²} h_a(u) e^{−itv} m_x(du, dv), t ∈ ℝ, (5.3)

where

m_{yx}(A, B) = m_y(A, B) + m_{yn}(A, B), A, B ∈ 𝔅.

Note that for A, B ∈ 𝔅

m_x(A, B) = (ξ_x(A), ξ_x(B))_X = (ξ_y(A) + ξ_n(A), ξ_y(B) + ξ_n(B))_X
  = m_y(A, B) + m_n(A, B) + m_{yn}(A, B) + m_{ny}(A, B),

so that (5.3) becomes

∫∫*_{ℝ²} e^{−itv} { e^{iau}(m_y + m_{yn})(du, dv) − h_a(u)(m_y + m_n + m_{yn} + m_{ny})(du, dv) } = 0

for every t ∈ ℝ. It is not hard to derive that

∫_ℝ e^{iau}(m_y + m_{yn})(du, A) = ∫_ℝ h_a(u)(m_y + m_n + m_{yn} + m_{ny})(du, A) (5.4)



for every A ∈ 𝔅. Evaluating at A = ℝ and letting

K(·) = (m_y + m_{yn})(·, ℝ), G(·) = (m_y + m_n + m_{yn} + m_{ny})(·, ℝ),


we see that ha is a solution to the following W i e n e r - H o p f type equation

~ ha(u)G(du) = ~ eiauK(du) .

(5.5)

Consider a special case where G and K are absolutely continuous w.r.t, the Lebesgue measure dt and have RN-derivatives G ~ = (dG/dt) and K' = (dK/dt), respectively. Then, we get

ha(") = e~a"Ia'(u)]-K'(u) ,
where [G'(u)]- = G'(u) 1 if G'(u) 0 and = 0 otherwise, i.e., [G']- is the generalized inverse of G'. If this ha is in E2,(mx), then we obtained a solution to the signal extraction problem, which is summarized as follows. T~ZOI~eM 1. Consider a signal plus noise model given by
x(t) = y(t) + , ( t ) , g ,

where {x(t)}, {y(t)} and {n(t)} are LZ(~2)-valued weakly harmonizable processes on R with given spectral bimeasures rex, my and m,, respectively. Here, {x(t)} is the observation, {y(t)} the signal and {n(t)} the noise. Suppose that the cross spectral bimeasure m~ of {y(t)} and {n(t)} is also known. Then, for any a c N, the best linear least squares estimator )~(a) of y(a) based on the observation of {x(t)} is given by

fi(a) ----~ h~(u)x(du)


with the spectral characteristic ha c ~a,(mx), where ix is the representing measure of {x(t)} that is known from m~. h a is a solution to the W i e n e r - H o p f type integral equation

e'a"(my + myn)(du, ~) =

ha(u)(my +mn + myn + my,,)(du, e ) .

2 is calculated as Moreover, the expected error variance o-a % = Ily(a) - )~(a)I]~ = (du, dr) 2 ha(u)ha(v)mx(du, dr) .
2

The corresponding K a r h u n e n case may be obtained.
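In the absolutely continuous case the recipe h_a(u) = e^{iau}[G′(u)]⁻K′(u) is just a pointwise division in the frequency domain. The following discrete-frequency sketch assumes uncorrelated signal and noise (so K′ = S_y and G′ = S_y + S_n); the grid, the densities and the value of a are all made up:

```python
import numpy as np

# Discrete-frequency sketch of h_a(u) = e^{iau} [G'(u)]^- K'(u).
u = np.linspace(-np.pi, np.pi, 5)             # frequency grid (illustrative)
Sy = np.array([1.0, 2.0, 0.0, 0.5, 1.0])      # signal spectral density = K'
Sn = np.array([0.5, 0.0, 0.0, 0.5, 1.0])      # noise spectral density
K = Sy
G = Sy + Sn
a = 1.3

# generalized inverse [G']^- : 1/G' where G' != 0, and 0 otherwise
Ginv = np.where(G != 0, 1.0 / np.where(G != 0, G, 1.0), 0.0)
h = np.exp(1j * a * u) * Ginv * K             # spectral characteristic h_a
```

Where the noise density vanishes (K′ = G′) the filter passes the frequency untouched, |h_a(u)| = 1, and where G′ = 0 the generalized inverse forces h_a(u) = 0.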


Next we consider the filtering problem. Consider a linear operator A on X = L₀²(Ω) that commutes with translations on ℝ and an equation

A x(t) = y(t), t ∈ ℝ. (5.6)

In this case, A is called a linear filter, {x(t)} an input, and {y(t)} an output or observation. The problem is to find {x(t)} from {y(t)}. {y(t)} can be stationary or (strongly or weakly) harmonizable. A can be a difference operator, a differential operator, an integral operator, or a combination of these, so it need not be bounded. Bochner (1956) started to treat this type of problem, with A an integro-differential-difference operator. Chang and Rao (1986) and Rao (1989, 1992) extensively studied the case where {y(t)} is weakly harmonizable. The most general form of A is an integro-difference-differential operator given by

A x(t) = Σ_{k=0}^p ∫_ℝ a_k(τ) (d^k/dt^k) x(t − τ) λ_k(dτ), t ∈ ℝ, (5.7)

where all derivatives are taken in the strong sense, λ_k is a complex measure on (ℝ, 𝔅), and a_k : ℝ → ℂ is a bounded measurable function for 0 ≤ k ≤ p. As is easily seen, the operator A contains a difference operator and a difference-differential operator as special cases. Assume that {x(t)} is weakly harmonizable, so that

x(t) = ∫_ℝ e^{itu} ξ_x(du), t ∈ ℝ,

for some ξ_x ∈ ca(𝔅, X). Then, intuitively, we have

(d/dt) x(t) = ∫_ℝ iu e^{itu} ξ_x(du),

which is justified if ∫_ℝ u ξ_x(du) exists as a Dunford–Schwartz integral, since

‖ (x(t+h) − x(t))/h − ∫_ℝ iu e^{itu} ξ_x(du) ‖_X = ‖ ∫_ℝ e^{itu} ( (e^{ihu} − 1)/h − iu ) ξ_x(du) ‖_X → 0 as h → 0.

By iteration we have, for k ≥ 1,

x^{(k)}(t) = (d^k/dt^k) x(t) = ∫_ℝ (iu)^k e^{itu} ξ_x(du), t ∈ ℝ,

if ∫_ℝ |u|^k ξ_x(du) exists in X = L₀²(Ω). Note that for k ≥ 1 the process {x^{(k)}(t)}, if it exists, is also weakly harmonizable, since it has the representing measure ξ_k given by

ξ_k(A) = ∫_A (iu)^k ξ_x(du), A ∈ 𝔅.
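The heuristic that differentiation multiplies the representing measure by iu can be checked on a toy process with finitely many spectral atoms; the atoms and frequencies below are illustrative, not from the text:

```python
import numpy as np

# Atomic toy process x(t) = sum_k e^{i t u_k} c_k, coordinates in C^3.
u = np.array([0.5, 1.5, 2.5])           # frequencies (made up)
c = np.array([1.0 + 0.2j, -0.3j, 0.7])  # atoms of the representing measure

def x(t):
    return np.exp(1j * t * u) * c

def dx(t):
    # derivative via the multiplied measure (iu) xi: atoms (i u_k) c_k
    return (1j * u) * np.exp(1j * t * u) * c

h = 1e-6
num = (x(1.0 + h) - x(1.0)) / h         # strong-sense difference quotient
```

The difference quotient `num` agrees with `dx(1.0)` to within O(h), confirming that the differentiated process has representing measure ξ₁(A) = ∫_A iu ξ_x(du).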


Summarizing the above discussion, we have the following [Chang and Rao (1986)].

THEOREM 2. Let {x(t)} be an X-valued weakly harmonizable process on ℝ with the representing measure ξ_x and the spectral bimeasure m_x. Consider a filter A given by (5.7) and the filter equation (5.6). If ∫_ℝ |u|^p ξ_x(du) exists in X, then {y(t)} is also weakly harmonizable with the representing measure ξ_y and the spectral bimeasure m_y given respectively by

ξ_y(A) = ∫_A f(u) ξ_x(du), A ∈ 𝔅, (5.8)

m_y(A, B) = ∫∫*_{A×B} f(u) \overline{f(v)} m_x(du, dv), A, B ∈ 𝔅, (5.9)

where the function f is obtained as

f(u) = Σ_{k=0}^p (iu)^k ∫_ℝ a_k(τ) e^{−iτu} λ_k(dτ), u ∈ ℝ. (5.10)

In Theorem 2 above, the function f satisfies

y(t) = A x(t) = ∫_ℝ e^{itu} f(u) ξ_x(du), t ∈ ℝ,

and is called the spectral characteristic of the given filter. In view of the KIT we have the correspondence

ℋ(x) ∋ y(t) ↔ h_t(·) = e^{it·} f(·) ∈ L²_*(m_x).

Also, if the process {x(t)} is strongly harmonizable or stationary, then so is {y(t)}, as is seen from (5.8) and (5.9). For a scalar measurable function f on ℝ the generalized inverse f⁻ is defined by f⁻(u) = 1/f(u) if f(u) ≠ 0 and = 0 otherwise. Let B_f = {u ∈ ℝ : f(u) = 0} ∈ 𝔅. Then necessary and sufficient conditions for the existence of a harmonizable solution {x(t)} of the filter equation (5.6) for a given harmonizable observation {y(t)} can be given in terms of the spectral characteristic f and the set B_f; this is also due to Chang and Rao (1986).

THEOREM 3. Suppose that {y(t)} is an X-valued weakly harmonizable process on ℝ with the spectral bimeasure m_y and that a filter equation (5.6) is given with A satisfying (5.7). The spectral characteristic f is then given by (5.10). Let B_f = {u ∈ ℝ : f(u) = 0}. Then the filter equation (5.6) with (5.7) has a weakly harmonizable, p-times norm differentiable solution {x(t)} iff
(1) ‖m_y‖(B_f, B_f) = 0;
(2) ∫∫*_{B_f^c × B_f^c} (|f(u)| |f(v)|)^{−1} m_y(du, dv) < ∞;
(3) ∫∫*_{B_f^c × B_f^c} |uv|^k (|f(u)| |f(v)|)^{−1} m_y(du, dv) < ∞ for 0 ≤ k ≤ p.

In this case, the solution is given by

x(t) = ∫_{B_f^c} e^{itu} f(u)⁻ ξ_y(du), t ∈ ℝ.
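A concrete instance of the spectral characteristic: for the pure difference filter A x(t) = x(t) − x(t − τ₀), which is the special case of (5.7) with p = 0, a₀ ≡ 1 and λ₀ = δ₀ − δ_{τ₀}, formula (5.10) gives f(u) = 1 − e^{−iτ₀u}. This can be checked exactly on an atomic toy process (atoms, frequencies and τ₀ are made up):

```python
import numpy as np

# Atomic toy process x(t) = sum_k e^{i t u_k} c_k in C^3.
u = np.array([0.5, 1.5, 2.5])
c = np.array([1.0 + 0.2j, -0.3j, 0.7])
tau0 = 0.8

def x(t):
    return np.exp(1j * t * u) * c

f = 1.0 - np.exp(-1j * tau0 * u)        # spectral characteristic of the filter

t = 1.3
lhs = x(t) - x(t - tau0)                # y(t) = A x(t), computed directly
rhs = np.exp(1j * t * u) * f * c        # integral of e^{itu} f(u) against xi_x
```

The two sides agree exactly, illustrating (5.8): the output's representing measure is f(u) times the input's.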

6. Multidimensional extension

So far we have studied one-dimensional, or univariate, processes. In this section we consider multivariate extensions of second order stochastic processes and obtain results similar to those of the univariate case. We begin with the finite dimensional case. Let (Ω, 𝔉, μ) and L₀²(Ω) be as before, and let q ≥ 2 be an integer. Then we can consider a q-dimensional second order stochastic process {x(t)} by setting

x(t) = (x¹(t), ..., x^q(t))ᵗ, t ∈ ℝ,

where {x^k(t)} is an L₀²(Ω)-valued process for 1 ≤ k ≤ q and "ᵗ" stands for the transpose. That is, {x(t)} is an [L₀²(Ω)]^q-valued process. For x = (x¹, ..., x^q)ᵗ, y = (y¹, ..., y^q)ᵗ ∈ [L₀²(Ω)]^q the inner product and the norm are defined by

(x, y)_{2,q} = Σ_{k=1}^q (x^k, y^k)_{2,μ}, ‖x‖_{2,q} = ( Σ_{k=1}^q ‖x^k‖²_{2,μ} )^{1/2},

respectively. Moreover, [L₀²(Ω)]^q has a matrix valued, or B(ℂ^q)-valued, inner product [·, ·]_q, called the gramian, defined by

[x, y]_q = ( (x^j, y^k)_{2,μ} )_{1 ≤ j,k ≤ q},

where B(ℂ^q) is the algebra of all bounded linear operators on ℂ^q with the Euclidean norm. Thus we have to consider two kinds of covariance functions for {x(t)}. The scalar covariance function γ is defined by

γ(s, t) = (x(s), x(t))_{2,q}, s, t ∈ ℝ,

as before, and the matricial, or operator, covariance function Γ by

Γ(s, t) = [x(s), x(t)]_q, s, t ∈ ℝ.

Clearly, operator covariance functions are more general than scalar ones in the theory of multidimensional processes, because γ(s, t) = tr Γ(s, t), where tr(·) is the trace of a matrix. Note that [L₀²(Ω)]^q is a left B(ℂ^q)-module with the module action (a, x) ↦ a·x = ax for a ∈ B(ℂ^q) and x ∈ [L₀²(Ω)]^q. The vector time domain ℋ₀(x) of {x(t)} is defined to be the closed subspace generated by the set {x(t) : t ∈ ℝ}, while the modular time domain ℋ(x) of {x(t)} is defined to be the closed submodule generated by {x(t) : t ∈ ℝ}, i.e., the closure of

{ Σ_{k=1}^n a_k x(t_k) : a_k ∈ B(ℂ^q), t_k ∈ ℝ, 1 ≤ k ≤ n, n ≥ 1 }.
Clearly ℋ₀(x) ⊆ ℋ(x), and we usually consider modular time domains for multidimensional processes.

First let us consider [L₀²(Ω)]^q-valued stationary processes. An [L₀²(Ω)]^q-valued process {x(t)} is said to be operator stationary if its operator covariance function Γ(s, t) depends only on the difference s − t and if, letting Γ̃(s − t) = Γ(s, t), each of the components of Γ̃ is continuous on ℝ. As in the one-dimensional case, we can show that there exist a hermitian positive B(ℂ^q)-valued c.a. measure F, denoted F ∈ ca(𝔅, B⁺(ℂ^q)), and an [L₀²(Ω)]^q-valued c.a. gramian orthogonally scattered (g.o.s.) measure ξ, denoted ξ ∈ cagos(𝔅, [L₀²(Ω)]^q), such that

Γ̃(t) = ∫_ℝ e^{itu} F(du), t ∈ ℝ,

x(t) = ∫_ℝ e^{itu} ξ(du), t ∈ ℝ.
If we write ξ = (ξ¹, ..., ξ^q)ᵗ, then it is easily shown that ξ^k ∈ caos(𝔅, L₀²(Ω)), 1 ≤ k ≤ q, and that the ξ^k are biorthogonal, i.e., (ξ^j(A), ξ^k(B))_{2,μ} = 0 for any disjoint A, B ∈ 𝔅 and 1 ≤ j, k ≤ q, because of the gramian orthogonal scatteredness of ξ, i.e., [ξ(A), ξ(B)]_q = 0 for disjoint A, B ∈ 𝔅. Thus each component {x^k(t)}, 1 ≤ k ≤ q, of an operator stationary process {x(t)} is an L₀²(Ω)-valued stationary process with the representing measure ξ^k.

The modular spectral domain L²(F) of an operator stationary process {x(t)} is defined to be the set of B(ℂ^q)-valued functions Φ on ℝ such that the pair (Φ, Φ) is F-integrable. Here a pair (Φ, Ψ) of B(ℂ^q)-valued functions is F-integrable if the B(ℂ^q)-valued function Φ F′ Ψ* is integrable w.r.t. ν(·) = |F|(·), the variation, where F′ = dF/dν, i.e., each component of Φ F′ Ψ* is in L¹(ℝ, ν). Then the KIT holds for such a process: if {x(t)} is an [L₀²(Ω)]^q-valued operator stationary process on ℝ with the operator spectral measure F, then the modular time domain ℋ(x) and the modular spectral domain L²(F) are isomorphic, denoted ℋ(x) ≅ L²(F), where an isomorphism U : ℋ(x) → L²(F) is given by

U x(t) = e^{it·} I(·), t ∈ ℝ,

I being the identity matrix valued function, and satisfies

U(aφ + bψ) = a Uφ + b Uψ, [Uφ, Uψ]_F = [φ, ψ]_q, a, b ∈ B(ℂ^q), φ, ψ ∈ ℋ(x),

where [Φ, Ψ]_F = ∫_ℝ Φ F′ Ψ* dν for Φ, Ψ ∈ L²(F). Similarly, an [L₀²(Ω)]^q-valued process {x(t)} is said to be weakly or strongly harmonizable if each component {x^k(t)}, 1 ≤ k ≤ q, is an L₀²(Ω)-valued weakly or strongly harmonizable process, respectively. If


ξ^k ∈ ca(𝔅, L₀²(Ω)) is the representing measure of {x^k(t)} for 1 ≤ k ≤ q, then, letting ξ = (ξ¹, ..., ξ^q)ᵗ ∈ ca(𝔅, [L₀²(Ω)]^q), we have

x(t) = ∫_ℝ e^{itu} ξ(du), t ∈ ℝ, (6.1)

Γ(s, t) = ∫∫*_{ℝ²} e^{i(su−tv)} M(du, dv), s, t ∈ ℝ,

where M(A, B) = M_ξ(A, B) = [ξ(A), ξ(B)]_q for A, B ∈ 𝔅, and the second integral is in the sense of the componentwise strict integral. We can say that an [L₀²(Ω)]^q-valued process {x(t)} is weakly harmonizable iff its operator covariance function Γ has an integral representation (6.1) w.r.t. a B(ℂ^q)-valued positive definite bimeasure M of bounded semivariation.

Since every L₀²(Ω)-valued weakly harmonizable process has a stationary dilation, for each k = 1, ..., q there exist an L²-space L₀²(Ω_k) on some probability measure space (Ω_k, 𝔉_k, μ_k), containing L₀²(Ω) as a closed subspace, and an L₀²(Ω_k)-valued stationary process {y^k(t)} with the representing measure η_k ∈ caos(𝔅, L₀²(Ω_k)) such that x^k(t) = P_k y^k(t), t ∈ ℝ, where P_k : L₀²(Ω_k) → L₀²(Ω) is the orthogonal projection. Choose a probability measure space (Ω̃, 𝔉̃, μ̃) such that L₀²(Ω̃) contains each L₀²(Ω_k), k = 1, ..., q, as a closed subspace, and consider the [L₀²(Ω̃)]^q-valued process {y(t)} defined by y(t) = (y¹(t), ..., y^q(t))ᵗ, t ∈ ℝ. Then we see that {y(t)} is operator stationary with the representing measure η = (η₁, ..., η_q)ᵗ ∈ cagos(𝔅, [L₀²(Ω̃)]^q) and satisfies x(t) = P y(t), t ∈ ℝ, where P = ⊕_{k=1}^q P̃_k : [L₀²(Ω̃)]^q → [L₀²(Ω)]^q is the gramian orthogonal projection, P̃_k being the extension of P_k to L₀²(Ω̃).

Let M = (m_{jk}), so that each m_{jk} is a scalar bimeasure. The modular spectral domain L²_*(M) is defined to be the set of all B(ℂ^q)-valued functions Φ = (φ_{ij}) on ℝ such that each φ_{ij} is strictly m_{kℓ}-integrable for 1 ≤ k, ℓ ≤ q. The KIT in this case is as follows: if {x(t)} is an [L₀²(Ω)]^q-valued weakly harmonizable process with the spectral bimeasure M, then the modular time domain ℋ(x) and the modular spectral domain L²_*(M) are isomorphic, denoted ℋ(x) ≅ L²_*(M), where the isomorphism U : ℋ(x) → L²_*(M) is given by

U x(t) = e^{it·} I(·), t ∈ ℝ.

Using the KIT, the finite dimensional versions of Theorems 2 and 3 of Section 5 were stated and proved in Chang and Rao (1986) in a more general setting. In a similar way we can define q-dimensional Karhunen processes in a componentwise manner and obtain the KIT and related results.

We now consider the case where the vector valued process has infinitely many components. For q ∈ ℕ (the set of natural numbers) observe the identification

[L₀²(Ω)]^q = L₀²(Ω; ℂ^q),

the latter being the Hilbert space of all ℂ^q-valued random variables x on Ω with zero expectation such that ∫_Ω ‖x(ω)‖²_{ℂ^q} μ(dω) < ∞, where ‖·‖_{ℂ^q} is the usual Euclidean norm on ℂ^q. If q = ∞, we want to consider instead the Hilbert space L₀²(Ω; ℓ²)


of all ℓ²-valued (strong) random variables x on Ω with zero expectation such that ∫_Ω ‖x(ω)‖²_{ℓ²} μ(dω) < ∞, where ‖·‖_{ℓ²} is the norm in the sequence Hilbert space ℓ². Hence, letting H be an arbitrary (infinite dimensional) Hilbert space, we can consider the Hilbert space L₀²(Ω; H) and an L₀²(Ω; H)-valued process {x(t)} as an infinite dimensional second order stochastic process. In this section we use the letter X for L₀²(Ω; H).

So let X = L₀²(Ω; H) and study its structure. To do this we need properties of Hilbert–Schmidt class and trace class operators on a Hilbert space, which are found in Schatten (1960). The inner product (·, ·)_X and the norm ‖·‖_X are respectively defined by

(x, y)_X = ∫_Ω (x(ω), y(ω))_H μ(dω), ‖x‖_X = (x, x)_X^{1/2}

for x, y ∈ X, where (·, ·)_H is the inner product in H. The gramian [·, ·]_X in X is defined by

[x, y]_X = ∫_Ω x(ω) ⊗ y(ω) μ(dω) (6.2)

for x, y ∈ X, where ⊗ is the tensor product in the sense of Schatten (1960), that is, (φ ⊗ ψ)ψ′ = (ψ′, ψ)_H φ for φ, ψ, ψ′ ∈ H. The operator defined by (6.2) acts as follows:

([x, y]_X φ, ψ)_H = ∫_Ω ((x(ω) ⊗ y(ω))φ, ψ)_H μ(dω) = ∫_Ω (x(ω), ψ)_H (φ, y(ω))_H μ(dω)

for φ, ψ ∈ H. Then [x, y]_X is well defined and gives an element of T(H), the set of all trace class operators on H. It can also be verified that

tr[x, y]_X = (x, y)_X, ‖x‖_X = ‖[x, x]_X‖_τ^{1/2}

for x, y ∈ X, where tr(·) is the trace and ‖·‖_τ is the trace norm. If x = fφ and y = gψ, where f, g ∈ L₀²(Ω) and φ, ψ ∈ H, then we see that

[x, y]_X = [fφ, gψ]_X = (f, g)_{2,μ} φ ⊗ ψ.

Thus we have obtained a T(H)-valued gramian [·, ·]_X on X. Clearly, one has
(i) [x, x]_X ≥ 0, and [x, x]_X = 0 iff x = 0;
(ii) [x + y, z]_X = [x, z]_X + [y, z]_X;
(iii) [ax, y]_X = a[x, y]_X;
(iv) [x, y]_X = [y, x]_X*,

for x, y, z ∈ X and a ∈ B(H), the algebra of all bounded linear operators on H, where * denotes the adjoint. Since X is a left B(H)-module under the module action (a, x) ↦ a·x = ax for x ∈ X and a ∈ B(H), and is a Hilbert space with a T(H)-valued gramian, we call it a normal Hilbert B(H)-module.


From now on we assume that H is a separable complex Hilbert space. X = L₀²(Ω; H)-valued operator stationary processes are defined similarly to the [L₀²(Ω)]^q-valued case. Thus, if {x(t)} is an X-valued operator stationary process with the operator covariance function Γ, then we have

x(t) = ∫_ℝ e^{itu} ξ(du), t ∈ ℝ,

Γ(s, t) = Γ̃(s − t) = ∫_ℝ e^{i(s−t)u} F(du), s, t ∈ ℝ,

for some X-valued c.a.g.o.s. measure ξ, denoted ξ ∈ cagos(𝔅, X), and some T⁺(H)-valued measure F, denoted F ∈ ca(𝔅, T⁺(H)), with F(A ∩ B) = [ξ(A), ξ(B)]_X for A, B ∈ 𝔅. In this case, the modular time domain ℋ(x) of {x(t)} is the closed submodule generated by the set {x(t) : t ∈ ℝ}. To define the modular spectral domain we proceed as follows. First note that F has a bounded variation ν(·) = ‖F(·)‖_τ = tr(F(·)), and it holds that

F(A) = ∫_A F′(u) ν(du), A ∈ 𝔅,

where F′ = dF/dν is the Radon–Nikodým derivative, since T(H) is known to have the Radon–Nikodým property. Let O(H) denote the set of all (not necessarily bounded) linear operators on H. An O(H)-valued function Φ on ℝ is said to be 𝔅-measurable if the H-valued function Φ(·)φ is strongly measurable for every φ ∈ H. A pair (Φ, Ψ) of O(H)-valued 𝔅-measurable functions is said to be F-integrable if Φ F′^{1/2} and Ψ F′^{1/2} are Hilbert–Schmidt class operator valued functions and (Φ F′^{1/2})(Ψ F′^{1/2})* is a T(H)-valued ν-Bochner integrable function on ℝ, denoted (Φ F′^{1/2})(Ψ F′^{1/2})* ∈ L¹(ℝ, ν; T(H)). In this case we write

[Φ, Ψ]_F = ∫_ℝ Φ dF Ψ* = ∫_ℝ (Φ F′^{1/2})(Ψ F′^{1/2})* dν.

Define an L²-space L²(F) by L²(F) = {Φ : ℝ → O(H), 𝔅-measurable, such that (Φ, Φ) is F-integrable}. As is easily seen, [·, ·]_F is a gramian on L²(F), where the norm and the inner product are defined respectively by

‖Φ‖_F = ‖[Φ, Φ]_F‖_τ^{1/2}, (Φ, Ψ)_F = tr[Φ, Ψ]_F, Φ, Ψ ∈ L²(F).

The space L²(F) is called the modular spectral domain of the operator stationary process {x(t)}. The KIT is stated as follows: let {x(t)} be an X-valued operator stationary process on ℝ with the operator spectral measure F. Then the modular time domain ℋ(x) and the modular spectral domain L²(F) are isomorphic, denoted ℋ(x) ≅ L²(F), where the isomorphism U : ℋ(x) → L²(F) is given by

U x(t) = e^{it·} I(·), t ∈ ℝ.


Now we want to define X-valued harmonizable processes. Let {x(t)} be an X-valued process with the operator covariance function Γ and suppose that Γ has an integral representation

Γ(s, t) = ∫∫*_{ℝ²} e^{i(su−tv)} M(du, dv), s, t ∈ ℝ, (6.3)

for some T(H)-valued positive definite bimeasure M on 𝔅 × 𝔅. If M is of bounded semivariation, which is equivalent to sup{‖M(A, B)‖_τ : A, B ∈ 𝔅} < ∞, then the integral in (6.3) is a well-defined vector bimeasure integral, developed by Ylinen (1978). In this case we can prove, by an analog of RKHS theory for vector valued positive definite kernels, that there exists a ξ ∈ ca(𝔅, X) such that

x(t) = ∫_ℝ e^{itu} ξ(du), t ∈ ℝ.

Hence {x(t)} may be termed "weakly harmonizable," though it does not have an operator stationary dilation in general. Thus we need a stronger notion of harmonizability. In order to integrate operator valued functions with respect to a T(H)-valued bimeasure M or an X-valued measure ξ, we introduce the following. The operator semivariation ‖M‖_o(·, ·) of a T(H)-valued bimeasure M is defined by

‖M‖_o(A, B) = sup ‖ Σ_{j=1}^m Σ_{k=1}^n a_j M(A_j, B_k) b_k ‖_τ, A, B ∈ 𝔅,

where the supremum is taken over all finite measurable partitions {A₁, ..., A_m} of A and {B₁, ..., B_n} of B, and a_j, b_k ∈ B(H) with ‖a_j‖, ‖b_k‖ ≤ 1 for 1 ≤ j ≤ m and 1 ≤ k ≤ n. If the positive definite M in (6.3) is of bounded operator semivariation, i.e., ‖M‖_o(ℝ, ℝ) < ∞, then the representing measure ξ of {x(t)} is also of bounded operator semivariation, denoted ξ ∈ bca(𝔅, X). Here the operator semivariation ‖ξ‖_o(·) of ξ is defined by

‖ξ‖_o(A) = sup ‖ Σ_{k=1}^n a_k ξ(A_k) ‖_X, A ∈ 𝔅,

the supremum being taken over all finite measurable partitions {A₁, ..., A_n} of A and a_k ∈ B(H) with ‖a_k‖ ≤ 1 for 1 ≤ k ≤ n. If this is the case, {x(t)} is said to be weakly operator harmonizable [cf. Kakihara (1985, 1986)], and we can show that {x(t)} has an operator stationary dilation, i.e., there exists a normal Hilbert B(H)-module Y = L₀²(Ω̃; H) containing X as a closed submodule and a Y-valued operator stationary process {y(t)} such that x(t) = P y(t), t ∈ ℝ, where P : Y → X is the (gramian) orthogonal projection [cf. Kakihara (1992, 1997a)].

To obtain a KIT for a weakly operator harmonizable process, we need to define an operator bimeasure integral. First, consider a T(H)-valued measure F of bounded variation, denoted F ∈ vca(𝔅, T(H)), and let ν(·) = |F|(·), the variation.


An O(H)-valued 𝔅-measurable function Φ is said to be F-integrable if Φ F′ ∈ L¹(ℝ, ν; T(H)), where F′ = dF/dν. We denote

∫_A Φ dF = ∫_A Φ F′ dν, A ∈ 𝔅.

An important property of L¹(F) is the following, where L¹(F) is the set of all O(H)-valued 𝔅-measurable F-integrable functions on ℝ.

THEOREM 1. If F ∈ vca(𝔅, T(H)), then L¹(F) is a Banach space with the norm ‖·‖_{1,F} defined by

‖Φ‖_{1,F} = ‖Φ F′‖_{1,ν} = ∫_ℝ ‖Φ F′‖_τ dν,

where ν(·) = |F|(·). Moreover, the set L(ℝ; B(H)) of B(H)-valued 𝔅-simple functions is dense in L¹(F).

A proof of the above is found in Kakihara (2000). Now let M be a T(H)-valued bimeasure of bounded operator semivariation. For each A ∈ 𝔅 it is seen that M(A, ·), M(·, A) ∈ vca(𝔅, T(H)), and hence we can consider L¹(M(·, A)). Let (Φ, Ψ) be a pair of O(H)-valued 𝔅-measurable functions. Then (Φ, Ψ) is said to be strictly M-integrable if
(a) Φ, Ψ ∈ L¹(M(·, A)) for A ∈ 𝔅;
(b) M^D(·) = ∫_D Ψ(t) M(·, dt)*, M^C(·) = ∫_C Φ(s) M(ds, ·) ∈ vca(𝔅, T(H)) for C, D ∈ 𝔅;
(c) Φ ∈ L¹((M^D)*), Ψ ∈ L¹((M^C)*) for C, D ∈ 𝔅, and it holds that

∫_C Φ(s)(M^D)*(ds) = ( ∫_D Ψ(t)(M^C)*(dt) )*.

The common value in (c) is denoted by ∫_C ∫_D Φ dM Ψ*, called the integral of (Φ, Ψ) w.r.t. M over C × D. If (Φ, Ψ) satisfies (a), and (b), (c) only for C = D = ℝ, then (Φ, Ψ) is said to be M-integrable. When M is positive definite, L²_*(M) denotes the set of all O(H)-valued 𝔅-measurable functions Φ on ℝ such that (Φ, Φ) is strictly M-integrable. For Φ, Ψ ∈ L²_*(M) we set

[Φ, Ψ]_M = ∫∫*_{ℝ²} Φ dM Ψ*, (Φ, Ψ)_M = tr[Φ, Ψ]_M, ‖Φ‖_M = ‖[Φ, Φ]_M‖_τ^{1/2},

so that [·, ·]_M is a T(H)-valued gramian on L²_*(M). Then we have the KIT for a weakly operator harmonizable process, a proof of which is given in Kakihara (2000).

THEOREM 2. (KIT) Let {x(t)} be an X-valued weakly operator harmonizable process on ℝ with the spectral operator bimeasure M. Then the modular time domain ℋ(x) is isomorphic to the modular spectral domain L²_*(M), denoted ℋ(x) ≅ L²_*(M), where the isomorphism U : ℋ(x) → L²_*(M) is given by

U x(t) = e^{it·} I(·), t ∈ ℝ.

The estimation problem in the infinite dimensional case can be stated as follows. Consider an additive noise model

x(t) = y(t) + n(t), t ∈ ℝ,

where {x(t)}, {y(t)} and {n(t)} are X-valued weakly operator harmonizable processes with representing measures ξ_x, ξ_y and ξ_n ∈ bca(𝔅, X), respectively. As in Section 5, take any a ∈ ℝ and estimate the signal y(a) using the least squares estimator ŷ(a):

‖y(a) − ŷ(a)‖_X = min{‖y(a) − z‖_X : z ∈ ℋ(x)},

or, equivalently,

[y(a) − ŷ(a), x(t)]_X = 0, t ∈ ℝ. (6.4)

By the KIT above and ŷ(a) ∈ ℋ(x), there is a unique Φ_a ∈ L²_*(M_x) such that

ŷ(a) = ∫_ℝ Φ_a(u) ξ_x(du),

where M_x = M_{ξ_x}. Then, letting

K(·) = M_y(·, ℝ) + M_{yn}(·, ℝ), G(·) = (M_y + M_n + M_{yn} + M_{ny})(·, ℝ),

where M_{ny} = M_{yn}* and M_{yn}(A, B) = [ξ_y(A), ξ_n(B)]_X for A, B ∈ 𝔅, we see that Φ_a is a solution to the following Wiener–Hopf type equation:

∫_ℝ Φ_a(u) G(du) = ∫_ℝ e^{iau} K(du), (6.5)

which follows from (6.4) and corresponds to (5.4) and (5.5). Note that K, G ∈ vca(𝔅, T(H)). If |K|(·), |G|(·) ≪ dt, the Lebesgue measure, then we have K′ = dK/dt, G′ = dG/dt ∈ L¹(ℝ, dt; T(H)), and hence by (6.5)

Φ_a(u) G′(u) = e^{iau} K′(u). (6.6)

The generalized inverse a⁻ of a bounded linear operator a ∈ B(H) is defined by

a⁻ = J_{𝔑(a)^⊥} a⁻¹ J_{\overline{ℜ(a)}},

where 𝔑(a) is the null space of a, ℜ(a) is the range of a, J_𝔐 is the orthogonal projection onto a closed subspace 𝔐, and a⁻¹ is the multivalued inverse relation of a [cf. Hestenes (1961)]. If we assume that the range of G′ is closed, it follows from (6.6) that

Φ_a G′ G′⁻ = e^{ia·} K′ G′⁻.

Now let Ψ_a = e^{ia·} K′ G′⁻, which is an O(H)-valued 𝔅-measurable function. Then it holds that

Ψ_a G′ = e^{ia·} K′ G′⁻ G′ = e^{ia·} K′ J_{𝔑(G′)^⊥} = e^{ia·} K′,

provided 𝔑(G′) ⊆ 𝔑(K′). In this case Ψ_a is almost a solution to (6.6), and hence to (6.5).

Finally, a filtering problem for the infinite dimensional case should be mentioned. Consider a filter equation

A x(t) = y(t), t ∈ ℝ, (6.7)

with a linear filter A given by

Ax(t) = Z / ak(t) ~-~x(t - "C)2k(dz),


k=0 dR

dk

t E R ,

(6.8)

where all derivatives are taken in the strong sense, λk is a complex measure on (ℝ, 𝔅) and ak : ℝ → B(H) is a bounded 𝔅-measurable function for 0 ≤ k ≤ p. With the same idea we can establish an infinite dimensional version of Theorem 2 in Section 5 as follows: Let {x(t)} be an X-valued weakly operator harmonizable process on ℝ with the representing measure ξx and operator spectral bimeasure Mx. Consider a filter A given by (6.8) and a filter equation (6.7). If ∫_ℝ |u|^p ξx(du) exists in X, then {y(t)} is also weakly operator harmonizable with the representing measure ξy and the operator spectral bimeasure My given respectively by

ξy(A) = ∫_A φ(u) ξx(du) ,  A ∈ 𝔅 ,

My(A, B) = ∫_A ∫_B φ(u) Mx(du, dv) φ(v)* ,  A, B ∈ 𝔅 ,

where the function φ is obtained as

φ(u) = Σ_{k=0}^{p} (iu)^k ∫_ℝ ak(τ) e^{−iuτ} λk(dτ) ,  u ∈ ℝ .
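For a quick sanity check of this formula, consider the scalar specialization in which each λk is a point mass at a delay τk and each ak is a constant (both simplifying assumptions made only for this sketch; the coefficients and delays below are hypothetical). The filter then acts on the exponential e^{iut} as multiplication by φ(u):

```python
import numpy as np

# Scalar toy version of the filter (6.8): A x(t) = sum_k a_k x^(k)(t - tau_k),
# i.e. each lambda_k is a point mass at tau_k and each a_k is a constant.
# Applied to x(t) = exp(iut) this is multiplication by
# phi(u) = sum_k (iu)^k a_k exp(-iu tau_k), matching the formula above.
a = [1.0, 0.5, -0.25]   # hypothetical coefficients a_0, a_1, a_2
tau = [0.0, 1.0, 2.0]   # hypothetical delays tau_0, tau_1, tau_2

def phi(u):
    return sum((1j * u) ** k * a[k] * np.exp(-1j * u * tau[k])
               for k in range(len(a)))

def apply_filter(u, t):
    # the k-th derivative of exp(iu(t - tau_k)) is (iu)^k exp(iu(t - tau_k))
    return sum(a[k] * (1j * u) ** k * np.exp(1j * u * (t - tau[k]))
               for k in range(len(a)))

u, t = 0.7, 3.0
assert np.isclose(apply_filter(u, t), phi(u) * np.exp(1j * u * t))
```
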

Acknowledgement
The author is grateful to Professor M. M. Rao for his valuable suggestions.

References
Abreu, J. L. and H. Salehi (1984). Schauder basic measures in Banach and Hilbert spaces. Bol. Soc. Mat. Mexicana 29, 71-84.
Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68, 337-404.

The Kolmogorov Isomorphism Theorem and extensions to some nonstationary processes


Bochner, S. (1956). Stationarity, boundedness, almost periodicity of random-valued functions. In Proc. Third Berkeley Symp. Math. Statist. & Prob. (Ed., J. Neyman), Vol. 2, pp. 7-27. University of California Press, Berkeley and Los Angeles.
Chang, D. K. and M. M. Rao (1983). Bimeasures and sampling theorems for weakly harmonizable processes. Stochastic Anal. Appl. 1, 21-55.
Chang, D. K. and M. M. Rao (1986). Bimeasures and nonstationary processes. In Real and Stochastic Analysis (Ed., M. M. Rao), pp. 7-118. John Wiley & Sons, New York.
Diestel, J. and J. J. Uhl, Jr (1977). Vector Measures. Amer. Math. Soc., Providence, R. I.
Dunford, N. and J. T. Schwartz (1958). Linear Operators, Part I. Interscience, New York.
Grenander, U. (1951). Stochastic processes and statistical inference. Ark. Mat. 1, 195-277.
Hestenes, M. R. (1961). Relative self-adjoint operators in Hilbert space. Pacific J. Math. 11, 1315-1357.
Kakihara, Y. (1985). A note on harmonizable and V-bounded processes. J. Multivar. Anal. 16, 140-156.
Kakihara, Y. (1986). Strongly and weakly harmonizable stochastic processes of H-valued random variables. J. Multivar. Anal. 18, 127-137.
Kakihara, Y. (1992). A classification of vector harmonizable processes. Stochastic Anal. Appl. 10, 277-311.
Kakihara, Y. (1997a). Dilations of Hilbert-Schmidt class operator valued measures and applications. In Stochastic Processes and Functional Analysis in Celebration of M. M. Rao's 65th Birthday (Eds., J. Goldstein, N. E. Gretsky and J. J. Uhl, Jr), pp. 123-135. Marcel Dekker, New York.
Kakihara, Y. (1997b). Multidimensional Second Order Stochastic Processes. World Scientific, Singapore.
Kakihara, Y. (2000). Spectral domains of vector harmonizable processes. J. Statistical Planning and Inference, to appear.
Karhunen, K. (1947). Über lineare Methoden in der Wahrscheinlichkeitsrechnung. Ann. Acad. Sci. Fenn. Ser. A. I. Math. 37, 1-79.
Khintchine, A. Ya. (1934). Korrelationstheorie der stationären Prozesse. Math. Ann. 109, 605-615.
Kolmogorov, A. N. (1941). Stationary sequences in Hilbert space. Bull. Moskov. Gos. Univ. Matematika 2, 1-41.
Loève, M. (1948). Fonctions aléatoires du second ordre. Appendix to P. Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, pp. 299-352.
Mandrekar, V. and H. Salehi (1970). The square-integrability of operator-valued functions with respect to a nonnegative operator-valued measure and the Kolmogorov isomorphism theorem. Indiana Univ. Math. J. 20, 543-565.
Morse, M. and W. Transue (1955). C-bimeasures Λ and their superior integrals Λ*. Rend. Circolo Mat. Palermo 4, 270-300.
Morse, M. and W. Transue (1956). C-bimeasures Λ and their integral extensions. Ann. Math. 64, 480-504.
Niemi, H. (1975). On stationary dilations and the linear prediction of certain stochastic processes. Soc. Sci. Fenn. Comment. Phys.-Math. 45, 111-130.
Rao, M. M. (1982). Harmonizable processes: Structure theory. L'Enseign. Math. 28, 295-351.
Rao, M. M. (1984). The spectral domain of multivariate harmonizable processes. Proc. Natl. Acad. Sci. USA 81, 4611-4612.
Rao, M. M. (1985). Harmonizable, Cramér, and Karhunen classes of processes. In Handbook of Statistics (Eds., E. J. Hannan, P. R. Krishnaiah and M. M. Rao), Vol. 5, pp. 279-310. North-Holland, Amsterdam.
Rao, M. M. (1989). Harmonizable signal extraction, filtering and sampling. In Topics in Non-Gaussian Signal Processing (Eds., E. J. Wegman, S. C. Schwartz and J. B. Thomas), pp. 98-117. Springer-Verlag, Berlin.
Rao, M. M. (1992). L^{2,2}-boundedness, harmonizability, and filtering. Stochastic Anal. Appl. 10, 323-342.
Rosenberg, M. (1964). The square-integrability of matrix-valued functions with respect to a nonnegative Hermitian measure. Duke Math. J. 31, 291-298.


Rozanov, Yu. A. (1959). Spectral analysis of abstract functions. Theory Prob. Appl. 4, 271-287.
Schatten, R. (1960). Norm Ideals of Completely Continuous Operators. Springer, New York.
Truong-van, B. (1981). Une généralisation du théorème de Kolmogorov-Aronszajn. Processus V-bornés q-dimensionnels: domaine spectral, dilatations stationnaires. Ann. Inst. Henri Poincaré Sect. B 17, 31-49.
Wiener, N. and P. Masani (1957). The prediction theory of multivariate stochastic processes, Part I. Acta Math. 98, 111-150.
Wiener, N. and P. Masani (1958). The prediction theory of multivariate stochastic processes, Part II. Acta Math. 99, 93-137.
Ylinen, K. (1978). On vector bimeasures. Ann. Mat. Pura Appl. 117, 115-138.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19
© 2001 Elsevier Science B.V. All rights reserved.


Stochastic Processes in Reliability

Masaaki Kijima, Haijun Li and Moshe Shaked

Almost every theoretical development in the area of stochastic processes is applied, sooner or later, in reliability theory. The purpose of this chapter is to describe some relatively recent such applications. We consider some first-passage times that often model lifetimes of reliability systems, and we describe some recent results which identify various aging properties, of importance in reliability theory, of these first-passage times. We point out many results involving stochastic comparisons of diverse maintenance policies. Some theory of stochastic comparisons of point processes is described. Various replacement policies are defined and commented on. We give a description of several applications of the theoretical results for the purpose of obtaining stochastic comparisons of processes that count the number of planned and unplanned replacements. We describe some applications of the above mentioned theoretical results for the purpose of obtaining stochastic comparisons of point processes that are associated with some general repair models. We also describe some of the theory of random (cumulative) hazard and random hazard rate functions. Such random functions are, in fact, stochastic processes, and we describe some computational results and some stochastic comparison results that are associated with these processes. An important particular application of random hazard and random hazard rate functions is in modeling lifetimes of items which function in a random environment. Some of the theory of this area is also described in this chapter.

1. Introduction

Almost every theoretical development in the area of stochastic processes is applied, sooner or later, in reliability theory. The purpose of this chapter is to describe some relatively recent such applications. In Section 2 we consider some first-passage times that often model lifetimes of reliability systems. We describe there some recent results which identify various aging properties, of importance in reliability theory, of these first-passage times. In Section 3 we point out many results involving stochastic comparisons of diverse maintenance policies. First some theory of stochastic comparisons of point processes is described. Then various replacement policies are defined and commented on. The crux of Section 3 is a description of several applications of the theoretical results for the purpose of obtaining stochastic comparisons of processes that count the number of planned and unplanned replacements. In Section 4 we describe some applications of the above mentioned theoretical results for the purpose of obtaining stochastic comparisons of point processes that are associated with some general repair models. In the first part of Section 5 we describe some of the theory of random (cumulative) hazard and random hazard rate functions. Such random functions are, in fact, stochastic processes, and we describe some computational results and some stochastic comparison results that are associated with these processes. An important particular application of random (cumulative) hazard and random hazard rate functions is in modeling lifetimes of items which function in a random environment. Some of the theory of this area is described in the second part of Section 5.

In this chapter, 'increasing' and 'decreasing' stand for 'non-decreasing' and 'non-increasing,' respectively.

2. Aging first-passage times


First-passage times of appropriate stochastic processes have often been used to represent times to failure of devices or systems subjected to shocks and wear, times of their random repairs, and times of random interruptions of their operations. Therefore the aging properties of such first-passage times have been widely investigated in the reliability and maintenance literature. In this section we review some basic results regarding first-passage times that have some aging properties. The aging properties that we will discuss are related to each other as follows (the definitions of these notions are given later in the sequel):
PF2 ⇒ IFR ⇒ DMRL ⇒ NBUE
       ⇓             ⇑
      IFRA ⇒ NBU ⇒ NBUC

Most of the material in this section can be found in Li and Shaked (1997). Some terminology and notations that are used throughout this section are described below. Let {Xn, n ≥ 0} be a discrete-time stochastic process with state space [0, ∞). We assume that the process starts at 0, that is, we assume that P{X0 = 0} = 1. For every z > 0 we denote by Tz the first time that the process crosses the threshold z, that is, Tz ≡ inf{n ≥ 0 : Xn ≥ z} (Tz = ∞ if Xn < z for all n ≥ 0). If, for example, Tz is New Better than Used (NBU) [Increasing Failure Rate (IFR), Increasing Failure Rate Average (IFRA), etc.] for any z ≥ 0, then the process {Xn, n ≥ 0} is called an NBU [IFR, IFRA, etc.] process. In a similar manner one defines NBU [IFR, IFRA, etc.] continuous-time processes. In this section we point out many instances of NBU processes, IFRA processes, IFR processes, and other processes that have first-passage times with similar aging properties. We do not consider here anti-aging properties such as NWU (New Worse than Used), DFR (Decreasing Failure Rate), DFRA (Decreasing Failure Rate Average), etc. Some results about anti-aging properties of first-passage times can be found in the mentioned references.

2.1. Markov processes

Consider a discrete-time Markov process {Xn, n ≥ 0} with the discrete state space ℕ+ = {0, 1, ...}. Denote by P = {pij}, i, j ∈ ℕ+, the transition matrix of the process. Keilson, in a pioneering work, obtained many distributional properties, such as complete monotonicity, for various first-passage times of such processes [see, for example, his book: Keilson (1979)]. We cannot give here all the results and the details of his work. The strongest aging property that we consider here is the notion of log-concavity. A non-negative random variable is said to have the Polya Frequency of order 2 (PF2) property if its (discrete or continuous) probability density is log-concave. Assaf, Shaked and Shanthikumar (1985) have shown the following result.

THEOREM 2.1. (PF2 first-passage times) If the transition matrix P is totally positive of order 2 (TP2) [that is, pij pi'j' ≥ pi'j pij', whenever i ≤ i' and j ≤ j'], then Tz has a log-concave density for all z > 0; that is, {Xn, n ≥ 0} is a PF2 process.

Assaf, Shaked and Shanthikumar (1985) also extended this result to some continuous-time Markov processes with discrete state space. Shaked and Shanthikumar (1988) have extended this result to continuous-time pure jump Markov processes with continuous state space. An aging notion that is weaker than the PF2 property is the notion of IFR. A discrete non-negative random variable T is said to have this property if P{T ≥ n} is log-concave on ℕ+, or, equivalently, if its discrete hazard rate function, defined by P{T = n}/P{T ≥ n}, is increasing on ℕ+. If P is the transition matrix of the discrete-time Markov process {Xn, n ≥ 0}, then let Q denote the matrix of left partial sums of P.
That is, the ijth element of Q, denoted by qij, is defined by qij = Σ_{k=0}^{j} pik. Durham, Lynch and Padgett (1990) essentially proved the following result.

THEOREM 2.2. (IFR first-passage times) If Q is TP2 then Tz is IFR for all z > 0; that is, {Xn, n ≥ 0} is an IFR process.

This result strengthens previous results of Esary, Marshall and Proschan (1973) and of Brown and Chaganty (1983). Using ideas of the latter, this result can be extended to some continuous-time Markov processes with discrete state space. Shaked and Shanthikumar (1988) have extended this result to continuous-time pure jump Markov processes with continuous state space. Some related results are given in Shanthikumar (1988), in Kijima (1989a), and in Lee and Lynch (1997).

One application of Theorem 2.2 is given in Kijima and Nakagawa (1991). They considered a Markov process {Xn, n ≥ 0} defined by Xn = b Xn−1 + Dn, n ≥ 1 (X0 = 0), where 0 < b < 1, and the Dn's are independent non-negative random variables. Such processes arise in studies of imperfect preventive maintenance policies, where each maintenance action reduces the current damage by 100(1 − b)%, and Dn is the total damage incurred between the (n − 1)st and the nth maintenance action. Let Gn denote the distribution function of Dn. Kijima and Nakagawa (1991) showed that if Gn(x) is TP2 in n and x, and if Gn(x) is log-concave in x for all n, then {Xn, n ≥ 0} is an IFR process.

Since the IFR property is weaker than the PF2 property, one would expect that a condition, that is weaker than the assumption that P is TP2, would suffice to guarantee that {Xn, n ≥ 0} is an IFR process. Indeed, the assumption that Q is TP2 is weaker than the assumption that P is TP2.

Sometimes the first time, that the increment of a Markov process is larger than a certain critical value, is also of interest. Thus, Li and Shaked (1995) studied first-passage times of the form Tz,u ≡ inf{n ≥ 1 : Xn ≥ z or Xn − Xn−1 ≥ u} [Tz,u = ∞ if Xn < z and Xn − Xn−1 < u for all n]. The process {Xn, n ≥ 0} is said to have a convex transition kernel if P{Xn+1 > x + y | Xn = x} is increasing in x for all y. The process {Xn, n ≥ 0} is said to have increasing sample paths if almost every realization {xn, n ≥ 0} of {Xn, n ≥ 0} is increasing in n. Li and Shaked (1995) have shown the following result.

THEOREM 2.3. (IFR incremental first-passage times) If {Xn, n ≥ 0} has increasing sample paths, and a convex transition kernel, and if P is TP2, then Tz,u is IFR for all z and u.

Li and Shaked (1995) also obtained a version of this result for some continuous-time Markov processes with discrete state space. An aging notion that is weaker than the IFR property is the notion of IFRA. A discrete non-negative random variable T is said to have this property if either P{T = 0} = 1, or P{T = 0} = 0 and [P{T ≥ n}]^{1/n} is decreasing in n ≥ 1.
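For a finite chain the TP2 conditions of Theorems 2.1 and 2.2 are easy to verify directly; a minimal sketch (the transition matrix below is hypothetical, chosen only for illustration):

```python
import numpy as np

def is_tp2(m, tol=1e-12):
    # TP2: every 2x2 minor with rows i < i' and columns j < j' is nonnegative.
    m = np.asarray(m, dtype=float)
    r, c = m.shape
    return all(m[i, j] * m[k, l] >= m[k, j] * m[i, l] - tol
               for i in range(r) for k in range(i + 1, r)
               for j in range(c) for l in range(j + 1, c))

# Hypothetical transition matrix of a chain on {0, 1, 2}.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
Q = np.cumsum(P, axis=1)     # left partial sums: q_ij = sum_{k<=j} p_ik
print(is_tp2(P), is_tp2(Q))  # here P is TP2, and Q is TP2 as well
```

Checking Q rather than P tests the weaker hypothesis of Theorem 2.2.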
The process {Xn, n ≥ 0} is said to be stochastically monotone if P{Xn+1 > x | Xn = y} is increasing in y for every x. Equivalently, {Xn, n ≥ 0} is said to be stochastically monotone if qij is decreasing in i for all j, where Q is the matrix of the left partial sums of P. From a general result of Shaked and Shanthikumar (1987) we obtain the following result.

THEOREM 2.4. (IFRA first-passage times) If {Xn, n ≥ 0} is stochastically monotone, and if it has increasing sample paths, then Tz is IFRA for all z ≥ 0; that is, {Xn, n ≥ 0} is an IFRA process.

This result strengthens previous results of Esary, Marshall and Proschan (1973) and of Brown and Chaganty (1983). Using ideas of the latter, this result can be extended to some continuous-time Markov processes with discrete state space. Drosen (1986) and Shaked and Shanthikumar (1988) have extended this result to continuous-time pure jump Markov processes with continuous state space. Özekici and Günlük (1992) have used this result in order to show that some interesting Markov processes that arise in the study of some maintenance policies are IFRA.
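The matrix form of stochastic monotonicity quoted above, qij decreasing in i for every j, can be checked in a few lines (the chain below is hypothetical):

```python
import numpy as np

# Stochastic monotonicity test: q_ij decreasing in i for every j,
# where Q holds the left partial sums of the transition matrix P.
def is_stochastically_monotone(P):
    Q = np.cumsum(np.asarray(P, dtype=float), axis=1)
    return bool(np.all(np.diff(Q, axis=0) <= 1e-12))

# Hypothetical chain on {0, 1, 2}: a larger current state shifts the
# next state stochastically upward, so the test returns True.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
assert is_stochastically_monotone(P)
```
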


Since the IFRA property is weaker than the IFR property, one would expect that a condition, that is weaker than the assumption that Q is TP2, would suffice to guarantee that {Xn, n ≥ 0} is an IFRA process. Indeed, the assumption that {Xn, n ≥ 0} is stochastically monotone is weaker than the assumption that Q is TP2. However, for the IFRA result we need to assume that {Xn, n ≥ 0} has increasing sample paths, whereas there is no such an assumption in Theorem 2.2.

When the first time, that the increment of a Markov process is larger than a certain critical value, is of interest, then one studies Tz,u. The process {Xn, n ≥ 0} is said to have increasing convex sample paths if almost every realization {xn, n ≥ 0} of {Xn, n ≥ 0} is increasing and convex in n. Li and Shaked (1995) have shown the following result.

THEOREM 2.5. (IFRA incremental first-passage times) If {Xn, n ≥ 0} has increasing convex sample paths, and a convex transition kernel, then Tz,u is IFRA for all z and u.

Li and Shaked (1995) also obtained a version of this result for some continuous-time Markov processes with discrete state space. An aging notion that is weaker than the IFRA property is the notion of NBU. A discrete non-negative random variable T is said to have this property if P{T ≥ n} ≥ P{T − m ≥ n | T ≥ m} for all n ≥ 0 and m ≥ 0. Brown and Chaganty (1983) proved the following result.

THEOREM 2.6. (NBU first-passage times) If {Xn, n ≥ 0} is stochastically monotone then Tz is NBU for all z > 0; that is, {Xn, n ≥ 0} is an NBU process.

This result strengthens previous results of Esary, Marshall and Proschan (1973). Brown and Chaganty (1983) extended this result to some continuous-time Markov processes with discrete state space. Since the NBU property is weaker than the IFRA property, it is not surprising that the condition, that suffices to imply that {Xn, n ≥ 0} is NBU, is weaker than the conditions that imply that {Xn, n ≥ 0} is IFRA.
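A Monte Carlo illustration (not a proof): a walk with i.i.d. Exp(1) increments is stochastically monotone and has increasing sample paths, so by Theorem 2.6 its first-passage times are NBU, and the empirical survival function of Tz should satisfy the NBU inequality:

```python
import random

# Empirical check of P{T >= n + m} <= P{T >= n} P{T >= m} for the
# first-passage time T_z = inf{n >= 0 : X_n >= z} of a walk with
# i.i.d. Exp(1) increments (a stochastically monotone chain with
# increasing sample paths, hence NBU by Theorem 2.6).
def first_passage(z, rng):
    x, n = 0.0, 0
    while x < z:
        x += rng.expovariate(1.0)
        n += 1
    return n

rng = random.Random(1)
T = [first_passage(5.0, rng) for _ in range(20_000)]

def surv(n):
    return sum(t >= n for t in T) / len(T)

n = m = 6
print(surv(n + m), "<=", surv(n) * surv(m))  # NBU inequality holds here
```
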
Marshall and Shaked (1983) have identified a condition that is different than stochastic monotonicity, and that still yields that {Xn, n ≥ 0} is an NBU process. More explicitly, Marshall and Shaked (1983) showed that if the strong Markov process {Xn, n ≥ 0} starts at 0, and is free of positive jumps (that is, it can jump up at most one unit at a time), then it is an NBU process. They also obtained a similar result for continuous-time strong Markov processes with discrete or continuous state space. For example, a Wiener process, which starts at 0, is an NBU process. Also, if {(X1(t), X2(t), ..., Xm(t)), t ≥ 0} is a Brownian motion in ℝ^m, and Y(t) ≡ [Σ_{i=1}^{m} Xi²(t)]^{1/2}, then the Bessel process {Y(t), t ≥ 0} is NBU.

Li and Shaked (1995) studied the NBU property of Tz,u, the first time that the increment of a Markov process is larger than a certain critical value. They have shown the following result.

THEOREM 2.7. (NBU incremental first-passage times) If {Xn, n ≥ 0} has a convex transition kernel, then Tz,u is NBU for all z and u.


Li and Shaked (1995) also obtained a version of this result for some continuous-time Markov processes with discrete state space. An aging notion that is weaker than the NBU property is the notion of NBUC (New Better than Used in the Convex ordering). A non-negative random variable T is said to have this property if E[φ(T)] ≥ E[φ(T − m) | T ≥ m] for all m ≥ 0 and for all increasing convex functions φ for which the expectations above exist. If P is the transition matrix of {Xn, n ≥ 0} then the potential matrix of {Xn, n ≥ 0}, which we denote by R, is defined by R ≡ Σ_{n=0}^{∞} P^n, provided it is defined (this is the case, for example, when {Xn, n ≥ 0} has strictly increasing sample paths). Let R̄ denote the matrix of left partial sums of R; that is, if rij denotes the ijth element of R and r̄ij denotes the ijth element of R̄ then r̄ij = Σ_{k=0}^{j} rik. Also, define the matrix Rm ≡ Σ_{n=m}^{∞} P^n, which exists if R exists, and let R̄m denote the matrix of left partial sums of Rm; that is, if rm,ij denotes the ijth element of Rm and r̄m,ij denotes the ijth element of R̄m then r̄m,ij = Σ_{k=0}^{j} rm,ik. Pérez-Ocón and Gámiz-Pérez (1996) have shown the following result.

THEOREM 2.8. (NBUC first-passage times) If r̄m,ij is decreasing in i for all j and m, then Tz is NBUC for all z ≥ 0; that is, {Xn, n ≥ 0} is an NBUC process.

In addition to the condition given in Theorem 2.8, Pérez-Ocón and Gámiz-Pérez (1996) have also assumed that {Xn, n ≥ 0} has increasing sample paths. But, if for a fixed z, one modifies {Xn, n ≥ 0} so that z is an absorbing state, then it is seen from the proof of Pérez-Ocón and Gámiz-Pérez (1996) that the almost sure monotonicity of the sample paths is not required for the conclusion that {Xn, n ≥ 0} is NBUC; note that for the modified process the potential matrix is defined for the submatrix of the transition matrix corresponding to states 0, 1, ..., z − 1.
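The potential-matrix conditions can be checked for a small chain; the sketch below (hypothetical transition matrix, with z = 3 made absorbing) computes R on the transient states as (I − Pt)⁻¹, which equals Σ_{n≥0} Pt^n when the spectral radius of Pt is below 1, and then tests the partial-sum condition of Theorem 2.9:

```python
import numpy as np

# Hypothetical chain on {0, 1, 2, 3} with state 3 (the threshold z)
# absorbing; states 0, 1, 2 are transient.
P = np.array([[0.3, 0.5, 0.2, 0.0],
              [0.0, 0.3, 0.5, 0.2],
              [0.0, 0.0, 0.4, 0.6],
              [0.0, 0.0, 0.0, 1.0]])
Pt = P[:3, :3]                        # submatrix for the transient states
R = np.linalg.inv(np.eye(3) - Pt)     # potential matrix sum_{n>=0} Pt^n
Rbar = np.cumsum(R, axis=1)           # left partial sums r-bar_ij
print(bool(np.all(np.diff(Rbar, axis=0) <= 0)))  # decreasing in i?
```

Here the left partial sums are decreasing in i for every j, so Theorem 2.9 would give that Tz is NBUE for this chain.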
Pérez-Ocón and Gámiz-Pérez (1996) have also extended this result to some continuous-time Markov processes with discrete state space. Since the NBUC property is weaker than the NBU property, one would expect that a condition, that is weaker than stochastic monotonicity, should suffice to imply that {Xn, n ≥ 0} is NBUC. Indeed, it can be shown that if {Xn, n ≥ 0} is stochastically monotone then r̄m,ij is decreasing in i for all j and m.

An aging notion that is weaker than the NBUC property is the notion of NBUE (New Better than Used in Expectation). A non-negative random variable T is said to have this property if E[T] ≥ E[T − m | T ≥ m] for all m ≥ 0. Karasu and Özekici (1989) obtained the following result.

THEOREM 2.9. (NBUE first-passage times) If r̄ij is decreasing in i for all j then Tz is NBUE for all z ≥ 0; that is, {Xn, n ≥ 0} is an NBUE process.

In addition to the condition given in Theorem 2.9, Karasu and Özekici (1989) have also assumed that {Xn, n ≥ 0} has increasing sample paths. But, again, if for a fixed z, one modifies {Xn, n ≥ 0} so that z is an absorbing state, then it is seen from the proof of Karasu and Özekici (1989) that the almost sure monotonicity of the sample paths is not required for the conclusion that {Xn, n ≥ 0} is NBUE. Here, again, the potential matrix is as defined after Theorem 2.8. Note that the definition of NBUE processes of Karasu and Özekici (1989) differs from our definition in the sense that we only require that Tz is NBUE for all z on {X0 = 0} whereas Karasu and Özekici (1989) require that Tz is NBUE for all z on {X0 = i} for each i; their definition makes sense because they assume increasing sample paths. Karasu and Özekici (1989) have also extended Theorem 2.9 to some continuous-time Markov processes with discrete state space. Since the NBUE property is weaker than the NBUC property, it is not surprising that the condition, that suffices to imply that {Xn, n ≥ 0} is NBUE, is weaker than the condition that implies that {Xn, n ≥ 0} is NBUC.

We close this subsection with the consideration of an aging notion that is weaker than the IFR notion, but is stronger than the NBUE notion. A non-negative random variable T is said to have the DMRL (Decreasing Mean Residual Life) property if E[T − m | T ≥ m] is decreasing in m ≥ 0. If P is the transition matrix of the discrete-time Markov process {Xn, n ≥ 0}, then let Qm denote the matrix of left partial sums of P^m. Denote the ijth element of Qm by qm,ij. Pérez-Ocón and Gámiz-Pérez (1996) have shown the following result.

THEOREM 2.10. (DMRL first-passage times) If r̄m,ij/qm,ij is decreasing in i for all j and m, then Tz is DMRL for all z > 0; that is, {Xn, n ≥ 0} is a DMRL process.

In addition to the condition given in Theorem 2.10, Pérez-Ocón and Gámiz-Pérez (1996) have also assumed that {Xn, n ≥ 0} has increasing sample paths. But, again, if for a fixed z, one modifies {Xn, n ≥ 0} so that z is an absorbing state, then it is seen from the proof of Pérez-Ocón and Gámiz-Pérez (1996) that the almost sure monotonicity of the sample paths is not required for the conclusion that {Xn, n ≥ 0} is DMRL. Once more, the potential matrix here is as defined after Theorem 2.8. Pérez-Ocón and Gámiz-Pérez (1996) have also extended this result to some continuous-time Markov processes with discrete state space.

2.2. Cumulative damage processes


Suppose that an item is subjected to shocks occurring randomly in (continuous) time according to a counting process {N(t), t ≥ 0}. Suppose that the ith shock causes a non-negative random damage Xi, and that damages accumulate additively. Thus, the damage accumulated by the item at time t is Y(t) ≡ Σ_{i=1}^{N(t)} Xi. We assume that the damage at time 0 is 0. The process {Y(t), t ≥ 0} is called a cumulative damage shock process. Suppose that the item fails when the cumulative damage exceeds a fixed threshold z. If {N(t), t ≥ 0} is a Poisson process, and if the damages Xi's are independent and identically distributed, and are independent of {N(t), t ≥ 0}, then {Y(t), t ≥ 0} is a Markov process. The results described in Subsection 2.1 can then be used to derive aging properties of the first-passage time Tz. For example, the process {Y(t), t ≥ 0} clearly has increasing sample paths, and is stochastically monotone. Therefore it is IFRA. Esary, Marshall and Proschan (1973) noticed that if the Xi's are not necessarily identically distributed, but are merely stochastically increasing (that is, P{Xi > x} is increasing in i for all x), then the process {Y(t), t ≥ 0} is still IFRA. In fact, Esary, Marshall and Proschan (1973) have identified even weaker conditions on the Xi's that ensure that {Y(t), t ≥ 0} is IFRA.

As another example of the application of the results of Subsection 2.1 to the process {Y(t), t ≥ 0}, suppose that the damages Xi's are independent and identically distributed with a common log-concave distribution function; then the process {Y(t), t ≥ 0} is IFR. In fact, Esary, Marshall and Proschan (1973) and Shaked and Shanthikumar (1988) have identified even weaker conditions on the Xi's that ensure that {Y(t), t ≥ 0} is IFR.

If {N(t), t ≥ 0} is a nonhomogeneous (rather than homogeneous) Poisson process then the process {Y(t), t ≥ 0} is not necessarily a Markov process, even if the damages Xi's are independent and identically distributed, and are independent of {N(t), t ≥ 0}. Let Λ(t), t ≥ 0, be the mean function of the nonhomogeneous Poisson process {N(t), t ≥ 0}. From results of Abdel-Hameed and Proschan (1973) it follows that the IFRA results, mentioned in the previous paragraphs, still hold provided Λ is star-shaped. It also follows from their paper that the IFR results, mentioned in the previous paragraphs, still hold provided Λ is convex.

Sumita and Shanthikumar (1985) have studied a cumulative damage wear process in which {N(t), t ≥ 0} is a general renewal process. Let the interarrivals of {N(t), t ≥ 0} be denoted by Ui, i ≥ 1. Thus, (Ui, Xi), i ≥ 1, are independent and identically distributed pairs of non-negative random variables. In this model it is not assumed that, for each i, the random variables Ui and Xi are necessarily independent. In fact, this model is of particular interest when, for each i, the random variables Ui and Xi are not independent. If we define here Y(t) ≡ Σ_{i=1}^{N(t)} Xi, then it is clear that, in general, {Y(t), t ≥ 0} is not a Markov process.
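A minimal simulation of a cumulative damage process, under the simplest assumptions (rate-1 Poisson shocks and i.i.d. Exp(1) damages, both hypothetical choices made only for this sketch), estimates the mean of the first-passage time Tz = inf{t : Y(t) ≥ z}:

```python
import random

# Toy cumulative damage process: shocks at the epochs of a rate-1
# Poisson process, each causing an independent Exp(1) damage; the item
# fails when the accumulated damage first reaches the threshold z.
def damage_first_passage(z, rng):
    t, y = 0.0, 0.0
    while y < z:
        t += rng.expovariate(1.0)   # waiting time to the next shock
        y += rng.expovariate(1.0)   # damage caused by that shock
    return t

rng = random.Random(2)
mean_T = sum(damage_first_passage(10.0, rng) for _ in range(5000)) / 5000
print(mean_T)  # by Wald's identity, E[T_z] = E[number of shocks] * E[U] = 11
```
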
Sumita and Shanthikumar (1985) showed that if the Ui's are NBU, and if the pairs (Ui, Xi) possess some positive dependence properties, then {Y(t), t ≥ 0} is an NBU process. They also showed that if the Ui's are NBUE, and if the pairs (Ui, Xi) possess some other positive dependence properties, then {Y(t), t ≥ 0} is an NBUE process. Furthermore, they also showed that if the Ui's are HNBUE (Harmonic New Better than Used in Expectation) [that is, ∫_t^∞ P{Ui > x} dx ≤ μ exp(−t/μ) for t ≥ 0, where μ = E[Ui]], and if the Xi's are exponential random variables, and if the pairs (Ui, Xi) possess some positive dependence properties, then {Y(t), t ≥ 0} is an HNBUE process [see also Pérez-Ocón and Gámiz-Pérez (1995)]. Sumita and Shanthikumar (1985) also obtained similar results for the model in which the nth interarrival depends on the (n − 1)st jump Xn−1 (rather than on the nth jump). In another paper, Shanthikumar and Sumita (1984) considered a wear process that, at time t, is equal to max_{0≤n≤N(t)} {Xn}. Again, they did not assume that, for each i, the random variables Ui and Xi are independent. They identified conditions under which this process is NBU, NBUE or HNBUE.

In the preceding paragraphs it has been assumed that the threshold z is fixed. But in many applications it is reasonable to allow it to be random (see Section 5 below). In that case we denote the threshold by Z. We also denote then the first-passage time to Z by TZ. Esary, Marshall and Proschan (1973) have shown that if {N(t), t ≥ 0} is a Poisson process, and if the damages Xi's are independent and identically distributed, and are independent of {N(t), t ≥ 0}, and if Z is IFRA [respectively, NBU] then TZ is IFRA [respectively, NBU]. They have also shown that if the identically distributed random damages Xi's have the PF2 property, and if Z has the PF2 property, then TZ has the PF2 property.

Abdel-Hameed and Proschan (1973) considered a random threshold cumulative damage model in which the damages Xi's are still independent, but are not necessarily identically distributed, and the counting process {N(t), t ≥ 0} is a nonhomogeneous Poisson process with mean function Λ. In fact, Abdel-Hameed and Proschan (1973) assumed that the ith random damage has the gamma distribution with shape parameter ai and rate parameter b. Denote Ak ≡ Σ_{i=1}^{k} ai, k ≥ 1. Abdel-Hameed and Proschan (1973) showed that if Ak is convex (respectively, star-shaped, superadditive) in k, and if Λ is convex (respectively, star-shaped, superadditive), and if Z is IFR (respectively, IFRA, NBU), then TZ is IFR (respectively, IFRA, NBU). For the special case when all the ai's are equal, they showed that if Λ is convex (star-shaped), and if Z is DMRL (NBUE) then TZ is DMRL (NBUE). For this special case, Klefsjö (1981) showed that if Λ is star-shaped, and if Z is HNBUE, then TZ is HNBUE. Abdel-Hameed (1984a, b) and Drosen (1986) have extended some of these results to pure jump wear processes.

The reader is referred to a review on wear and damage processes, by Shaked (1984), in which he/she can find more details than we can give here on cumulative damage processes. The reader can also find there some further references of works dealing with cumulative damage processes.

2.3. Non-Markovian processes

In many applications in reliability theory, the underlying wear process is non-Markovian. Various researchers have tried to obtain aging properties of first-passage times for some non-Markovian processes. We describe below some fruits of their efforts.
First we mention a closure property of IFRA processes due to Ross (1979), and a closure property of NBU processes due to Marshall and Shaked (1986). Ross (1979) essentially showed the following result.

THEOREM 2.11. (IFRA closure property) If {Xi(t), t ≥ 0}, i = 1, 2, ..., n, are independent IFRA processes, each with increasing sample paths, then {φ(X1(t), X2(t), ..., Xn(t)), t ≥ 0} is also an IFRA process whenever φ is continuous and componentwise increasing.

From results of El-Neweihi, Proschan and Sethuraman (1978) and of Marshall and Shaked (1986) the following result follows.

THEOREM 2.12. (NBU closure property) If {Xi(t), t ≥ 0}, i = 1, 2, ..., n, are independent NBU processes, each with increasing sample paths, then {φ(X1(t), X2(t), ..., Xn(t)), t ≥ 0} is also an NBU process whenever φ is continuous and componentwise increasing.

More general results are described in Subsection 2.4 below.

480

M. Kijima, H. Li and M. Shaked

Marshall and Shaked (1983) and Shanthikumar (1984) have identified a host of non-Markovian processes that are NBU. We cannot give here all the technical details, but we will try to describe some of these processes in plain words. One kind of NBU wear processes that Marshall and Shaked (1983) have identified is the following kind of processes with shocks and recovery: Shocks occur according to a renewal process with NBU interarrivals. Each shock causes the wear to experience a random jump, where the jumps are independent and identically distributed, and are independent of the underlying renewal process. These jumps may be negative as long as the wear stays non-negative. Between shocks the wear changes in some deterministic manner which depends on the previous history of the process. This deterministic change may correspond to a partial recovery of the underlying device.

A second kind of NBU wear processes that Marshall and Shaked (1983) have identified is the following kind of processes with random repair times: The process starts at 0, and before the first shock it increases in some deterministic manner. Shocks occur according to a Poisson process. Each shock causes the wear to experience a random jump (usually a negative jump, but the process is set equal to 0 if such a jump would carry it below 0), where the jumps are independent and identically distributed, and are independent of the underlying Poisson process. Between shocks the wear increases in some deterministic manner where the rate of increase depends only on the current height of the process. This deterministic change may correspond to a continuous wear of the underlying device, and the jumps correspond to repairs that reduce the wear.
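A minimal discretized sketch of a process of the first ("shocks with recovery") kind: Poisson shocks (exponential interarrivals are in particular NBU), i.i.d. exponential jumps, and, as a hypothetical recovery mechanism between shocks, exponential decay of the wear. All parameter values are illustrative.

```python
import random

def wear_path(t_end, shock_rate, jump_mean, decay, rng, dt=0.01):
    """Sample path of a shocks-with-recovery wear process on a time grid:
    shocks arrive as a Poisson process, each adds an exponential(jump_mean)
    jump, and between shocks the wear decays exponentially at rate `decay`
    (partial recovery), staying non-negative throughout."""
    path = []
    wear, next_shock = 0.0, rng.expovariate(shock_rate)
    for k in range(int(t_end / dt)):
        t = k * dt
        while next_shock <= t:                         # process due shocks
            wear += rng.expovariate(1.0 / jump_mean)   # random jump
            next_shock += rng.expovariate(shock_rate)  # next interarrival
        wear = max(wear * (1.0 - decay * dt), 0.0)     # deterministic recovery
        path.append((t, wear))
    return path

rng = random.Random(1)
path = wear_path(t_end=10.0, shock_rate=2.0, jump_mean=0.5, decay=0.3, rng=rng)
T_z = next((t for t, w in path if w >= 2.0), None)  # first passage over z = 2
```

The NBU property of the first-passage time T_z is a theorem about the exact process, not something the sketch proves; the sketch only makes the model concrete.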
A third kind of NBU wear processes that Marshall and Shaked (1983) have identified is the following kind of Gaver-Miller (1962) processes: These processes have continuous sample paths that alternately increase and decrease in a deterministic fashion where the rate of increase or decrease depends only on the current height of the process. The random durations of increase are independent and identically distributed exponential random variables, and the random durations of decrease are independent and identically distributed NBU random variables.

Shanthikumar (1984) has generalized the first two kinds of processes mentioned above. In particular, he allowed the times between jumps and the magnitudes of the jumps to be dependent, and he still was able to prove, under some conditions, that the resulting wear processes are NBU. His results also extend the NBU results of Sumita and Shanthikumar (1985). Lam (1992) has gone even further and identified a class of stochastic processes that are even more general than those of Shanthikumar (1984). She showed that the processes in that class are NBUE. Marshall and Shaked (1986) extended the NBU results that are described above to processes with state space [0, ∞)^m.

Semi-Markov processes are stochastic processes that are more general than Markov processes in the sense that the sojourn time of the process in each state has a general distribution rather than being exponential. Using coupling arguments, Shanthikumar (1984) was able to formulate a set of conditions under

Stochastic processes in reliability

481

which semi-Markov processes are NBU. Lam (1992) obtained conditions under which semi-Markov processes are NBUE. For some more details on the aging properties of first-passage times of non-Markovian processes, and for further references, see the review by Shaked (1984).

2.4. Processes with state space R_+^m


Let {X(t), t ≥ 0} = {(X_1(t), X_2(t), ..., X_m(t)), t ≥ 0} be a stochastic process on R_+^m ≡ [0, ∞)^m. A set U ⊆ R_+^m is called an upper set if (x_1, x_2, ..., x_m) ∈ U and (y_1, y_2, ..., y_m) ≥ (x_1, x_2, ..., x_m) imply that (y_1, y_2, ..., y_m) ∈ U. The first-passage time of the process {X(t), t ≥ 0} to an upper set U is defined by

T_U ≡ inf{t ≥ 0 : X(t) ∈ U}   [T_U = ∞ if X(t) ∉ U for all t ≥ 0].

The process {X(t), t ≥ 0} is called an IFRA [NBU] process if T_U is IFRA [NBU] for all closed upper sets U ⊆ R_+^m. Clearly, every component {X_i(t), t ≥ 0} of an IFRA [NBU] process {X(t), t ≥ 0} is an IFRA [NBU] process on R_+. In this subsection we consider only processes that start at 0, the origin of R_+^m. The following characterizations of IFRA and NBU processes are taken from Shaked and Shanthikumar (1991) and from Marshall and Shaked (1986).

THEOREM 2.13. (IFRA and NBU characterizations)
(i) The process {X(t), t ≥ 0}, with increasing sample paths, is IFRA if, and only if, for every choice of closed upper sets U_1, U_2, ..., U_n, the random variables T_{U_1}, T_{U_2}, ..., T_{U_n} satisfy that τ(T_{U_1}, T_{U_2}, ..., T_{U_n}) is IFRA for every coherent life function τ. [For a definition of coherent life functions see, for example, Esary and Marshall (1970) or Barlow and Proschan (1975).]
(ii) The process {X(t), t ≥ 0}, with increasing sample paths, is NBU if, and only if, for every choice of closed upper sets U_1, U_2, ..., U_n, the random variables T_{U_1}, T_{U_2}, ..., T_{U_n} satisfy that τ(T_{U_1}, T_{U_2}, ..., T_{U_n}) is NBU for every coherent life function τ.

The following IFRA closure properties can be derived from results in Marshall (1994).

THEOREM 2.14. (general IFRA closure properties)
(i) Let {X_i(t), t ≥ 0} be an IFRA process on R_+^{m_i} with increasing sample paths, i = 1, 2, ..., n. If these n processes are independent then {(X_1(t), X_2(t), ..., X_n(t)), t ≥ 0} is an IFRA process on R_+^{m_1 + m_2 + ··· + m_n}.
(ii) Let {X(t), t ≥ 0} be an IFRA process on R_+^{m_1} and let θ : R_+^{m_1} → R_+^{m_2} be an increasing continuous function such that θ(0) = 0. Then {θ(X(t)), t ≥ 0} is an IFRA process on R_+^{m_2}.

The following NBU closure properties can be derived from results in Marshall and Shaked (1986).
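The first-passage time T_U = inf{t ≥ 0 : X(t) ∈ U} is straightforward to compute along a discretized sample path. A small sketch (the upper set and the path are hypothetical illustrations):

```python
def first_passage_to_upper_set(path, in_U):
    """First time at which a discretized sample path enters an upper set U.
    `path` is a list of (t, x) pairs with x a point of R_+^m; `in_U` is the
    indicator function of U.  Returns float('inf') if U is never entered,
    matching the convention T_U = infinity."""
    for t, x in path:
        if in_U(x):
            return t
    return float('inf')

# A closed upper set in R_+^2: U = {(x1, x2) : x1 >= 1 or x2 >= 3}
in_U = lambda x: x[0] >= 1.0 or x[1] >= 3.0

# A deterministic increasing path standing in for a wear process
path = [(0.1 * k, (0.05 * k, 0.2 * k)) for k in range(100)]
T_U = first_passage_to_upper_set(path, in_U)   # second coordinate crosses first
```

Note that the union-or-threshold structure of this particular U makes T_U the minimum of the two componentwise passage times, the kind of coherent-life-function combination appearing in Theorem 2.13.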


THEOREM 2.15. (general NBU closure properties)
(i) Let {X_i(t), t ≥ 0} be an NBU process on R_+^{m_i} with increasing sample paths, i = 1, 2, ..., n. If these n processes are independent then {(X_1(t), X_2(t), ..., X_n(t)), t ≥ 0} is an NBU process on R_+^{m_1 + m_2 + ··· + m_n}.
(ii) Let {X(t), t ≥ 0} be an NBU process on R_+^{m_1} and let θ : R_+^{m_1} → R_+^{m_2} be an increasing continuous function such that θ(0) = 0. Then {θ(X(t)), t ≥ 0} is an NBU process on R_+^{m_2}.

A discrete-time Markov process {X_n, n ≥ 0} with state space R_+^m is said to be stochastically monotone if P{X_{n+1} ∈ U | X_n = x} is increasing in x = (x_1, x_2, ..., x_m) for all upper sets U ⊆ R_+^m. Brown and Chaganty (1983) have shown that if such a stochastically monotone process, with state space R_+^m, starts at 0, then it is an NBU process. They also showed that some continuous-time Markov processes are NBU. Brown and Chaganty (1983) and Shaked and Shanthikumar (1987) have shown that if such a stochastically monotone process, with state space R_+^m, starts at 0 and has increasing sample paths, then it is an IFRA process. They also showed that some continuous-time Markov processes are IFRA. In fact, Marshall (1994), Marshall and Shaked (1986), Brown and Chaganty (1983) and Shaked and Shanthikumar (1987) have considered processes with state spaces that are much more general than R_+^m.

To see an application of the IFRA results of Brown and Chaganty (1983) and of Shaked and Shanthikumar (1987), consider the following model of Ross (1981). Suppose that shocks hit an item according to a nonhomogeneous Poisson process {N(t), t ≥ 0} with mean function Λ. The ith shock inflicts a non-negative random damage X_i. The X_i's are assumed to be independent and identically distributed, and are also assumed to be independent of the underlying nonhomogeneous Poisson process. Suppose that there is a function D such that the total damage after n shocks is D(X_1, X_2, ..., X_n, 0, 0, 0, ...), where D is a non-negative function whose domain is {(x_1, x_2, ...) : x_i ≥ 0, i ≥ 1}. Define Y(t) ≡ D(X_1, X_2, ..., X_{N(t)}, 0, 0, 0, ...), t ≥ 0. If Λ(t)/t is increasing in t > 0, if D is increasing in each of its arguments, and if D(x_1, x_2, ..., x_n, 0, 0, 0, ...) is permutation symmetric in x_1, x_2, ..., x_n for all n, then {Y(t), t ≥ 0} is an IFRA process. This result can be obtained from the results of Brown and Chaganty (1983) and of Shaked and Shanthikumar (1987).

A function φ : R_+^n → R_+ is said to be subhomogeneous if αφ(x) ≤ φ(αx) for all α ∈ [0, 1] and all x. Note that every coherent life function τ is an increasing subhomogeneous function (in fact, τ is homogeneous; that is, ατ(x) = τ(αx) for all α ∈ [0, 1] and all x). A vector (S_1, S_2, ..., S_n) of non-negative random variables is said to be MIFRA (Multivariate Increasing Failure Rate Average), in the sense of Block and Savits (1980), if φ(S_1, S_2, ..., S_n) is IFRA for any increasing subhomogeneous function φ [see Marshall and Shaked (1982) for this interpretation of the MIFRA property of Block and Savits (1980)]. In a similar manner Marshall and Shaked (1982) have defined the notion of MNBU (Multivariate New Better than Used). According to Block and Savits (1981), a stochastic process {X(t), t ≥ 0} on R_+^m is said to be a MIFRA process if, for every finite collection of


closed upper sets U_1, U_2, ..., U_n in R_+^m, the vector (T_{U_1}, T_{U_2}, ..., T_{U_n}) is MIFRA. Clearly, every MIFRA process is an IFRA process. Block and Savits (1981) have shown that there exist IFRA processes that are not MIFRA. In a similar manner one can define MNBU processes. Clearly, every MIFRA process is also an MNBU process. The reader may find it of interest to compare the definitions of MIFRA and MNBU processes to the characterizations given in Theorem 2.13.

Some multivariate cumulative damage wear processes that are MIFRA will be described now. Consider m items that are subjected to shocks that occur according to (one) Poisson process {N(t), t ≥ 0}. Let X_ij denote the damage inflicted by the ith shock on the jth item, i ≥ 1, j = 1, 2, ..., m. Suppose that the vectors X_i = (X_i1, X_i2, ..., X_im), i ≥ 1, are independent. Assume that the damages accumulate additively. Thus, the wear process {Y(t), t ≥ 0} = {(Y_1(t), Y_2(t), ..., Y_m(t)), t ≥ 0} here has state space R_+^m, where Y_j(t) = Σ_{i=1}^{N(t)} X_ij, t ≥ 0, j = 1, 2, ..., m. Savits and Shaked (1981) have proved the following two results.

THEOREM 2.16. (IFRA shock model) If X_i ≤_st X_{i+1} (that is, E[θ(X_i)] ≤ E[θ(X_{i+1})] for all increasing functions θ for which the expectations are well defined), i ≥ 1, then {Y(t), t ≥ 0} is an IFRA process.

Thus, if the jth item is associated with a fixed threshold z_j (that is, the item fails once the accumulated damage of item j crosses the threshold z_j), j = 1, 2, ..., m, then, by Theorem 2.13, the vector of the lifetimes of the items (T_{z_1}, T_{z_2}, ..., T_{z_m}) satisfies that τ(T_{z_1}, T_{z_2}, ..., T_{z_m}) is IFRA for every coherent life function τ.

THEOREM 2.17. (MIFRA shock model) If X_i =_st X_{i+1} (that is, the X_i's are identically distributed) then {Y(t), t ≥ 0} is a MIFRA process. Thus, the vector of the lifetimes of the items (T_{z_1}, T_{z_2}, ..., T_{z_m}) satisfies that φ(T_{z_1}, T_{z_2}, ..., T_{z_m}) is IFRA for every increasing subhomogeneous function φ.
Shaked and Shanthikumar (1991) have extended the above results to processes with state spaces that are more general than R_+^m. They also showed that these results still hold if the shocks occur according to a birth process with increasing birth rates (rather than a homogeneous Poisson process). Marshall and Shaked (1979) obtained some multivariate NBU properties for the model described above. Some additional details regarding multivariate IFRA and NBU processes can be found in the reviews by Shaked (1984) and by Shaked and Shanthikumar (1986a); see also Lee and Lynch (1997).
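The additive shock model behind Theorems 2.16 and 2.17 simulates directly. The sketch below uses a common exponential shock severity scaled by fixed per-item factors, a hypothetical choice that produces i.i.d. but componentwise dependent damage vectors X_i, as allowed by Theorem 2.17.

```python
import random

def multivariate_damage(t, rate, m, rng):
    """Accumulated damage vector (Y_1(t), ..., Y_m(t)) when shocks occur
    according to a Poisson process of rate `rate` and the ith shock inflicts
    the damage vector (c_1 S_i, ..., c_m S_i) for a common exponential
    severity S_i and fixed sensitivities c_j (illustrative choice)."""
    factors = [1.0, 0.5, 2.0][:m]      # hypothetical per-item sensitivities
    y = [0.0] * m
    clock = rng.expovariate(rate)
    while clock <= t:
        s = rng.expovariate(1.0)       # common shock severity
        for j in range(m):
            y[j] += factors[j] * s     # damages accumulate additively
        clock += rng.expovariate(rate)
    return y

rng = random.Random(2)
y = multivariate_damage(t=10.0, rate=1.5, m=3, rng=rng)
```

Pairing each coordinate Y_j with a threshold z_j then yields the lifetime vector (T_{z_1}, ..., T_{z_m}) whose MIFRA property Theorem 2.17 asserts.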

3. Comparison of replacement policies via point processes


Many interesting results which deal with comparison of replacement policies have been obtained in the literature; see, for example, Langberg (1988), Sumita and Shanthikumar (1988), Shaked and Shanthikumar (1989), Block, Langberg and Savits (1990a, b, 1993), Shaked and Zhu (1992), and Block and Savits (1994). These results built on previous results in this area, summarized in the book of Barlow and


Proschan (1975). The results are sometimes intimately connected with some problems in related areas, for example, with Brown's (1980) conjecture in renewal theory [see Shaked and Zhu (1992)]. There is a tendency in the recent literature to compare replacement policies by means of counting processes, and to provide comparisons over whole periods of time, and not only at fixed time points (marginally). In this section we point out that many results, involving stochastic comparisons of diverse maintenance policies, can be explained by the utilization of the theory of point processes. Most of the material in this section can be found in Shaked and Szekli (1995). A utilization of the theory of point processes, for the purpose of stochastically comparing repairable systems, can be found in Last and Szekli (1998); we give some details of that paper in Section 4 below.

3.1. Stochastic orders of point processes

The simplest description of a point process N = {N_t, t ≥ 0} on R_+ is perhaps by a sequence of random variables 0 = T_0 < T_1 < ··· on a probability space (Ω, F, P). We assume that T_n → ∞ as n → ∞, that is, that the process is non-explosive. Each realization of {T_n} can then be treated as a measure on R_+:

N = Σ_{n>0} δ_{T_n},

where δ_t denotes the atomic measure concentrated at t, that is, δ_t(B) = 1 if t ∈ B, and δ_t(B) = 0 otherwise, for all bounded Borel sets B. Measures of this type are integer-valued; in other words, a point process can be viewed as a random measure, that is, as a random element of 𝒩 (the space of integer-valued measures). This definition of a point process provides us with a 'global' description of the point process. An order on 𝒩, that will be used in the sequel, is defined by

μ ≤_𝒩 ν if μ(B) ≤ ν(B)

for all bounded Borel sets B, where μ, ν ∈ 𝒩. When we restrict our attention to the sets B = [0, t], t ≥ 0, then we will consider

N_t = N([0, t]) = Σ_{n>0} I{T_n ≤ t}, t ≥ 0,

where I denotes the indicator function, and we see that {N_t, t ≥ 0} has right-continuous trajectories. More formally, {N_t, t ≥ 0} is a random element of the space 𝒟 (the space of right-continuous functions with left-hand limits). We call {N_t, t ≥ 0} a counting process since the process {N_t, t ≥ 0} counts consecutive jumps of the point process with the passage of time t. Thus {N_t, t ≥ 0} gives us a 'time dynamic' view of the point process. An order on 𝒟, that will be used in the sequel, is defined by

f ≤_𝒟 g if f(t) ≤ g(t), t ≥ 0,

where f, g ∈ 𝒟.
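The 'global' (random measure) and 'time dynamic' (counting process) views agree realization by realization; a small sketch with a fixed, hypothetical realization T_1 < T_2 < T_3 < T_4:

```python
import bisect

def counting(arrivals, t):
    """N_t = number of points T_n <= t; right-continuous in t, since
    bisect_right counts a point sitting exactly at t."""
    return bisect.bisect_right(arrivals, t)

def measure(arrivals, a, b):
    """The same realization viewed as a measure, evaluated on (a, b]."""
    return bisect.bisect_right(arrivals, b) - bisect.bisect_right(arrivals, a)

arrivals = [0.5, 1.2, 2.0, 3.7]     # one fixed realization of {T_n}
N_at_2 = counting(arrivals, 2.0)    # the point at 2.0 is counted
mass_1_3 = measure(arrivals, 1.0, 3.0)
```

For points strictly to the right of 0, N_t coincides with the measure of (0, t], which the test below checks.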


A 'local' description of a point process is given by the sequence of interpoint distances X_n = T_n − T_{n−1}, n ≥ 1. The sequence {X_n} is a random element of R_+^∞. For x = (x_1, x_2, ...) and y = (y_1, y_2, ...), such that x, y ∈ R_+^∞, we will consider the following order:

x ≤_∞ y if x_i ≤ y_i, i ≥ 1.

We now introduce some concepts of stochastic comparisons for point processes. Depending on which of the descriptions of point processes, given above, we adopt, we obtain different kinds of stochastic comparisons.
(i) Suppose that N : Ω → 𝒩 and N' : Ω → 𝒩 are two point processes treated as random measures. Define

N ≤_{st-𝒩} N' if Eφ(N) ≤ Eφ(N')

for all ≤_𝒩-increasing real functions φ on 𝒩 for which the expectations exist.
(ii) Suppose that N = {N_t, t ≥ 0} and N' = {N'_t, t ≥ 0} are counting processes. Define

N ≤_{st-𝒟} N' if Eψ({N_t, t ≥ 0}) ≤ Eψ({N'_t, t ≥ 0})

for all ≤_𝒟-increasing real functions ψ on 𝒟 for which the expectations exist.
(iii) Suppose that N and N' are two point processes with the interpoint distances X = (T_1 − T_0, T_2 − T_1, ...) and X' = (T'_1 − T'_0, T'_2 − T'_1, ...), respectively. Define

N ≤_{st-∞} N' if Ef(X') ≤ Ef(X)

for all ≤_∞-increasing real functions f on R_+^∞ for which the expectations exist.

The above introduced stochastic orders can also be characterized by means of some finite-dimensional vectors as follows (for the definition of the usual stochastic order ≤_st among vector-valued random elements, see Theorem 2.16).
(a) N ≤_{st-𝒩} N' if, and only if, (N(B_1), N(B_2), ..., N(B_n)) ≤_st (N'(B_1), N'(B_2), ..., N'(B_n)) for all bounded Borel sets B_1, B_2, ..., B_n, n ≥ 1.
(b) N ≤_{st-𝒟} N' if, and only if, (N_{t_1}, N_{t_2}, ..., N_{t_n}) ≤_st (N'_{t_1}, N'_{t_2}, ..., N'_{t_n}) for all t_1 < t_2 < ··· < t_n, n ≥ 1.
(c) N ≤_{st-∞} N' if, and only if, (X'_1, X'_2, ..., X'_n) ≤_st (X_1, X_2, ..., X_n) for all n ≥ 1.

The above orders retain their full power for applications through the following 'almost sure comparison' properties.

(A) N ≤_{st-𝒩} N' if, and only if, there exist realizations of N and N', say N̂ and N̂', on a common probability space, such that P(N̂(B) ≤ N̂'(B), B ∈ ℬ) = 1 (ℬ denotes the Borel σ-field).


(B) N ≤_{st-𝒟} N' if, and only if, there exist realizations of N and N', say N̂ and N̂', on a common probability space, such that

P(N̂_t ≤ N̂'_t, t ≥ 0) = 1.

(C) N ≤_{st-∞} N' if, and only if, there exist realizations of N and N', say N̂ and N̂', on a common probability space, such that the corresponding interpoint distances {X̂_i} and {X̂'_i} satisfy

P(X̂_i ≥ X̂'_i, i ≥ 1) = 1.

The almost sure property in case (A) means that N̂ is a thinning of N̂' almost surely; in case (B) it means that N̂' has almost surely earlier and more numerous points than N̂ before each time point t; in case (C) the corresponding interpoint distances are almost surely shorter for N̂' than for N̂. From this it is immediate that

N ≤_{st-𝒩} N' ⟹ N ≤_{st-𝒟} N', and N ≤_{st-∞} N' ⟹ N ≤_{st-𝒟} N'.

Examples in the literature show that, in general, N ≤_{st-𝒟} N' implies neither N ≤_{st-𝒩} N' nor N ≤_{st-∞} N'.
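Property (C) suggests a concrete coupling: feeding the same uniform variables through the two quantile functions produces interpoint distances that are ordered pointwise whenever F'(t) ≥ F(t) for all t. A sketch with exponential interarrivals (the rates are hypothetical choices):

```python
import math
import random

def coupled_interpoints(n, rate, rate_prime, seed):
    """Common-uniform (inverse transform) coupling of the interpoint
    distances of two renewal processes with exponential interarrivals.
    If rate <= rate_prime then F' is stochastically smaller than F, and
    the construction gives X'_i <= X_i for every i, which is exactly the
    almost sure property (C)."""
    rng = random.Random(seed)
    xs, xs_prime = [], []
    for _ in range(n):
        u = rng.random()                                  # one shared uniform
        xs.append(-math.log(1.0 - u) / rate)              # F^{-1}(u)
        xs_prime.append(-math.log(1.0 - u) / rate_prime)  # F'^{-1}(u)
    return xs, xs_prime

xs, xs_prime = coupled_interpoints(n=1000, rate=1.0, rate_prime=2.0, seed=3)
dominated = all(xp <= x for x, xp in zip(xs, xs_prime))   # holds pathwise
```

The same inverse-transform idea underlies the equivalence of (i) and (iii) in Theorem 3.1 below.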

3.2. Stochastic comparisons of renewal and nonhomogeneous Poisson processes


We describe now some stochastic comparison results involving renewal and nonhomogeneous Poisson processes. A renewal process models the basic replacement policy in which an item is replaced by an 'identical' one (or is perfectly repaired) at each failure. Let {N_t = Σ_{n>0} I{T_n ≤ t}, t ≥ 0} be a non-delayed renewal process; that is, X_n = T_n − T_{n−1}, n ≥ 1 (T_0 = 0), is a sequence of independent and identically distributed non-negative random variables. We assume that the corresponding renewal distribution function F of the X_n's is absolutely continuous with a left-continuous density f such that f(0) < ∞, and F(t) < 1 for all t ≥ 0. The following result identifies pairs of non-delayed renewal processes which are ordered according to ≤_{st-𝒟} and ≤_{st-∞}.

THEOREM 3.1. (non-delayed renewal processes comparison) Consider two non-delayed renewal processes N = {N_t, t ≥ 0} and N' = {N'_t, t ≥ 0} with renewal distribution functions F and F', respectively. The following conditions are equivalent.
(i) F' ≤_st F (that is, F'(t) ≥ F(t) for all t),
(ii) N ≤_{st-𝒟} N',
(iii) N ≤_{st-∞} N'.

In order to obtain N ≤_{st-𝒩} N', which, for non-delayed renewal processes, is a stronger result than N ≤_{st-∞} N', one can use the following theorem.


THEOREM 3.2. (non-delayed renewal processes comparison) Consider two non-delayed renewal processes N and N' with renewal distribution functions F and F', respectively. Let λ and λ' denote the hazard rate functions corresponding to F and F', respectively. If λ(t) ≤ λ'(s) for all 0 ≤ s ≤ t then

N ≤_{st-𝒩} N'.

For any two distribution functions F and F', denote F' ≤_hr F if λ(t) ≤ λ'(t) for all t ≥ 0. The condition λ(t) ≤ λ'(s) for all 0 ≤ s ≤ t is fulfilled if, in addition to F' ≤_hr F, we have that F is DFR (that is, λ(t) is decreasing in t) or that F' is DFR.

In some situations a comparison of a non-delayed renewal process with a delayed one (which has the same interrenewal distribution), or a comparison of two delayed renewal processes with different delays (but with the same interrenewal distribution), are of interest. We have the following results.

THEOREM 3.3. (delayed renewal processes comparison) Consider two delayed renewal processes N^d and N^{d'}, with the corresponding delay distributions F^d and F^{d'} and with the same interrenewal distribution after the delay. The following statements are equivalent.
(i) F^{d'} ≤_st F^d,
(ii) N^d ≤_{st-𝒟} N^{d'},
(iii) N^d ≤_{st-∞} N^{d'}.
THEOREM 3.4. (delayed renewal processes comparison) Consider two delayed renewal processes N^d and N^{d'}, with the corresponding delay distributions F^d (with hazard rate function λ^d) and F^{d'}, and with the same interrenewal distribution F (with hazard rate function λ) after the delay. If F^{d'} ≤_hr F^d and λ^d(t) ≤ λ(s) for all 0 ≤ s ≤ t then

N^d ≤_{st-𝒩} N^{d'}.
The condition λ^d(t) ≤ λ(s) for all 0 ≤ s ≤ t is fulfilled, for example, if F ≤_hr F^d and F is DFR, or if F ≤_hr F^d and F^d is DFR. As a special case of Theorem 3.4 we obtain:

THEOREM 3.5. (delayed and non-delayed renewal processes comparison) Let N be a non-delayed renewal process and let N^d be a delayed one (with delay distribution F^d with hazard rate function λ^d) with the same interrenewal distribution F (with hazard rate function λ). If λ^d(t) ≤ λ(s) for all 0 ≤ s ≤ t then N^d ≤_{st-𝒩} N.

Again, the condition λ^d(t) ≤ λ(s) for all 0 ≤ s ≤ t is fulfilled, for example, if F ≤_hr F^d and F is DFR, or if F ≤_hr F^d and F^d is DFR.

A nonhomogeneous Poisson process models the maintenance policy in which an item is minimally repaired at each failure. Let N = {N_t, t ≥ 0} be a nonhomogeneous Poisson process with mean function Λ (that is, Λ(t) = E(N_t), t ≥ 0). Assuming that Λ is differentiable, the internal intensity function of {N_t, t ≥ 0} is given by

λ(t) = (d/dt)Λ(t), t ≥ 0.

Also denote Λ^{−1}(p) = inf{x : Λ(x) ≥ p}.

THEOREM 3.6. (nonhomogeneous Poisson processes comparison) Let N and N' be nonhomogeneous Poisson processes with mean functions Λ and Λ', and internal intensity functions λ and λ', respectively.
(i) If Λ(t) ≤ Λ'(t), t ≥ 0, then N ≤_{st-𝒟} N'.
(ii) If Λ^{−1}(Λ'(t)) − t is increasing in t ≥ 0, then N ≤_{st-∞} N'.
(iii) If λ(t) ≤ λ'(t), t ≥ 0, then N ≤_{st-𝒩} N'.
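Part (iii) can be realized by the thinning construction behind property (A): simulate the process with the larger intensity and independently keep each of its points with probability λ(s)/λ'(s); the retained points form the smaller process pathwise. A sketch with hypothetical intensities, both bounded by lam_max on the simulated horizon:

```python
import random

def coupled_nhpp(t_end, lam, lam_prime, lam_max, seed):
    """Thinning coupling of two nonhomogeneous Poisson processes with
    lam(t) <= lam_prime(t) <= lam_max on [0, t_end]: a dominating rate
    lam_max homogeneous process is thinned once down to N' and then,
    independently, down to N, so N is a thinning of N' (property (A))."""
    rng = random.Random(seed)
    t, points, points_prime = 0.0, [], []
    while True:
        t += rng.expovariate(lam_max)              # dominating process
        if t > t_end:
            return points, points_prime
        if rng.random() < lam_prime(t) / lam_max:  # keep for N'
            points_prime.append(t)
            if rng.random() < lam(t) / lam_prime(t):  # also keep for N
                points.append(t)

lam = lambda s: 1.0
lam_prime = lambda s: 1.0 + min(s, 2.0)   # lam <= lam_prime <= 3 on [0, 2]
N, N_prime = coupled_nhpp(t_end=2.0, lam=lam, lam_prime=lam_prime,
                          lam_max=3.0, seed=4)
subset = set(N).issubset(N_prime)         # every point of N is a point of N'
```

The nested acceptance probabilities multiply to λ(t)/lam_max, so the retained points of the inner process indeed have intensity λ.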

3.3. Types of replacement policies

Age replacement policy is perhaps the simplest type of preventive maintenance policy where the basic renewal replacement policy is modified. Here items are replaced at failure (unplanned replacements) or replaced when the age of a component reaches some fixed value T (planned replacements). Denote by 0 = T_0^A < T_1^A < ··· the consecutive failure times under an age replacement policy, with a fixed replacement age T. The failure point process for this age replacement policy is described by

N_t^{A,T} = Σ_{n>0} I{T_n^A ≤ t}, t ≥ 0.

Note that N^{A,T} = {N_t^{A,T}, t ≥ 0} is a renewal process for each T. If λ denotes the failure rate of the interpoint distances in the corresponding renewal process N (remember that N^{A,T} is a modification of a renewal process N) then the failure rate function of the interpoint distances for N^{A,T} is given by

δ(t) = Σ_{k=0}^∞ λ(t − kT) I{kT ≤ t < (k+1)T}, t ≥ 0.
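The formula for δ shows that for exponential lifetimes (constant λ) age replacement leaves the interfailure distribution unchanged: δ ≡ λ. The sketch below checks this invariance by simulation (the replacement age and the unit failure rate are hypothetical choices); the empirical mean interval should stay close to 1.

```python
import random

def age_replacement_interval(T, life, rng):
    """Time between consecutive unplanned replacements under an age
    replacement policy with replacement age T: the item is renewed at age T
    (planned) until a lifetime shorter than T occurs (unplanned failure)."""
    total = 0.0
    while True:
        x = life(rng)
        if x < T:
            return total + x   # failure before the planned replacement
        total += T             # planned replacement at age T; start over

rng = random.Random(5)
life = lambda r: r.expovariate(1.0)   # exponential(1) lifetimes (memoryless)
samples = [age_replacement_interval(T=0.5, life=life, rng=rng)
           for _ in range(20000)]
mean = sum(samples) / len(samples)    # should be close to 1
```

Replacing the exponential lifetime generator by an IFR one would make the intervals stochastically longer, in the spirit of Theorem 3.7 below.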

Block replacement policy is another type of modification of the renewal replacement policy. Items are replaced at failure (unplanned replacements), and also at fixed times T, 2T, ... (planned replacements). Denote by 0 = T_0^B < T_1^B < ··· the consecutive failure times under a block replacement policy, with a fixed replacement block T. The failure point process for this block replacement policy is described by

N_t^{B,T} = Σ_{n>0} I{T_n^B ≤ t}, t ≥ 0.

In general, N^{B,T} = {N_t^{B,T}, t ≥ 0} need not be a renewal process.


The processes N^{A,T} and N^{B,T} described above count only the number of unplanned replacements. Sometimes it is of interest to count the total number of replacements (planned and unplanned). We will denote the point processes that do these counts by L^{A,T} and L^{B,T}, respectively.

Minimal repair replacement policy is a policy in which an item, upon failure, is only minimally repaired; that is, the item is brought back to a working condition, but it is only as good then as it was just before it failed. Formally, if an item, with life distribution F, fails at some time t, then the survival function corresponding to the distribution of the next failure time of the item is given by F̄(s)/F̄(t), s ≥ t (where F̄ ≡ 1 − F denotes the survival function that is associated with F). Denote by 0 = T_0^M < T_1^M < ··· the consecutive failure times under a minimal repair replacement policy. The failure point process for this minimal repair replacement policy is described by

N_t^M = Σ_{n>0} I{T_n^M ≤ t}, t ≥ 0.

Obviously N^M = {N_t^M, t ≥ 0} is a nonhomogeneous Poisson process with mean function Λ = −log F̄. The intensity function of N^M is given by

λ(t) = (d/dt)Λ(t), t ≥ 0.
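Successive minimal repair failure times can be simulated by inverting the conditional survival F̄(s)/F̄(t). For the hypothetical Weibull choice F̄(t) = exp(−t^a) the mean function is Λ(t) = t^a, so the number of failures by time 1 is Poisson with mean Λ(1) = 1, which the sketch checks.

```python
import math
import random

def minimal_repair_times(t_end, a, rng):
    """Failure epochs under minimal repair for a Weibull life distribution
    with survival Fbar(t) = exp(-t**a) (illustrative choice): the epochs
    form a nonhomogeneous Poisson process with mean function t**a,
    generated by inverting the conditional survival Fbar(s)/Fbar(t)."""
    times, t = [], 0.0
    while True:
        u = 1.0 - rng.random()    # uniform in (0, 1]
        # solve Fbar(s)/Fbar(t) = u  =>  s = (t**a - log(u)) ** (1/a)
        t = (t ** a - math.log(u)) ** (1.0 / a)
        if t > t_end:
            return times
        times.append(t)

rng = random.Random(6)
counts = [len(minimal_repair_times(t_end=1.0, a=2.0, rng=rng))
          for _ in range(20000)]
mean_count = sum(counts) / len(counts)   # should be close to Lambda(1) = 1
```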

Apart from the basic age, block and minimal repair replacement policies, there are many other replacement policies that have been discussed in the literature. For example, Sumita and Shanthikumar (1988), Block, Langberg and Savits (1993), and Baxter, Kijima and Tortorella (1996) have introduced and studied several classes of replacement policies. A paper which reviews many variations of the basic replacement policies is the one by Beichelt (1993). For illustrative purposes we will discuss in this section one of the policies described in Beichelt (1993). In this policy the first g − 1 failures are removed by minimal repairs (g is some positive integer). The gth failure is then removed by a replacement. The next g − 1 failures are then again removed by minimal repairs, and the (2g)th one is removed by a replacement, and so on. Denote the point process that counts the failures in this policy by N^{M(g)} = {N_t^{M(g)}, t ≥ 0}. Note that N^{M(1)} = N, whereas, roughly speaking, N^{M(∞)} = N^M. Thus, informally, N^{M(g)} can be thought of as corresponding to a replacement policy that is 'between' the renewal replacement policy and the minimal repair replacement policy.
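A simulation sketch of the N^{M(g)} policy (the Weibull life distribution with survival exp(−t^a) is again a hypothetical choice): tracking the age of the current item separately from the global clock lets a replacement reset the age while a minimal repair keeps it. Taking g = 1 reproduces the renewal policy and a very large g approximates pure minimal repair.

```python
import math
import random

def policy_failure_times(t_end, g, a, rng):
    """Failure epochs on [0, t_end] under the policy that performs minimal
    repairs on the first g-1 failures of each cycle and a full replacement
    on the g-th; lifetimes have Weibull survival exp(-t**a)."""
    times, clock, age, since_repl = [], 0.0, 0.0, 0
    while True:
        u = 1.0 - rng.random()
        new_age = (age ** a - math.log(u)) ** (1.0 / a)  # next failure age
        clock += new_age - age
        if clock > t_end:
            return times
        times.append(clock)
        since_repl += 1
        if since_repl == g:            # g-th failure: replace, reset the age
            age, since_repl = 0.0, 0
        else:                          # otherwise: minimal repair, keep age
            age = new_age

rng = random.Random(7)
t1 = policy_failure_times(t_end=5.0, g=1, a=2.0, rng=rng)       # renewal policy
tg = policy_failure_times(t_end=5.0, g=10**6, a=2.0, rng=rng)   # ~ minimal repair
```

For this IFR life distribution the minimal-repair-like run accumulates far more failures than the renewal run, in line with Theorem 3.14.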

3.4. Interfailure comparisons of replacement policies


In this subsection we describe some comparison results among various replacement policies involving the order ≤_{st-∞}. Recall that this order is stronger than the ≤_{st-𝒟} order, and it allows us to compare infinite sequences of interfailure intervals. Most of the results formulated in the literature in this area are concerned with the ≤_{st-𝒟} order, which allows comparison of the counting processes of failures over


arbitrary time intervals [0, t]. An advantage of having comparison results in terms of the ≤_{st-∞} order is that they enable us, in addition, to compare (increasing) functionals defined on the whole sequence of interfailure times, for the corresponding replacement policies. Some benefit and cost functions have this structure. For example, suppose that the benefit rate (per unit of time) at time t, that the point process yields, depends only on the amount of time that has passed since the last unplanned replacement. Formally, if we denote by X a generic interfailure interval, then we assume that the total benefit obtained during this interfailure interval is ∫_0^X g(u) du, where g is some non-negative benefit rate function. If we denote G(x) = ∫_0^x g(u) du, x ≥ 0, then the average rate of benefit gained during that interfailure interval is G(X)/X. (Note that the average rate of benefit gained during an interfailure interval is an appropriate criterion to be studied in such situations, because the average rate of benefit over a long time interval is an average of such averages.) It is often reasonable to suppose that g is increasing, since the rate of benefit should increase with the age of the current working item (due, for example, to the experience that is continuously being gained from operating this particular item). But, if g is increasing then G(x)/x is also increasing in x > 0. Thus, if the process N is associated with the interfailure intervals {X_n}, and the process N' is associated with the interfailure intervals {X'_n}, and if N ≤_{st-∞} N', then the process N is 'better' than N' (in the sense of having higher average rates of benefit) because X_n ≥_st X'_n implies that G(X_n)/X_n ≥_st G(X'_n)/X'_n, n ≥ 1. (Note that in order to obtain G(X_n)/X_n ≥_st G(X'_n)/X'_n we do not really need to assume that g is increasing; rather, it is enough to assume only that G is star-shaped.)
We proceed now to a description of some comparison results in the sense of the ≤_{st-∞} order. Recall that a life distribution function F is NBU if F̄(x + y) ≤ F̄(x)F̄(y) for all x, y ≥ 0. The NWU notion is defined by reversing the inequality. Since both N and N^{A,T} are renewal processes, we can use Theorem 3.1 in order to obtain a stochastic comparison between them. We do not give the details here, but they can be found in Shaked and Szekli (1995).

THEOREM 3.7. Consider a non-delayed renewal process N and the family of failure point processes for age replacement policies N^{A,T}, T ≥ 0, associated with it. Then
(i) F is NBU if, and only if, N^{A,T} ≤_{st-∞} N for all T ≥ 0,
(ii) F is NWU if, and only if, N^{A,T} ≥_{st-∞} N for all T ≥ 0.

THEOREM 3.8. Consider a non-delayed renewal process N and the family of failure point processes for age replacement policies N^{A,T}, T ≥ 0, associated with it. Then
(i) F is NBU if, and only if, N^{A,T} ≤_{st-∞} N^{A,mT} for all T ≥ 0 and positive integers m,
(ii) F is NWU if, and only if, N^{A,T} ≥_{st-∞} N^{A,mT} for all T ≥ 0 and positive integers m.


Loosely speaking, Theorem 3.7 is a special case of Theorem 3.8, since, by letting m → ∞ in Theorem 3.8, one obtains Theorem 3.7. Turning our attention to block replacement policies we have the following result.

THEOREM 3.9. Consider a non-delayed renewal process N and the family of failure point processes for block replacement policies N^{B,T}, T ≥ 0, associated with it. Then
(i) F is NBU if, and only if, N^{B,T} ≤_{st-∞} N for all T ≥ 0,
(ii) F is NWU if, and only if, N^{B,T} ≥_{st-∞} N for all T ≥ 0.

The next result is stronger than Theorem 3.9 in the sense that it applies to general block schedules [see Block, Langberg and Savits (1990a)]. Let 𝒯 = {z_n} be a sequence of real numbers satisfying 0 < z_1 < z_2 < ··· with lim_{n→∞} z_n = ∞. By a block replacement policy with block schedule 𝒯 we mean a policy in which planned replacements by a new item occur at the preassigned times z_1, z_2, .... We denote the point process corresponding to the unplanned replacements in this policy by N^{B,𝒯}. Note that the classical block replacement policy corresponds to the choice z_n = nT, n ≥ 1.

THEOREM 3.10. Consider a non-delayed renewal process N and the family of failure point processes for block replacement policies N^{B,𝒯} with block schedules 𝒯, associated with it. Then
(i) F is NBU if, and only if, N^{B,𝒯} ≤_{st-∞} N for all block schedules 𝒯,
(ii) F is NWU if, and only if, N^{B,𝒯} ≥_{st-∞} N for all block schedules 𝒯.

The next result describes a stochastic comparison of age and block replacement policies. Recall that a distribution function F is IFR if its cumulative hazard function Λ = −log F̄ is convex.

THEOREM 3.11. Consider the processes N^{A,T} and N^{B,T} which are associated with the same non-delayed renewal process N, and having the same fixed value T ≥ 0. If F is IFR (DFR) then
N^{B,T} ≤_{st-∞} N^{A,T}   (N^{A,T} ≤_{st-∞} N^{B,T}).

A result that can be deduced from Theorem 3.1 is the following.

THEOREM 3.12. Consider the processes N^{A,T} and N^{A,T'} which are associated with the same non-delayed renewal process N, but having different replacement ages T and T'.
(i) If F is IFR then N^{A,T} ≤_{st-∞} N^{A,T'} whenever T ≤ T'.
(ii) If F is DFR then N^{A,T'} ≤_{st-∞} N^{A,T} whenever T ≤ T'.

We now turn our attention to the minimal repair replacement policy N^M. We need the following definition. Let F and F' be two life distributions. If


F^{−1}(F'(t)) − t is increasing in t ≥ 0 (or, equivalently, if Λ^{−1}(Λ'(t)) − t is increasing in t ≥ 0) then F' is said to be smaller than F in the dispersive order, denoted by F' ≤_disp F. Many characterizations of the dispersive order, and its usefulness, can be found in Shaked (1982), in Shaked and Shanthikumar (1994), and in references therein. From Theorem 3.6 (ii) one can obtain the following result (here we denote by N^{M,F} the point process corresponding to a minimal repair replacement policy with the underlying life distribution F).

THEOREM 3.13. Consider the processes N^{M,F} and N^{M,F'}. If F' ≤_disp F then

N^{M,F} ≤_{st-∞} N^{M,F'}.
The next two results describe stochastic comparisons of the minimal repair replacement policy with the basic renewal replacement policy and with the general block replacement policy.

THEOREM 3.14. Let N be a non-delayed renewal process and let N^M be the point process corresponding to the minimal repair replacement policy which is associated with N.
(i) If the underlying life distribution F is NBU then N ≤_{st-∞} N^M.
(ii) If the underlying life distribution F is NWU then N ≥_{st-∞} N^M.

THEOREM 3.15. Consider the processes N^{B,𝒵} and N^M which have the same underlying life distribution F. If F is NBU (NWU) then N^{B,𝒵} ≤_{st-∞} N^M (N^{B,𝒵} ≥_{st-∞} N^M) for all block schedules 𝒵.

We end this subsection by giving some ≤_{st-∞} comparisons involving the processes L^{A,T} and L^{B,T} which count the total number of planned and unplanned replacements (see the definition in Subsection 3.3). The first result of this kind, which is given next, can be deduced from Theorem 3.1.

THEOREM 3.16. Consider the processes L^{A,T} and L^{A,T′}, which are associated with the same non-delayed renewal process N, but having different replacement ages T and T′. Then L^{A,T′} ≤_{st-∞} L^{A,T} whenever T ≤ T′.

Informally, the next result can be obtained from Theorem 3.16 by letting T′ → ∞ there.

THEOREM 3.17. For T ≥ 0, consider the process L^{A,T} which is associated with the non-delayed renewal process N. Then N ≤_{st-∞} L^{A,T}.

When we want to indicate the underlying life distribution F corresponding to the process L^{A,T} we denote it by L^{A,F,T}. The next result also can be obtained from Theorem 3.1.

THEOREM 3.18. For T ≥ 0, consider the two processes L^{A,F,T} and L^{A,F′,T}. If F ≤_{st} F′ then L^{A,F,T} ≥_{st-∞} L^{A,F′,T}.

Stochastic processes in reliability


Block, Langberg and Savits (1990a) implicitly proved the following result by a direct construction.

THEOREM 3.19. For a block schedule 𝒵, consider the process L^{B,𝒵} which is associated with the non-delayed renewal process N. Then N ≤_{st-∞} L^{B,𝒵}.

3.5. Thinning comparisons for replacement policies


In this subsection we describe some comparison results in terms of the order ≤_{st-𝒩}. A comparison of two failure processes corresponding to different replacement policies with respect to this order has the following interpretation: over any arbitrary (small) time interval the smaller process has stochastically fewer failures than the larger process; that is, one policy is 'globally' worse than the other.

THEOREM 3.20. Consider a non-delayed renewal process N and the failure process of an age policy N^{A,T} associated with it, for a fixed replacement age T. If F is DFR then
N ≤_{st-𝒩} N^{A,T} .

THEOREM 3.21. Consider a non-delayed renewal process N and the failure processes N^{B,T} and N^{B,mT} corresponding to two block replacement policies which are associated with it, for a fixed T and a fixed positive integer m.
(i) If F is IFR then N^{B,mT} ≥_{st-𝒩} N^{B,T}.
(ii) If F is DFR then N^{B,mT} ≤_{st-𝒩} N^{B,T}.

Theorem 3.21 can be strengthened as follows. First we need the following definition. Let 𝒵 and 𝒵′ be two block schedules. We say that 𝒵′ is a refinement of 𝒵 if 𝒵 ⊆ 𝒵′. In the language of Shaked and Shanthikumar (1989) this means that 𝒵 is thinner than 𝒵′, or that 𝒵′ is denser than 𝒵.

THEOREM 3.22. Consider a non-delayed renewal process N and the failure processes N^{B,𝒵} and N^{B,𝒵′}, corresponding to two block replacement policies with block schedules 𝒵 and 𝒵′, which are associated with it. Suppose that 𝒵 ⊆ 𝒵′.
(i) If F is IFR then N^{B,𝒵} ≥_{st-𝒩} N^{B,𝒵′}.
(ii) If F is DFR then N^{B,𝒵} ≤_{st-𝒩} N^{B,𝒵′}.

By taking 𝒵 = ∅ in Theorem 3.22 we get the following result.

THEOREM 3.23. Consider a non-delayed renewal process N and the failure process N^{B,𝒵′}, corresponding to the block replacement policy with block schedule 𝒵′, which is associated with it.
(i) If F is IFR then N ≥_{st-𝒩} N^{B,𝒵′} for all block schedules 𝒵′.
(ii) If F is DFR then N ≤_{st-𝒩} N^{B,𝒵′} for all block schedules 𝒵′.


Turning our attention to minimal repair replacement policies, the following result can be obtained from Theorem 3.6 (iii).

THEOREM 3.24. Consider the processes N^{M,F} and N^{M,F′}. If F′ ≤_{hr} F then
N^{M,F} ≤_{st-𝒩} N^{M,F′} .
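Since a minimal repair process with underlying distribution F is a nonhomogeneous Poisson process with mean function Λ = −log F̄, Theorem 3.24 is easy to probe by simulation. A minimal sketch (the function name and all numbers are our assumptions; the two Weibull scales give hazard rates ordered as F′ ≤_{hr} F):

```python
import random

def minimal_repair_times(cum_haz_inv, horizon, rng):
    """Failure epochs of a minimal-repair process: an NHPP with mean
    function Lambda, sampled via Lambda(T_n) = Lambda(T_{n-1}) + Exp(1)."""
    times, lam = [], 0.0
    while True:
        lam += rng.expovariate(1.0)
        t = cum_haz_inv(lam)
        if t > horizon:
            return times
        times.append(t)

# Weibull(shape 2): Lambda(t) = (t/a)^2, so Lambda^{-1}(y) = a * sqrt(y).
inv_F = lambda y: 1.2 * y ** 0.5    # scale 1.2: smaller hazard rate
inv_Fp = lambda y: 1.0 * y ** 0.5   # scale 1.0: larger hazard rate (F' <=hr F)

rng = random.Random(7)
n, horizon = 4000, 2.0
mean_F = sum(len(minimal_repair_times(inv_F, horizon, rng)) for _ in range(n)) / n
mean_Fp = sum(len(minimal_repair_times(inv_Fp, horizon, rng)) for _ in range(n)) / n
print(mean_F < mean_Fp)   # fewer failures under the better distribution F
```

As a sanity check, the mean count for the scale-1 process over [0, 2] should be close to Λ(2) = 4, the exact NHPP mean.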

The following policy, which is a generalization of the minimal repair replacement policy and of the block replacement policy with block schedule 𝒵, is studied in Block, Langberg and Savits (1990a). In this policy, upon a failure, the item is minimally repaired, and at the fixed scheduled times 𝒵 = {z₁, z₂, …} planned replacements take place. Denote by N^{M,F,𝒵} the point process associated with the unplanned replacements in this policy, where F is the underlying lifetime distribution. Note that between each pair of consecutive scheduled replacement times z_{n−1} and z_n the process increments as a nonhomogeneous Poisson process with mean function Λ = −log F̄. Thus the following result is immediate from Theorem 3.24.

THEOREM 3.25. Consider the processes N^{M,F,𝒵} and N^{M,F′,𝒵}. If F′ ≤_{hr} F then N^{M,F,𝒵} ≤_{st-𝒩} N^{M,F′,𝒵} for all block schedules 𝒵.

If F and F′ are lifetime distributions then F′ ≤_{hr} F implies that F′ ≤_{st} F. By assuming only F′ ≤_{st} F (rather than F′ ≤_{hr} F), Block, Langberg and Savits (1990a) obtained the comparisons of Theorems 3.24 and 3.25, but with the weaker order ≤_{st-∞} (rather than ≤_{st-𝒩}).

We will now compare the point process N^{B,𝒵′} (corresponding to a block replacement policy with block schedule 𝒵′) with the point process N^{M,𝒵} (corresponding to a minimal repair replacement policy with planned replacement schedule 𝒵); below we drop the designation F of the common life distribution which underlies both processes.

THEOREM 3.26. Consider the processes N^{B,𝒵′} and N^{M,𝒵} with the underlying life distribution F. Suppose that 𝒵 ⊆ 𝒵′.
(i) If F is IFR then N^{M,𝒵} ≥_{st-𝒩} N^{B,𝒵′}.
(ii) If F is DFR then N^{M,𝒵} ≤_{st-𝒩} N^{B,𝒵′}.

Another result, which is similar to Theorem 3.26, is stated next.

THEOREM 3.27. Consider the processes N^{M,𝒵′} and N^{M,𝒵} with the underlying life distribution F. Suppose that 𝒵 ⊆ 𝒵′.
(i) If F is IFR then N^{M,𝒵} ≥_{st-𝒩} N^{M,𝒵′}.
(ii) If F is DFR then N^{M,𝒵} ≤_{st-𝒩} N^{M,𝒵′}.

As corollaries of Theorems 3.26 and 3.27 we obtain the following results.
THEOREM 3.28. Consider the processes N^{B,𝒵′} and N^M with the underlying life distribution F.


(i) If F is IFR then N^M ≥_{st-𝒩} N^{B,𝒵′} for all block schedules 𝒵′.
(ii) If F is DFR then N^M ≤_{st-𝒩} N^{B,𝒵′} for all block schedules 𝒵′.

THEOREM 3.29. Consider the processes N and N^M with the underlying life distribution F.
(i) If F is IFR then N^M ≥_{st-𝒩} N.
(ii) If F is DFR then N^M ≤_{st-𝒩} N.

Most of the results in Theorems 3.26–3.29 have analogs in Block, Langberg and Savits (1990a, 1993) and in Shaked and Shanthikumar (1989). In these analogs the conclusions are stated by means of the order ≤_{st-∞} (rather than ≤_{st-𝒩}), but under the weaker conditions that F is NBU or NWU (rather than that F is IFR or DFR).

Recall from Subsection 3.3 that, intuitively, the process N^{M(ℓ)} is 'between' N and N^M. The next theorem somewhat formalizes this idea.

THEOREM 3.30. Fix a positive integer ℓ and consider the processes N, N^{M(ℓ)} and N^M with the underlying life distribution F.

(i) If F is IFR then N^M ≥_{st-𝒩} N^{M(ℓ)} ≥_{st-𝒩} N.
(ii) If F is DFR then N^M ≤_{st-𝒩} N^{M(ℓ)} ≤_{st-𝒩} N.

We end this subsection by giving some ≤_{st-𝒩} comparisons involving the process L^{B,𝒵} which counts the total number of planned and unplanned replacements (see the definition in Subsection 3.3).

THEOREM 3.31. For a block schedule 𝒵, consider the process L^{B,𝒵} which is associated with the non-delayed renewal process N. If the underlying life distribution F is DFR then N ≤_{st-𝒩} L^{B,𝒵}. In particular, if F is DFR then N ≤_{st-𝒩} L^{B,T} for all T > 0.

Denote by L^{M,F,𝒵} the point process which counts planned and unplanned replacements in a minimal repair replacement policy with block schedule 𝒵 and underlying life distribution F. From Theorem 3.25 we get at once the next result.

THEOREM 3.32. Consider the processes L^{M,F,𝒵} and L^{M,F′,𝒵}. If F′ ≤_{hr} F then L^{M,F,𝒵} ≤_{st-𝒩} L^{M,F′,𝒵} for all block schedules 𝒵.

4. Comparison of repairable systems via point processes


The minimal repair replacement policy described in Subsection 3.3 assumes that, upon failure, the system is minimally repaired. A renewal process (see Subsection 3.2) models a replacement policy in which, upon failure, the item is perfectly repaired. However, the assumption that a repair can only be perfect or minimal may be overly restrictive in practice given the wide variety of maintenance actions.


Hence there have been some attempts to model the condition of the system immediately following a repair. Brown and Proschan (1983) introduced the model of imperfect repair in which it is assumed that each repair is either perfect (with probability p) or minimal (with probability 1 − p). This model includes the renewal process (the perfect repair case) and the nonhomogeneous Poisson process (the minimal repair case) as special cases, and hence it was an important starting step for the development of more general concepts of repair activity. The model was generalized by Block, Borges and Savits (1985) to allow the probability p to be time-dependent, and by Shaked and Shanthikumar (1986b) to the multivariate setting.

The imperfect repair models are, however, not sufficiently flexible to describe the state of the system immediately after repair when the repair is intermediate between that of a new system and that obtained by a minimal repair. In reality, the repair activity could also bring the state of the system immediately after the repair to be worse than its state prior to failure (clumsy repair). Hence, some authors have proposed more general models for the condition of the system immediately following the repair.

A pioneering work that considers a general repair model and analyzes the stochastic behavior of the corresponding point process is Kijima (1989b). He assumed that V_n, the virtual age of the system following the nth repair, is given either by

Model I: V_n = V_{n−1} + A_n X_n ,

or by

Model II: V_n = A_n (V_{n−1} + X_n) ,

where A_n is a random variable representing the degree of the nth repair, X_n is the nth interfailure time, and V₀ ≡ 0. For a given failure-time distribution F (and survival function F̄ = 1 − F) of a new system, the interfailure time X_n has the conditional survival function
P(X_n > x | V_{n−1} = y) = F̄(x + y)/F̄(y) ,   n ≥ 1 ,   (4.1)

and the A_n are independently (but not necessarily identically) distributed. It is easily seen that in both models A_n = 0 means perfect repair whereas A_n = 1 means minimal repair. Hence these models include the renewal and the nonhomogeneous Poisson processes as special cases.

The models of Kijima (1989b) have been generalized in various ways. See, for example, Kijima and Nakagawa (1991), Baxter, Kijima and Tortorella (1996), Last and Szekli (1998), and references therein. Kijima, Morimura and Suzuki (1988), Stadje and Zuckerman (1991) and Makis and Jardine (1993) considered optimal maintenance strategies, while Dagpunar (1998) developed some computational methodology for a general repair functional. See also Guo and Love (1992) for a statistical issue on this subject.

In this section it is intended to outline recent results on the stochastic behavior of the corresponding point processes of the general repair models. Other matters


such as maintenance strategies and computational issues, despite their apparent importance in practice, will not be covered here. These problems have not yet been studied much, so they will be challenging areas for future research.

The following results can be found in Last and Szekli (1998) in slightly more general forms. In particular, Last and Szekli (1998) allow planned replacements in the above general repair setting, thereby generalizing a model of Block, Langberg and Savits (1993). Let F be a lifetime distribution of a new system and assume that F(0+) = 0 and that the associated failure rate function λ exists. Upon failure, the system is repaired and put back into operation in a negligible amount of time. Denote the consecutive failure times by T₁, T₂, …, where T₀ = 0, and the interfailure times by X_n = T_n − T_{n−1}. The nth virtual age is defined by

V_n = (1 − Z_n)(V_{n−1} + X_n) ,   n ≥ 1 ,   (4.2)

where V₀ ≡ 0, as in Model II of Kijima (1989b). However, because Z_n can depend on the whole history, the model includes many general repair models previously considered in the literature. For example, Kijima's Model I is obtained by assuming

Z_n = (1 − A_n)X_n / (A₁X₁ + ⋯ + A_{n−1}X_{n−1} + X_n) ,   (4.3)

where A_n denotes the degree of the nth repair. It should be noted that Z_n may take values in [−∞, 1], whence the model also covers the clumsy repair model of Baxter, Kijima and Tortorella (1996). The interfailure time X_n is distributed according to the residual life distribution F_{V_{n−1}} given in (4.1). Note that (4.1) and (4.2) and the assumptions on Z_n determine the stochastic behavior of the point process {T_n, n ≥ 0}. Let us denote by D_n the distribution function of Z_n. Recall that Z_n can depend on the whole history. Hence, the general form of D_n is given by

D_n(z; μ) = P(Z_n ≤ z | T₁ = t₁, Z₁ = z₁, …, Z_{n−1} = z_{n−1}, T_n = t_n) ,   (4.4)

where μ = {(t_n, z_n), n ≥ 0} is a realization of {(T_n, Z_n), n ≥ 0}. The family 𝒟 = {D_n, n ≥ 0} and the distribution function F of a new system determine entirely the stochastic behavior of the marked point process {(T_n, Z_n), n ≥ 0} via (4.1), (4.2) and (4.4), and in particular that of N = {T_n, n ≥ 0}. In what follows, we write N = N_F^𝒟 to represent this dependence.

The next two theorems are special cases of results of Last and Szekli (1998). The first theorem compares two repair processes with the same lifetime distribution but with different degrees of repair, while the second theorem compares two repair processes with the same degrees of repair but with different lifetime distributions.

THEOREM 4.1. Let N = N_F^𝒟 and N′ = N_F^{𝒟′}, with an underlying IFR [DFR] life distribution F. Assume that for all realizations μ = {(t_n, z_n), n ≥ 0} and μ′ = {(t′_n, z′_n), n ≥ 0} such that


v_n(t₁, z₁, …, t_n, z_n) ≤ v′_n(t′₁, z′₁, …, t′_n, z′_n)

and (t_n) ≤_∞ (t′_n), where {v_n, n ≥ 0} is a realization of {V_n, n ≥ 0}, we have

D_n(z; μ) ≤ D′_n(z; μ′) ,   n ≥ 1 .   (4.5)

Then N ≤_{st-∞} N′ (N′ ≤_{st-∞} N).

In Kijima's Model II, if we take A_n = 1 − Z_n and assume that the A_n's are independent, then the assumption A_n ≤_{st} A′_n implies (4.5), and hence for an IFR F it holds that N ≤_{st-∞} N′. On the other hand, in his Model I, if A_n ≤_{st} A′_n then we can assume, without loss of generality, for all n, that x′_n ≤ x_n, v′_n ≥ v_n and a_n ≤ a′_n for the corresponding realizations. Thus, from (4.3) we have

z_n = (1 − a_n)/(v_{n−1}/x_n + 1) ≥ (1 − a′_n)/(v′_{n−1}/x′_n + 1) = z′_n ,   n ≥ 1 ,

and this implies that (4.5) holds again. Hence, Theorem 4.1 is a generalization of Kijima's (1989b) results to the case where the A_n's may depend on the history.

THEOREM 4.2. Let N = N_F^𝒟 and N′ = N_G^𝒟, and assume that G ≤_{hr} F.
(i) Suppose that either F or G or both are IFR, and that for all μ = {(t_n, z_n), n ≥ 0} and μ′ = {(t′_n, z′_n), n ≥ 0} such that Σ_{n=1}^∞ I_{[t_n,∞)}(·) ≤ Σ_{n=1}^∞ I_{[t′_n,∞)}(·) we have

D_n(z; μ) ≤ D_n(z; μ′) ,   n ≥ 1 .

Then N ≤_{st-𝒩} N′.
(ii) Suppose that either F or G or both are DFR, and that for all μ = {(t_n, z_n)} and μ′ = {(t′_n, z′_n)} such that (t_n) ≤_∞ (t′_n) we have

D_n(z; μ) ≥ D_n(z; μ′) ,   n ≥ 1 .

Then N ≤_{st-∞} N′.

In Theorem 4.2 the condition means that the degree of repair at the nth failure becomes smaller (larger) if the history contains more failures in the sense of the ≤_𝒩 (≤_∞) ordering. The assumption becomes natural if deterioration (due to more failures) results in less (more) effective repair activity.
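The virtual-age recursions of this section are straightforward to simulate. The sketch below is our own (the function name, the Weibull choice and all numbers are assumptions): it samples interfailure times from the residual-life law (4.1) by inverting a Weibull cumulative hazard in closed form, applies Kijima's Model I or II, and recovers the intuition that for an IFR lifetime a larger degree of repair A_n produces stochastically more failures.

```python
import random

def kijima_failures(a_seq, horizon, model, rng, scale=1.0, shape=2.0):
    """Failure times under Kijima's virtual-age models. Given V_{n-1} = y,
    the next interfailure time X_n is drawn from the residual-life survival
    function F_bar(x+y)/F_bar(y) of eq. (4.1), by inverting the Weibull
    cumulative hazard Lambda(t) = (t/scale)**shape."""
    t, v, times = 0.0, 0.0, []
    for a in a_seq:
        e = rng.expovariate(1.0)
        # solve Lambda(v + x) = Lambda(v) + e for the interfailure time x
        x = scale * ((v / scale) ** shape + e) ** (1.0 / shape) - v
        t += x
        if t > horizon:
            break
        times.append(t)
        v = v + a * x if model == "I" else a * (v + x)   # Models I / II
    return times

rng = random.Random(3)
n, horizon = 2000, 5.0
perfect = sum(len(kijima_failures([0.0] * 200, horizon, "I", rng)) for _ in range(n)) / n
partial = sum(len(kijima_failures([0.5] * 200, horizon, "I", rng)) for _ in range(n)) / n
minimal = sum(len(kijima_failures([1.0] * 200, horizon, "I", rng)) for _ in range(n)) / n
print(perfect <= partial <= minimal)
```

Here A_n ≡ 0 reproduces a renewal (perfect repair) process and A_n ≡ 1 the nonhomogeneous Poisson (minimal repair) process, the two extreme cases mentioned above.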

5. A stochastic process approach to failure models in random environments

5.1. Modeling hazard functions as stochastic processes

Every deterministic Borel-measurable function λ : ℝ₊ → ℝ₊ is a hazard rate function of some random lifetime T if, and only if, ∫_t^∞ λ(s)ds = ∞ for each t ≥ 0. The survival function F̄ of T then is given by


F̄(t) = e^{−∫₀ᵗ λ(s)ds} ,   t ≥ 0 .

Similarly, every deterministic Borel-measurable function Λ : ℝ₊ → ℝ₊ is a (cumulative) hazard function of some random lifetime T if, and only if, Λ is increasing on ℝ₊ and lim_{s→∞}(Λ(s) − Λ(t)) = ∞ for all t ≥ 0. The survival function F̄ of T then is given by

F̄(t) = e^{−Λ(t)} ,   t ≥ 0 .

The relationship between the hazard rate function λ and the (cumulative) hazard function Λ is then

Λ(t) = ∫₀ᵗ λ(s)ds ,   t ≥ 0 .
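These identities can be checked numerically. In the sketch below (the parameter values are our assumptions) Λ is obtained by integrating a Weibull hazard rate λ, the relation F̄(t) = e^{−Λ(t)} is verified, and the lifetime is sampled via the first-passage representation T = inf{t : Λ(t) > S} with S ~ Exp(1), which is used later in this section.

```python
import math, random

# Weibull hazard rate lambda(t) = (b/a) * (t/a)**(b-1); then Lambda(t) = (t/a)**b.
a, b = 2.0, 1.5
haz = lambda s: (b / a) * (s / a) ** (b - 1)

def cum_hazard(t, steps=20000):
    """Lambda(t) = integral of haz over [0, t], by the midpoint rule."""
    h = t / steps
    return sum(haz((k + 0.5) * h) for k in range(steps)) * h

t = 1.7
surv_from_rate = math.exp(-cum_hazard(t))   # F_bar(t) = exp(-integral of haz)
surv_closed = math.exp(-(t / a) ** b)       # F_bar(t) = exp(-Lambda(t))
print(abs(surv_from_rate - surv_closed) < 1e-5)

# Sampling T = inf{t : Lambda(t) > S}, S ~ Exp(1), i.e. T = Lambda^{-1}(S).
rng = random.Random(0)
samples = [a * rng.expovariate(1.0) ** (1 / b) for _ in range(50000)]
emp = sum(1 for x in samples if x > t) / len(samples)
print(abs(emp - surv_closed) < 0.02)
```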

If the hazard rate function [or, equivalently, the cumulative hazard function] that is associated with the lifetime T of some item is not explicitly known before the item starts to function, then it is natural to assume that {λ(t), t ≥ 0} is a realization of some (smooth enough) non-negative stochastic process X = {X(t), t ≥ 0} [or, equivalently, that {Λ(t), t ≥ 0} is a realization of some (smooth enough) non-negative increasing stochastic process Y = {Y(t), t ≥ 0}]. Arjas (1981) has referred to processes such as Y as hazard processes. Of course, in such a model almost every realization x of X should satisfy ∫_t^∞ x(s)ds = ∞ for all t ≥ 0 [or, equivalently, almost every realization y of Y should satisfy lim_{s→∞}(y(s) − y(t)) = ∞ for all t ≥ 0]. The survival function of T can then be represented as

F̄(t) = E[e^{−∫₀ᵗ X(s)ds}] ,   t ≥ 0 ,

or as

F̄(t) = E[e^{−Y(t)}] ,   t ≥ 0 .

Modeling the hazard rate or the cumulative hazard function as a stochastic process is a common procedure when a device performs its task in a randomly varying environment. Different applications led researchers to different choices of the random hazard rate function {X(t), t ≥ 0} or of the random (cumulative) hazard function {Y(t), t ≥ 0}. A detailed study of the abundance of models that can be found in the literature may cover a volume or two; Singpurwalla (1995) provides a good review of the various survival models in random environments, emphasizing the development of parametric distributions. Here we only describe a few models that we have encountered in our own research. Thus, the selection of the models that are described below is quite arbitrary. The reader should be aware that what we describe is only the tip of an iceberg.

Let T be a random lifetime with a cumulative hazard function Λ. Then
T =_{st} inf{t > 0 : Λ(t) > S} ,

where S is an exponential random variable with mean 1. In other words, T is the first-passage time of the deterministic process {Λ(t), t ≥ 0} to the random


threshold S. Çinlar and Özekici (1987) generalized this idea to the case where the deterministic process {Λ(t), t ≥ 0} is replaced by a random process Y = {Y(t), t ≥ 0} which depends on a random environmental process V = {V(t), t ≥ 0}. They stipulated the relationship between Y and V as a differential equation

dY(t) = r(V(t), Y(t))dt ,   t ≥ 0 ,

where r is a positive measurable function on ℰ × ℝ₊ (here ℰ is the state space of the environmental process V). Çinlar and Özekici (1987) studied the properties of the first-passage time

T = inf{t > 0 : Y(t) > S} ,

where S is an exponential random variable with mean 1. Note that in the setting of Çinlar and Özekici (1987), T can be regarded as a functional of V, of r, and of S; say T = L(V, r, S). Çinlar and Özekici (1987), and Çinlar, Shaked and Shanthikumar (1989), have identified conditions on V and on Ṽ, and on r and on r̃, such that

L(V,r,S) _<stL(V,~,S) .
In fact, ~inlar and Ozekici (1987), and ~inlar, Shaked and Shanthikumar (1989), studied the multivariate case in which T is a k-dimensional r a n d o m vector, r : 8 x Ek+ ---* Rk+ is a k-dimensional function, and S is a vector of k independent exponential r a n d o m variables with mean 1 (see Subsection 5.2 for more details on this). Two special cases of above general model deserve more discussion. First, a process called the shot-noise process for the hazard rate was discussed in Lemoine and Wenocur (1986). Suppose that an item operates in a r a n d o m environment consisting of a series of events ("shots") whose effect is to induce stresses of unknown varying magnitudes Xk, k >_ O, on the item. The shots are assumed to occur over time according to a nonhomogeneous Poisson process with a known rate 2(t), t _> 0, and whenever a stress of magnitude Ark is induced at an epoch Zk, then its contribution to X(zk + u) is Xkh(u), where the attenuation function h is positive and is decreasing for u >_ 0, and h(u) = 0 for u < 0. Thus the hazard rate of the item at time t is given by
X(t) = Σ_{k=1}^∞ X_k h(t − z_k) ,   (5.1)
where z_k, k ≥ 0, are the epochs at which the shocks occur with respective magnitudes X_k, k ≥ 0. The process {X(t), t ≥ 0} is called a shot-noise process (see Cox and Isham (1980) for more details on these processes). As was pointed out in Lemoine and Wenocur (1986) and in Singpurwalla and Youngren (1993), the shot-noise process model is meaningful if the imposed stresses have a residual effect on the hazard rate, such as healing after a heart attack, or cracks due to fatigue which tend to close up after the material has borne a load. Under some


conditions on the shot-noise process model, Lemoine and Wenocur (1986) (see also Singpurwalla and Youngren (1993)) obtained the survival function of an item operating in a shot-noise environment in terms of its Laplace transform. Special cases of their result provide an alternative motivation for some well-known distributions (such as the Pareto) as failure models, and also yield some new families of failure distributions.

The second process for modeling the hazard rate that we describe here is the Lévy process. A Lévy process is a right-continuous increasing process with independent and stationary increments, and it has the following representation,

X(t) = at + ∫₀ᵗ ∫₀^∞ z N(du, dz) ,

where the constant a ≥ 0 is the drift rate, and N is a Poisson random measure on the space ℝ₊ × ℝ₊ with mean measure n(du, dz) = du ν(dz) (here ν(dz) is known as the Lévy measure). That is, for every Borel subset B ⊆ ℝ₊ × ℝ₊, we have P(N(B) = j) = exp[−n(B)] (n(B))^j / j!. The Lévy process is a very general process, including, for example, the compound Poisson process. A Lévy process model is meaningful for an item in a random environment comprised of shocks that occur at random time epochs and which inflict random amounts of damage on the item. Note that here a random damage is reflected by a random amount of increase to the hazard rate of the item, rather than in some physically observable entities (such as the wear of the item) as modeled in Esary, Marshall and Proschan (1973). Under the Lévy process assumption on the hazard rate, Kebir (1991) obtained the explicit form of the survival function of an item in a Lévy process environment, and showed that the reliability of an item with a Lévy process for its hazard rate is identical to the reliability of an item with a deterministic hazard rate function λ, where

λ(t) = at + ∫₀^∞ [1 − exp(−tx)] ν(dx) .
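As a concrete instance of Kebir's formula, take the compound Poisson special case: shocks arrive at rate c and have Exp(eta)-distributed sizes, so ν(dx) = c·eta·e^{−eta·x}dx and λ(t) = at + ct/(eta + t). The sketch below (all parameter values are our assumptions) compares the resulting closed-form survival function with a Monte Carlo estimate of E[exp(−∫₀ᵗ X(s)ds)].

```python
import math, random

# Compound-Poisson Levy hazard: jumps at rate c with Exp(eta) sizes, drift a.
# Equivalent deterministic hazard rate: lam(t) = a*t + c*t/(eta + t), so
# integral of lam over [0, t] = a*t^2/2 + c*(t - eta*log(1 + t/eta)).
a, c, eta, t = 0.3, 2.0, 1.5, 1.0

integral = a * t ** 2 / 2 + c * (t - eta * math.log(1 + t / eta))
closed = math.exp(-integral)            # F_bar(t) in closed form

# Monte Carlo of F_bar(t) = E[exp(-int_0^t X(s) ds)], using
# int_0^t X(s) ds = a*t^2/2 + sum_j z_j * (t - u_j) over shock epochs u_j <= t.
rng = random.Random(42)
def one():
    s, u = 0.0, 0.0
    while True:
        u += rng.expovariate(c)                    # next shock epoch
        if u > t:
            return math.exp(-(a * t ** 2 / 2 + s))
        s += rng.expovariate(eta) * (t - u)        # z_j * (t - u_j)

mc = sum(one() for _ in range(200000)) / 200000
print(round(closed, 3), round(mc, 3))
```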

Therefore, Kebir's result provides a computational procedure for the evaluation of the survival function of an item operating in a Lévy process environment.

The shot-noise process and the Lévy process seem reasonable for modeling the hazard rate of an item in a randomly varying environment. However, since the hazard rate is a physically unobservable entity, it is difficult to justify these assumptions. One way of overcoming this is to develop failure models based on the consideration of two stochastic processes, one for the item state or wear (which is observable) and the other for the covariates that drive it. The advantage of this strategy is to model directly the stochastic behavior of the covariates believed to influence the (unobservable) hazard rate of an item in a random environment. A general approach for modeling covariates is provided by Markov additive processes [see Çinlar (1972)]. Consider two processes: X = {X(t), t ≥ 0} starting at 0 with state space ℝ₊, and Z = {Z(t), t ≥ 0} starting at z₀ with state space ℰ. Assume that both processes are right-continuous and have left-hand limits almost


everywhere, and that X is increasing almost surely. We use X(t) to represent the state of an item at time t, and Z(t) to represent a covariate that influences the item. The process (X, Z) = {(X(t), Z(t)), t ≥ 0} is said to be a Markov additive process if (X, Z) is a two-dimensional Markov process on the state space ℝ₊ × ℰ such that

P(X(s + t) − X(s) ∈ A, Z(s + t) ∈ B | ℋ_s) = Q_t(Z(s), A, B) ,

where ℋ_s is the history of the process up to time s, and Q_t(z, A, B) = P(X(t) ∈ A, Z(t) ∈ B | Z(0) = z). Note that if ℰ contains just one element, the Markov additive process actually becomes a one-dimensional process X for the state of the item (see Section 2 for a discussion about the aging properties of processes that describe the state of an item). Among many interesting properties of Markov additive processes, the following two can be easily verified.

1. For any 0 ≤ t₁ ≤ t₂ ≤ ⋯ ≤ t_n, the increments X(t₁) − X(0), X(t₂) − X(t₁), …, X(t_n) − X(t_{n−1}) are conditionally independent given {Z(t), t ≤ t_n}.

2. The process Z is a Markov process with state space ℰ and transition kernel P_t(z, B) = Q_t(z, ℝ₊, B).


The motivation for using the Markov additive process in reliability modeling stems from its versatility for describing the wear of an item in a random environment. We describe two examples below and refer the reader to Çinlar (1977) for more discussion and various other examples.

EXAMPLE 5.1. Suppose that ℰ consists of only two states r and w, with Z(t) = w denoting that the item is working at time t, and Z(t) = r denoting that the item is at rest or in repair at time t. Suppose also that when Z(t) = w, shocks occur according to a Poisson process with rate λ and that each shock inflicts a random amount of damage to the item, and when Z(t) = r, there are no shocks and thus no damage to the item. Assume the damage is cumulative and let X(t) denote the total damage to the item at time t. Obviously, {(X(t), Z(t)), t ≥ 0} is a Markov additive process. □

EXAMPLE 5.2. Suppose that the environmental process {Z(t), t ≥ 0} is a Markov chain with a finite state space ℰ. Let X(t) denote the cumulative damage of the item at time t. Suppose that when Z(t) = i, we have that X(t) is a Lévy process with a drift rate a(i) and a Lévy measure ν_i(dx). The damages accumulated over the different environmental states are assumed to be linearly additive. Suppose also that every change of the environmental states from i to j is accompanied by a shock which causes an additional amount of damage that is independent of the cumulative damage at the time the shock occurs. It can be shown that {(X(t), Z(t)), t ≥ 0} is a Markov additive process. Note that locally, that is, during any small time interval, X(t) is an increasing Lévy process with parameters depending on the state of the environmental process Z. Therefore, this model is particularly useful for describing situations involving the occurrence of many shocks in very small time intervals where no individual shock causes any measurable damage to the item, such as fatigue due to vibrations. □
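Example 5.1 is simple to simulate. In the sketch below (the function name, the rates and the mean durations are our assumptions) the item alternates exponential working and rest periods, damage accumulates only while working, and the ratio of mean damage to mean working time recovers the shock rate times the mean shock size.

```python
import random

def damage_path(t_end, rate, mean_work, mean_rest, rng):
    """Example 5.1 sketch: while the item works, shocks arrive as a Poisson
    process with the given rate and each adds an Exp(1) damage amount to the
    cumulative damage X(t); at rest no damage accrues."""
    t, working, damage, work_time = 0.0, True, 0.0, 0.0
    while t < t_end:
        stay = rng.expovariate(1 / (mean_work if working else mean_rest))
        stay = min(stay, t_end - t)
        if working:
            u = rng.expovariate(rate)
            while u < stay:                     # shocks within this work period
                damage += rng.expovariate(1.0)  # Exp(1) damage per shock
                u += rng.expovariate(rate)
            work_time += stay
        t += stay
        working = not working
    return damage, work_time

rng = random.Random(11)
runs = [damage_path(20.0, rate=3.0, mean_work=1.0, mean_rest=0.5, rng=rng)
        for _ in range(3000)]
mean_damage = sum(d for d, _ in runs) / len(runs)
mean_work = sum(w for _, w in runs) / len(runs)
print(round(mean_damage / mean_work, 2))   # should be near rate * E[shock] = 3.0
```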


For the above two shock models and some other Markov additive process models, Çinlar (1972, 1977) obtained the conditional probability distribution of X(t), given Z, by its Laplace transform at fixed time t. However, the disadvantage is the absence of a closed-form expression for the survival function, and therefore, it is not clear how the environmental process Z (the covariates) affects the hazard rate of an item operating in a Markov additive process environment.

Finally, some historical comments may be of interest. Gaver (1963), Antelman and Savage (1965), Reynolds and Savage (1971), Ammann (1984), and references therein, modeled the random cumulative hazard function {Y(t), t ≥ 0} as a specified function of a non-negative stochastic process with non-negative independent increments. Antelman and Savage (1965) developed some computational procedures for the evaluation of the survival function of the underlying random lifetime T when the random cumulative hazard function is of this form. Reynolds and Savage (1971) considered several subclasses of such processes, and they showed that identical failure-time distributions can arise from each of these subclasses. In the context of Bayesian statistics it is natural to model the hazard rate function as a stochastic process. Harris and Singpurwalla (1968) studied the case in which the hazard rate function has a specific form (for example, when it is constant), but depends on a random parameter. Of course, as we discussed earlier, more generally, the hazard rate function can be any (smooth enough) stochastic process. Dykstra and Laud (1981) modeled the random hazard rate function {X(t), t ≥ 0} as a Gamma process. Padgett and Wei (1981) modeled {X(t), t ≥ 0} as a stochastic process that has constant positive jumps at the times of occurrences of epochs in some underlying Poisson process.
Since the realizations of the Gamma process and of the constant jump process are almost surely increasing, these models ensure that, conditionally, the lifetime T has the IFR property. Ammann (1984) expanded this approach to include more general IFR conditional distributions, as well as DFR and U-shaped conditional distributions.

5.2. Lifetimes of components in random environments

Consider the lifelengths T₁, T₂, …, T_k of k components of a system that operates in a randomly varying environment. For such a system, the simplifying assumption of component independence has resulted in inadequate assessments of system reliability. In reality, there are a number of situations with some form of dependence among the lifetimes. A usual factor inducing correlation is the random environment that affects all the components of the system. Failure models for system reliability that incorporate component dependencies were proposed as early as in Freund (1961) and in Marshall and Olkin (1967); the latter introduced a shock model subjected to several types of shocks that destroy a random set of components simultaneously. As indicated in Lindley and Singpurwalla (1986) and in Çinlar and Özekici (1987), component dependencies are due to a lack of knowledge about the effect of the common operating environment on the lifetimes. The objective of this subsection is to discuss such dependencies for systems


operating in random environments where the presence and intensities of stresses (covariates) change over time. Especially we review some failure models for system reliability where dynamic environments are modeled using stochastic processes. We begin with a general failure model introduced in Çinlar and Özekici (1987) where the failure rates of the components depend on the whole history of the environmental process via the so-called intrinsic age process of each component. Some related models, introduced in Lefèvre and Milhaud (1990) and in Singpurwalla and Youngren (1993), are also discussed. The emphasis here is on the dependence structure of the failure models, and we refer to a review paper by Singpurwalla (1995) for details on the development of parametric classes for the failure models where the effects from the external environments are modeled using gamma processes and shot-noise processes.

Let X = {X(t), t ≥ 0} be a stochastic process with state space ℰ equipped with an appropriate σ-algebra. Here X(t) represents the environmental state at time t. Let A = {A(t), t ≥ 0} = {(A₁(t), A₂(t), …, A_k(t)), t ≥ 0} be an increasing continuous process, depending on X, which takes on values in ℝ₊^k. For each i, the ith component A_i(t) describes the intrinsic age of component i at time t, and it plays the role of a random cumulative hazard function (see (5.4) below). Let S₁, S₂, …, S_k be independent of X and of each other and have the standard (that is, mean 1) exponential distribution. Then the lifelength of component i is modeled by

T_i = inf{t ≥ 0 : A_i(t) > S_i},   i = 1, 2, ..., k .   (5.2)

That is, component i fails when its intrinsic age runs over its 'intrinsic lifelength' S_i. This failure model was introduced in Çinlar and Özekici (1987), where it is assumed that the intrinsic age process A is a functional of the environmental process X given by

dA_i(t) = r_i(X(t), A(t)) dt,   t > 0,   i = 1, 2, ..., k ,   (5.3)

for some positive measurable function r_i on E × R_+^k. It follows immediately from (5.2) and (5.3) that

P(T_1 > t_1, T_2 > t_2, ..., T_k > t_k | X) = exp( -Σ_{i=1}^k A_i(t_i) ) .   (5.4)

Thus, given the whole history of the environmental process X, the lifelengths T_i are independent. Note that from (5.3) and (5.4) it follows that, for i = 1, 2, ..., k, we have

r_i(X(t), A(t)) = lim_{u↓0} (1/u) P(t < T_i ≤ t + u | T_i > t, X(t), A(t)) .

That is, the intrinsic aging rate r_i(X(t), A(t)) is the random hazard rate of component i as a function of the environmental state X(t) and of the intrinsic age vector A(t).
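The conditional structure in (5.2)-(5.4) is easy to exercise numerically. The sketch below computes the intrinsic ages A_i by integrating (5.3) along a given realization of the environment and then evaluates the conditional joint survival (5.4); the specific rate r_i(x, a) = lam_i·x (independent of the age vector a), the function names and the grid are illustrative assumptions, not part of the model above.

```python
import numpy as np

# Illustrative rate r_i(x, a) = lam_i * x; lam, t_grid, env_path are assumptions.
def intrinsic_age(t_grid, env_path, lam_i):
    """A_i(t) on the grid, by trapezoidal integration of dA_i = r_i dt, eq. (5.3)."""
    rate = lam_i * np.asarray(env_path, dtype=float)
    dA = np.diff(t_grid) * 0.5 * (rate[:-1] + rate[1:])
    return np.concatenate([[0.0], np.cumsum(dA)])

def joint_survival_given_env(t_grid, env_path, lam, t_query):
    """P(T_1 > t_1, ..., T_k > t_k | X) = exp(-sum_i A_i(t_i)), eq. (5.4)."""
    total = 0.0
    for lam_i, t_i in zip(lam, t_query):
        A = intrinsic_age(t_grid, env_path, lam_i)
        total += np.interp(t_i, t_grid, A)
    return float(np.exp(-total))

# Sanity check: in a constant environment X ≡ 1, A_i(t) = lam_i * t, so the
# conditional survival reduces to a product of exponentials.
t_grid = np.linspace(0.0, 2.0, 2001)
p = joint_survival_given_env(t_grid, np.ones_like(t_grid), (2.0, 3.0), (1.0, 1.0))
```

In this constant-environment check p equals exp(-(2 + 3)), exhibiting the conditional independence asserted by (5.4).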


Under certain regularity conditions on the r_i, we can write the vector of lifelengths T = (T_1, T_2, ..., T_k) as a functional of the environmental process X, of the vector of intrinsic aging rate functions r = (r_1, r_2, ..., r_k), and of the vector of intrinsic lifelengths S = (S_1, S_2, ..., S_k), as follows:

T = L(X, r, S) .

Çinlar, Shaked and Shanthikumar (1989) studied the dependence of the lifelength functional L on its arguments X and r. Specifically, they examined the effects of replacing r and X by another function r̂ and another process X̂, and obtained a set of conditions under which the lifelength vector T = L(X, r, S) is stochastically dominated by the lifelength vector T̂ = L(X̂, r̂, S). One of the consequences of their results is the monotonicity property of the lifelength functional L with respect to the environmental process X.

THEOREM 5.3. (Çinlar, Shaked and Shanthikumar (1989)) Suppose that r(x, a) is increasing in a for every x ∈ E, and is increasing (respectively, decreasing) in x for every a ∈ R_+^k, and that r is continuous on E × R_+^k. Then L(w, r, s) is decreasing (respectively, increasing) in w for fixed r and s.

Therefore, under the conditions of Theorem 5.3, and if the environmental process X is positively correlated in time, the lifelengths T_1, T_2, ..., T_k should be positively dependent in some sense. Indeed, Çinlar, Shaked and Shanthikumar (1989) established such a result on the dependence structure of (T_1, T_2, ..., T_k). First, recall that the real-valued random variables Z_1, Z_2, ..., Z_k are said to be associated if

Cov( f(Z_1, Z_2, ..., Z_k), g(Z_1, Z_2, ..., Z_k) ) ≥ 0

for every pair of increasing functions f, g : R^k → R for which the covariance exists. A stochastic process X = {X(t), t ≥ 0} with state space E = R^n is said to be associated in time if for any set {t_1, t_2, ..., t_k}, the random variables X(t_1), X(t_2), ..., X(t_k) are associated (see Lindqvist (1988) for details).

THEOREM 5.4. (Çinlar, Shaked and Shanthikumar (1989)) Under the conditions of Theorem 5.3, if E = R^n and if X is associated in time, then the lifelengths T_1, T_2, ..., T_k are associated.

Thus, the positive dependence property of component lifelengths emerges in a random environment whose states at different time instants are positively dependent in the sense of association. Some sufficient conditions on the process X that imply the time-association property of X involve the stochastic monotonicity property of the transition kernel of X, and can be found in Lindqvist (1988). When the random environmental process X is a Markov process with state space E = R^n, Çinlar, Shaked and Shanthikumar (1989) also found several sets of conditions on X and on r which imply that the components of the random vector T have some multivariate aging properties. Shaked and Shanthikumar (1989) obtained some variations of the above results.
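Association is straightforward to probe by simulation. The construction below, an additive common environmental shock shared by two "lifelengths", is purely an illustrative assumption (it is not the model of Theorem 5.4); it checks that two increasing functionals of associated variables have nonnegative empirical covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
E = rng.exponential(size=n)          # common environmental effect (assumption)
Z1 = E + rng.exponential(size=n)     # two lifelengths sharing the environment
Z2 = E + rng.exponential(size=n)

f = np.minimum(Z1, Z2)               # increasing function of (Z1, Z2)
g = Z1 + Z2                          # another increasing function of (Z1, Z2)
cov_fg = float(np.cov(f, g)[0, 1])   # nonnegative when Z1, Z2 are associated
```

Here cov_fg comes out strictly positive, consistent with association; with independent Z1, Z2 (drop the common E) it would fluctuate around zero.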


Lefèvre and Milhaud (1990) examined the dependence structure of a similar failure model in a random environment. They assumed the following. The environmental process X has real-valued cadlag trajectories. Given X, the lifelengths T_1, T_2, ..., T_k are independent and

lim_{u↓0} (1/u) P(t < T_i ≤ t + u | T_i > t, X) = r_i(t, X(t)),   i = 1, 2, ..., k ,   (5.5)

where each r_i is a positive continuous function on R_+ × R. Observe that, given the external environmental process X, the failure rates depend on X only through the current state X(t), rather than on the whole history of X as in Çinlar and Özekici (1987). Lefèvre and Milhaud (1990) showed that if the r_i(t, x)'s are increasing (or decreasing) in x and the environmental process X is associated in time, then the lifelengths T_1, T_2, ..., T_k are associated.

A special case of the model of Lefèvre and Milhaud (1990) was investigated in Singpurwalla and Youngren (1993), with the aim of developing parametric multivariate distributions. To fix ideas, consider a two-component system which operates in an environment that need not be the same as the design environment. Suppose that the lifelengths T_1 and T_2 have specified failure rate functions, λ_1 and λ_2, respectively, in an ideal design environment. However, the real operating environment comprises some stresses (covariates) whose presence and intensities change over time, and the net effect of these stresses from the random environment is to modulate λ_i(t) to

r_i(t, X(t)) = λ_i(t) X(t) ,   (5.6)

where X = {X(t), t ≥ 0}, referred to by Singpurwalla and Youngren (1993) as the environmental factor function, is a suitable stochastic process. Given X, the random lifetimes T_1 and T_2 are assumed to be independent. If at any time t the operating environment is harsher than the design environment, then X(t) > 1; otherwise, X(t) ≤ 1. For this simplified model (see (5.5)), Singpurwalla and Youngren (1993) obtained explicit probabilistic expressions for the joint survival function.

Two processes, the extended gamma process and the shot-noise process, have been considered in Singpurwalla and Youngren (1993) for describing the effect of the random operating environment. Both processes, as indicated in Singpurwalla and Youngren (1993), have a natural appeal in reliability modeling and survival analysis. A stochastic process Y = {Y(t), t ≥ 0} is said to be a gamma process with parameters α : R_+ → R_+ (an increasing left-continuous function with α(0) = 0) and 1/b, denoted Y ∈ G(α, 1/b), if
1. Y(0) = 0, and Y has independent increments, and
2. Y(t) - Y(s) has a gamma distribution with shape α(t) - α(s) and scale 1/b for any t > s ≥ 0.
Now let Y(t) = ∫_0^t dY(u), where dY(u) = X(u) du whenever X is the environmental factor function of (5.6). Here Y may be viewed as the cumulative effect of the random environment. Suppose that Y ∈ G(α, 1/b), where α is continuously

Stochastic processes in reliability

507

differentiable with α'(t) = a(t). Then the cumulative hazard of component i in the operating environment, A_i(t) = ∫_0^t λ_i(u) dY(u), is known as an extended gamma process with parameters α and λ_i/b (see Çinlar (1980) and Dykstra and Laud (1981) for details on extended gamma processes). As explained in Singpurwalla and Youngren (1993) (see also Çinlar (1980) and Youngren (1988)), gamma processes are meaningful if the effects of stresses from random environments are in the form of impulses. Singpurwalla and Youngren (1993) showed that under the hypothesis Y ∈ G(α, 1/b), one has

P(T_1 > t_1, T_2 > t_2) = exp{ -∫_0^{t_1} log(1 + (λ_1(u) + λ_2(u))/b) a(u) du - ∫_{t_1}^{t_2} log(1 + λ_2(u)/b) a(u) du },   t_1 ≤ t_2 .   (5.7)
The special cases of the class of parametric distributions (5.7) include the bivariate exponential and the bivariate Weibull distributions of Marshall and Olkin (1967). Thus the result of Singpurwalla and Youngren (1993) provides an alternative stochastic-process-based motivation for some of the most widely used multivariate life distributions in reliability theory. Notice that if Y ∈ G(α, 1/b) then Y has independent increments. Thus X is associated in time, and therefore T_1 and T_2 are associated by the result of Lefèvre and Milhaud (1990).

The second process that Singpurwalla and Youngren (1993) examined is the shot-noise process (5.1), so X in (5.6) is given by

X(t) = Σ_{k=0}^∞ X_k h(t - t_k) .
Under a set of technical conditions on a shot-noise environmental process X = {X(t), t Z 0}, Singpurwalla and Youngren (1993) obtained the joint survival function of T1 and T2 in terms of Laplace transforms. Some special cases of this shot-noise process model provide some interesting new distributions with exponential marginals. It is interesting to note that since X(t), for each t _> 0, is an increasing function of (Xo,X~,...) and of (t0, t l , . . . ) [see (5.1)], we have that ifX0,X1,.., and t0, tl, .. are mutually independent then X = {X(t), t _> 0} is associated in time. Thus T1 and T2 are associated from the result of Lef6vre and Milhaud (1990).
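A shot-noise environment of the form X(t) = Σ_k X_k h(t - t_k) is simple to simulate. In the sketch below the unit-rate Poisson epochs, the exponential shock magnitudes and the attenuation h(u) = e^(-u) for u ≥ 0 are all illustrative assumptions.

```python
import numpy as np

def shot_noise(t, epochs, magnitudes):
    """X(t) = sum_k X_k * h(t - t_k), with h(u) = exp(-u) for u >= 0, else 0."""
    u = t - np.asarray(epochs, dtype=float)
    live = u >= 0                      # only shocks that have already arrived
    return float(np.sum(np.asarray(magnitudes)[live] * np.exp(-u[live])))

rng = np.random.default_rng(2)
epochs = np.cumsum(rng.exponential(size=50))    # unit-rate Poisson arrival times
magnitudes = rng.exponential(size=50)           # shock sizes X_k
x_at_5 = shot_noise(5.0, epochs, magnitudes)    # environment level at t = 5
x_at_0 = shot_noise(0.0, epochs, magnitudes)    # zero: no shock has arrived yet
```

Note that the simulated X(t) is increasing in every magnitude X_k (X(t) is linear in the magnitudes), which is the monotonicity exploited in the association argument above.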
References

Abdel-Hameed, M. (1984a). Life distribution properties of devices subject to a Lévy wear process. Math. Oper. Res. 9, 606-614.
Abdel-Hameed, M. (1984b). Life distribution properties of devices subject to a pure jump damage process. J. Appl. Prob. 21, 816-825.
Abdel-Hameed, M. S. and F. Proschan (1973). Nonstationary shock models. Stoch. Proces. Appl. 1, 383-404.


Ammann, L. P. (1984). Bayesian nonparametric inference for quantal response data. Ann. Stat. 12, 636-645.
Antelman, G. and I. R. Savage (1965). Characteristic functions of stochastic integrals and reliability theory. Naval Res. Log. Quart. 12, 199-222.
Arjas, E. (1981). The failure and hazard processes in multivariate reliability systems. Math. Oper. Res. 6, 551-562.
Assaf, D., M. Shaked and J. G. Shanthikumar (1985). First-passage times with PFr densities. J. Appl. Prob. 22, 185-196.
Barlow, R. E. and F. Proschan (1975). Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston.
Baxter, L. A., M. Kijima and M. Tortorella (1996). A point process model for the reliability of a maintained system subject to general repair. Stoch. Models 12, 37-65.
Beichelt, F. (1993). A unifying treatment of replacement policies with minimal repair. Naval Res. Logistics 40, 51-67.
Block, H. W., W. S. Borges and T. H. Savits (1985). Age-dependent minimal repair. J. Appl. Prob. 22, 370-385.
Block, H. W., N. Langberg and T. H. Savits (1990a). Maintenance comparisons: Block policies. J. Appl. Prob. 27, 649-657.
Block, H. W., N. Langberg and T. H. Savits (1990b). Stochastic comparisons of maintenance policies. In Topics in Statistical Dependence (Eds., H. W. Block, A. R. Sampson and T. H. Savits), pp. 57-68. IMS Lecture Notes-Monograph Series 16.
Block, H. W., N. Langberg and T. H. Savits (1993). Repair replacement policies. J. Appl. Prob. 30, 194-206.
Block, H. W. and T. H. Savits (1980). Multivariate increasing failure rate average distributions. Ann. Prob. 8, 793-801.
Block, H. W. and T. H. Savits (1981). Multidimensional IFRA processes. Ann. Prob. 9, 162-166.
Block, H. W. and T. H. Savits (1994). Comparison of maintenance policies. In Stochastic Orders and Their Applications (Eds., M. Shaked and J. G. Shanthikumar), pp. 463-483. Academic Press.
Brown, M. (1980). Bounds, inequalities, and monotonicity properties for some specialized renewal processes. Ann. Prob. 8, 227-240.
Brown, M. and N. R. Chaganty (1983). On the first passage time distribution for a class of Markov chains. Ann. Prob. 11, 1000-1008.
Brown, M. and F. Proschan (1983). Imperfect repair. J. Appl. Prob. 20, 851-859.
Çinlar, E. (1972). Markov additive processes, II. Z. Wahrsch. Verw. Gebiete 24, 94-121.
Çinlar, E. (1977). Shock and wear models and Markov additive processes. In Theory and Applications of Reliability (Eds., I. N. Shimi and C. P. Tsokos), pp. 193-214. Academic Press, New York.
Çinlar, E. (1980). On a generalization of gamma processes. J. Appl. Prob. 17, 467-480.
Çinlar, E. and S. Özekici (1987). Reliability of complex devices in random environments. Prob. Engng. Informat. Sci. 1, 97-115.
Çinlar, E., M. Shaked and J. G. Shanthikumar (1989). On lifetimes influenced by a common environment. Stoch. Process. Appl. 33, 347-359.
Cox, D. R. and V. Isham (1980). Point Processes. Chapman and Hall, London.
Dagpunar, J. S. (1998). Some properties and computational results for a general repair process. Naval Res. Logistics 45, 391-405.
Drosen, J. W. (1986). Pure jump models in reliability theory. Adv. Appl. Prob. 18, 423-440.
Durham, S., J. Lynch and W. J. Padgett (1990). TP2-orderings and the IFR property with applications. Prob. Engng. Infor. Sci. 4, 73-88.
Dykstra, R. L. and P. Laud (1981). A Bayesian nonparametric approach to reliability. Ann. Stat. 9, 356-367.
El-Neweihi, E., F. Proschan and J. Sethuraman (1978). Multistate coherent systems. J. Appl. Prob. 15, 675-688.
Esary, J. D. and A. W. Marshall (1970). Coherent life functions. SIAM J. Appl. Math. 18, 810-814.
Esary, J. D., A. W. Marshall and F. Proschan (1973). Shock models and wear processes. Ann. Prob. 1, 627-649.


Esary, J. D., F. Proschan and D. W. Walkup (1967). Association of random variables, with applications. Ann. Math. Statist. 38, 1466-1474.
Freund, J. E. (1961). A bivariate extension of the exponential distribution. J. Amer. Statist. Assoc. 56, 971-977.
Gaver, D. P. (1963). Random hazard in reliability problems. Technometrics 5, 211-226.
Gaver, D. P. and R. G. Miller (1962). Limiting distributions for some storage problems. In Studies in Applied Probability and Management Science (Eds., K. J. Arrow, S. Karlin and H. Scarf), pp. 110-126. Stanford University Press.
Guo, R. and C. E. Love (1992). Statistical analysis of an age model for imperfectly repaired systems. Qual. Relia. Engng. Int. 8, 133-146.
Harris, C. M. and N. D. Singpurwalla (1968). Life distributions derived from stochastic hazard functions. IEEE Transact. Reliab. R-17, 70-79.
Karasu, I. and S. Özekici (1989). NBUE and NWUE properties of increasing Markov processes. J. Appl. Prob. 26, 827-834.
Kebir, Y. (1991). On hazard rate processes. Naval Res. Logist. 38, 865-876.
Keilson, J. (1979). Markov Chain Models - Rarity and Exponentiality. Springer-Verlag, New York.
Kijima, M. (1989a). Uniform monotonicity of Markov processes and its related properties. J. Oper. Res. Soc. Japan 32, 475-490.
Kijima, M. (1989b). Some results for repairable systems with general repair. J. Appl. Prob. 26, 89-102.
Kijima, M., H. Morimura and Y. Suzuki (1988). Periodical replacement problem without assuming minimal repair. Europ. J. Operat. Res. 37, 194-203.
Kijima, M. and T. Nakagawa (1991). A cumulative damage shock model with imperfect preventive maintenance. Naval Res. Logist. 38, 145-156.
Klefsjö, B. (1981). HNBUE survival under some shock models. Scandinav. J. Stat. 8, 39-47.
Lam, C. Y. T. (1992). New better than used in expectation processes. J. Appl. Prob. 29, 116-128.
Langberg, N. (1988). Comparison of replacement policies. J. Appl. Prob. 25, 780-788.
Last, G. and R. Szekli (1998). Stochastic comparison of repairable systems. J. Appl. Prob. 35, 348-370.
Lee, S. and J. Lynch (1997). Total positivity of Markov chains and the failure rate character of some first passage times. Adv. Appl. Prob. 29, 713-732.
Lefèvre, C. and X. Milhaud (1990). On the association of the lifelengths of components subjected to a stochastic environment. Adv. Appl. Prob. 22, 961-964.
Lemoine, A. J. and M. L. Wenocur (1986). A note on shot-noise and reliability modeling. Operat. Res. 34, 320-323.
Li, H. and M. Shaked (1995). On the first passage times for Markov processes with monotone convex transition kernels. Stoch. Process. Appl. 58, 205-216.
Li, H. and M. Shaked (1997). Aging first-passage times. In Encyclopedia of Statistical Sciences, Update Vol. 1 (Eds., S. Kotz, C. B. Read and D. L. Banks), pp. 11-20. John Wiley & Sons.
Lindley, D. and N. D. Singpurwalla (1986). Multivariate distributions for the life lengths of components of a system sharing a common environment. J. Appl. Prob. 23, 418-431.
Lindqvist, B. H. (1988). Association of probability measures on partially ordered sets. J. Multivar. Anal. 26, 111-132.
Makis, V. and A. K. S. Jardine (1993). A note on optimal replacement policy under general repair. Europ. J. Operat. Res. 69, 75-82.
Marshall, A. W. (1994). A system model for reliability studies. Statistica Sinica 4, 549-565.
Marshall, A. W. and I. Olkin (1967). A multivariate exponential distribution. J. Amer. Stat. Assoc. 62, 30-44.
Marshall, A. W. and M. Shaked (1979). Multivariate shock models for distributions with increasing hazard rate average. Ann. Prob. 7, 343-358.
Marshall, A. W. and M. Shaked (1982). A class of multivariate new better than used distributions. Ann. Prob. 10, 259-264.
Marshall, A. W. and M. Shaked (1983). New better than used processes. Adv. Appl. Prob. 15, 601-615.
Marshall, A. W. and M. Shaked (1986). NBU processes with general state space. Math. Operat. Res. 11, 95-109.


Özekici, S. and N. O. Günlük (1992). Maintenance of a device with age-dependent exponential failures. Naval Res. Logist. 39, 699-714.
Padgett, W. J. and L. J. Wei (1981). A Bayesian nonparametric estimator of survival probability assuming increasing failure rate. Communications in Statistics - Theory and Methods A10, 49-63.
Pérez-Ocón, R. P. and M. L. G. Gámiz-Pérez (1995). On the HNBUE property of correlated cumulative shock models. Adv. Appl. Prob. 27, 1186-1188.
Pérez-Ocón, R. P. and M. L. G. Gámiz-Pérez (1996). On first-passage times in increasing Markov processes. Stat. Prob. Lett. 26, 199-203.
Reynolds, D. S. and I. R. Savage (1971). Random wear models in reliability theory. Adv. Appl. Prob. 3, 229-248.
Ross, S. M. (1979). Multivalued state component systems. Ann. Prob. 7, 379-383.
Ross, S. M. (1981). Generalized Poisson shock models. Ann. Prob. 9, 896-898.
Savits, T. H. and M. Shaked (1981). Shock models and the MIFRA property. Stochas. Proc. Appl. 11, 273-283.
Shaked, M. (1982). Dispersive ordering of distributions. J. Appl. Prob. 19, 310-320.
Shaked, M. (1984). Wear and damage processes from shock models in reliability theory. In Reliability Theory and Models: Stochastic Failure Models, Optimal Maintenance Policies, Life Testing, and Structures (Eds., M. S. Abdel-Hameed, E. Çinlar and J. Quinn), pp. 43-64. Academic Press.
Shaked, M. and J. G. Shanthikumar (1986a). IFRA processes. In Reliability and Quality Control (Ed., A. P. Basu), pp. 345-352. Elsevier Science Publishers.
Shaked, M. and J. G. Shanthikumar (1986b). Multivariate imperfect repair. Oper. Res. 34, 437-448.
Shaked, M. and J. G. Shanthikumar (1987). IFRA properties of some Markov jump processes with general state space. Math. Oper. Res. 12, 562-568.
Shaked, M. and J. G. Shanthikumar (1988). On the first-passage times of pure jump processes. J. Appl. Prob. 25, 501-509.
Shaked, M. and J. G. Shanthikumar (1989). Some replacement policies in a random environment. Prob. Engng. Infor. Sci. 3, 117-134.
Shaked, M. and J. G. Shanthikumar (1991). Shock models with MIFRA time to failure distributions. J. Stat. Plann. Infer. 29, 157-169.
Shaked, M. and J. G. Shanthikumar (1994). Stochastic Orders and Their Applications. Academic Press, Boston.
Shaked, M. and R. Szekli (1995). Comparison of replacement policies via point processes. Adv. Appl. Prob. 27, 1079-1103.
Shaked, M. and H. Zhu (1992). Some results on block replacement policies and renewal theory. J. Appl. Prob. 29, 932-946.
Shanthikumar, J. G. (1984). Processes with new better than used first passage times. Adv. Appl. Prob. 16, 667-686.
Shanthikumar, J. G. (1988). DFR property of first-passage times and its preservation under geometric compounding. Ann. Prob. 16, 397-406.
Shanthikumar, J. G. and U. Sumita (1984). Distribution properties of the system failure time in a general shock model. Adv. Appl. Prob. 16, 363-377.
Singpurwalla, N. (1995). Survival in dynamic environments. Stat. Sci. 10, 86-103.
Singpurwalla, N. and M. Youngren (1993). Multivariate distributions induced by dynamic environments. Scand. J. Stat. 20, 251-261.
Stadje, W. and D. Zuckerman (1991). Optimal maintenance strategies for repairable systems with general degree of repair. J. Appl. Prob. 28, 384-396.
Sumita, U. and J. G. Shanthikumar (1985). A class of correlated cumulative shock models. Adv. Appl. Prob. 17, 347-366.
Sumita, U. and J. G. Shanthikumar (1988). An age-dependent counting process generated from a renewal process. Adv. Appl. Prob. 20, 739-755.
Youngren, M. (1988). Dependent lifelengths induced by dynamic environments. Ph.D. dissertation, George Washington University, Washington, DC.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 2001 Elsevier Science B.V. All rights reserved.


On the Supports of Stochastic Processes of Multiplicity One

A. Kłopotowski and M. G. Nadkarni

0. Introduction

0.1. Consider the transformation T : ℤ → ℤ, T(k) = k + 1, k ∈ ℤ. If we define f_0(k) = 1 for k = 0 and f_0(k) = 0 otherwise, then the functions f_0 ∘ T^n, n ∈ ℤ, are mutually orthogonal and span the Hilbert space L²(ℤ, 2^ℤ, λ_0) (by span we mean closed linear span), where the measure λ_0 assigns unit mass to each integer.

0.2. A question due to Stefan Banach asks if the analogous property holds for the real line. Namely, does there exist a measurable one-to-one and onto transformation T : ℝ → ℝ for which there is a function f_0 ∈ L²(ℝ, B_ℝ, λ) such that f_n = f_0 ∘ T^n, n ∈ ℤ, are mutually orthogonal and span L²(ℝ, B_ℝ, λ), where λ is the Lebesgue measure on the σ-algebra B_ℝ of Borel subsets of the real line? It is clear that if such a T exists, then T preserves λ and the linear operator U_T : f → f ∘ T, f ∈ L²(ℝ, B_ℝ, λ), is unitary.

0.3. Banach's question is still unresolved and, together with its probabilistic version, it is one of the oldest open problems in ergodic theory. The probabilistic version of Banach's problem asks if there exists a measurable one-to-one and onto transformation T of some probability space (Ω, F, P) for which there exists a square integrable random variable X with mean zero (i.e. centered) such that {1} ∪ {X ∘ T^n : n ∈ ℤ} is an orthonormal base of the Hilbert space L²(Ω, F, P) (such a T necessarily preserves P). We shall refer to this question as "the problem of the existence of a dynamical system with simple Lebesgue spectrum" or simply as "the problem of simple Lebesgue spectrum".

0.4. More generally, consider a two-sided sequence of square integrable centered random variables X_n, n ∈ ℤ, defined on the probability space (Ω, F, P), where F is the smallest σ-algebra with respect to which the X_n, n ∈ ℤ, are measurable. Assume that the stochastic process X_n, n ∈ ℤ, has multiplicity one in the sense that the closed linear span of the X_n, n ∈ ℤ, is L_0²(Ω, F, P), the space of centered square integrable functions on (Ω, F, P). This is the case when we take X_n = U^n X_0, n ∈ ℤ, where U


is a unitary operator of multiplicity one arising from a measure preserving automorphism on (Ω, F, P) and X_0 is a cyclic vector for U restricted to L_0²(Ω, F, P). Other examples are given by complete orthonormal systems in L_0²(Ω, F, P) such as the system of Walsh functions (see Section 7). Let us consider the doubly infinite product Ω_0 = ℂ^ℤ equipped with the product σ-algebra B_0 = B(ℂ^ℤ), the smallest σ-algebra with respect to which the coordinate functions ξ_n, n ∈ ℤ, are measurable. The measurable map

Φ : (Ω, F) → (Ω_0, B_0),   Φ(ω) = (X_n(ω))_{n=-∞}^∞,   ω ∈ Ω,

induces a measure m on (Ω_0, B_0) given by the formula

m(A) = P(Φ^{-1}(A)),   A ∈ B_0 .

The fact that the stochastic process X_n, n ∈ ℤ, is centered and has multiplicity one implies that the stochastic process of coordinates ξ_n, n ∈ ℤ, is centered and of multiplicity one. It is natural to ask if we can obtain some information about the measure m (or its support) in this case, which in turn will have some implications for the problem of simple Lebesgue spectrum discussed above.

0.5. This paper is motivated by such questions. Our analysis will yield some information on the support of the measure m, which in turn allows us to exhibit some new properties of a T with simple Lebesgue spectrum, in case such a T exists. The treatment is mostly at the level of sets and functions and gives some descriptive information about m. This is followed by some open problems related to stochastic processes. In Sections 1-4 we discuss various conditions, necessary and sufficient as well as sufficient, set theoretic as well as measure theoretic, on a subset S of a product space X × Y so that every complex valued function f on S can be expressed in the form

f(x, y) = u(x) + v(y),   (x, y) ∈ S ,

where u and v are functions on X and Y respectively. In Sections 5 and 6 we apply these results to dynamics and to the problem of simple Lebesgue spectrum. In Section 7 we discuss a special case of this problem for binary pairwise independent random variables. We give here an interesting property of the support of the measures associated with binary pairwise independent stochastic processes of multiplicity one.

1. Additive decompositions, good sets


1.1. DEFINITION. Let X, Y be arbitrary non-empty sets and fix a subset ∅ ≠ S ⊆ X × Y. A complex valued function f on S is said to be decomposable if there exist complex valued functions u on X and v on Y such that

f(x, y) = u(x) + v(y),   (x, y) ∈ S .   (1)

In this case u and v are said to be a decomposition of f (with respect to S).


1.2. DEFINITION. We say that a subset ∅ ≠ S ⊆ X × Y is good if every complex valued function on S is decomposable.

It is obvious that any non-empty subset of a good set has this property too, but there exist sets which are not good and such that all proper subsets of them are good. The purpose of this section is to describe good subsets of X × Y when X and Y are finite. No measurability structure is assumed.

1.3. Let Π_1 : X × Y → X and Π_2 : X × Y → Y be the projections on X and Y respectively. If S is good, then any function f : S → ℂ, f = u + v, is completely determined by the values of u on Π_1 S and of v on Π_2 S. Therefore it is not a severe restriction on a good set S to assume in addition that Π_1 S = X and Π_2 S = Y. This assumption will be made whenever necessary.

1.4. Assume that X and Y are finite with m and n elements respectively. We begin with the observation that a good set must be "thin" in the sense that it can have at most m + n - 1 points. Let X = {x_1, x_2, ..., x_m}, Y = {y_1, y_2, ..., y_n} and S = {s_1, s_2, ..., s_k}, where

s_1 = (x_{i_1}, y_{j_1}), s_2 = (x_{i_2}, y_{j_2}), ..., s_k = (x_{i_k}, y_{j_k}) .

We consider the k × (m + n) matrix M (called the matrix of S) with rows M_p, 1 ≤ p ≤ k, given by

M_p = (0, ..., 0, 1, 0, ..., 0, 1, 0, ..., 0) ,

where the 1's occur at the places i_p and m + j_p, corresponding to the subscripts in the pair s_p = (x_{i_p}, y_{j_p}). Since S is good,
Fig. 1. Good set.

f(s_p) = f(x_{i_p}, y_{j_p}) = u(x_{i_p}) + v(y_{j_p}),   1 ≤ p ≤ k .

We put

u(x_1) = ξ_1, ..., u(x_m) = ξ_m,   v(y_1) = η_1, ..., v(y_n) = η_n .

The relation (1) gives us the k equalities

ξ_{i_p} + η_{j_p} = f(s_p),   1 ≤ p ≤ k .

In other words, the column vector (ξ_1, ..., ξ_m, η_1, ..., η_n)^t ∈ ℂ^{m+n} is a solution of the matrix equation

M ζ = g ,   (2)

where g = (f(s_1), f(s_2), ..., f(s_k))^t ∈ ℂ^k. Since S is good, we know that (2) has a solution for every g. Since M has m + n columns and since the vector (1, ..., 1, -1, ..., -1)^t (m ones followed by n minus ones) is a solution of the homogeneous equation M ζ = 0, we see that the rank of M is at most m + n - 1. Clearly k cannot exceed the rank of M. On the other hand, the set S = ({x_1} × Y) ∪ (X × {y_1}), the union of two "axes", is a good subset of X × Y of cardinality m + n - 1. We have proved:

1.5. PROPOSITION. Let X, Y, S ⊆ X × Y be finite sets with m, n and k elements respectively; Π_1 S = X, Π_2 S = Y. Then S is good if and only if k ≤ m + n - 1 and the matrix M of S defined above has rank k. There always exists a good set of cardinality k ≤ m + n - 1.

1.6. DEFINITION. Let S ⊆ X × Y. (We do not assume that X and Y are finite, and S is not assumed to be good.) We say that a point s = (x_0, y_0) ∈ S is isolated in the vertical direction (resp. isolated in the horizontal direction) if ({x_0} × Y) ∩ S (resp. (X × {y_0}) ∩ S) is a singleton.

1.7. Let S ⊆ X × Y be an arbitrary subset of cardinality ≤ m + n - 1 with Π_1 S = X and Π_2 S = Y, |X| = m, |Y| = n, where |A| denotes the cardinality of the set A. Every column of the matrix M associated to S is a non-zero vector. Each row has exactly two ones in it. Since the number of columns is m + n, there are at least two columns with exactly one 1 in each of them. Suppose that the jth column has exactly one 1, which occurs in the ith row. This means that s_i is isolated in the vertical direction if 1 ≤ j ≤ m, and in the horizontal direction if m + 1 ≤ j ≤ m + n. We cancel from M the ith row and the jth column to obtain a matrix M̃, and we drop from S the element s_i to obtain a set S_1. We cancel from M̃ all columns which consist only of zeros and write M_1 for the matrix thus obtained. It is easy to see that M_1 is the matrix associated to S_1. If the number of rows in M_1 is greater than or equal to the number of columns in M_1, then S_1 is not a good set and a fortiori S is not a good set. Otherwise, the number of rows in M_1 is smaller than the number of columns in M_1 and we can apply the above procedure to M_1. We obtain a reduced matrix M_2 and

the smaller set S_2, of which M_2 is the associated matrix. If this process of reduction stops at a stage l < k, in the sense that the number of rows in M_l is greater than or equal to the number of columns in M_l, then the set S is not good. If the process continues up to stage k (equal to the number of points in S), then S is good and the rank of the matrix M is equal to k. Thus we have obtained:

1.8. PROPOSITION. A subset S ⊆ X × Y, where X, Y are two finite sets of cardinality m and n respectively, is good if and only if the process of reduction of the matrix M continues up to k steps, where k is the number of elements in S; equivalently, if and only if the number of rows in M_i is smaller than the number of columns in M_i for each 1 ≤ i ≤ k.
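The reduction of 1.7-1.8 admits a simple algorithmic form: repeatedly delete the points of S that are isolated in the vertical or horizontal direction (Definition 1.6), and declare S good exactly when this peeling exhausts S. The sketch below (our illustrative implementation, peeling all isolated points of a round at once, in the spirit of Section 2.4 below) tests goodness of a finite set of pairs.

```python
def is_good(S):
    """Finite S, a set of (x, y) pairs, is good iff repeatedly deleting the
    points isolated in the vertical or horizontal direction exhausts S."""
    S = set(S)
    while S:
        col, row = {}, {}
        for (x, y) in S:
            col[x] = col.get(x, 0) + 1       # points on the vertical line {x} x Y
            row[y] = row.get(y, 0) + 1       # points on the horizontal line X x {y}
        isolated = {(x, y) for (x, y) in S if col[x] == 1 or row[y] == 1}
        if not isolated:
            return False     # reduction stalls before exhausting S: not good
        S -= isolated
    return True

good_triplet = is_good({(0, 0), (0, 1), (1, 0)})    # the union of two "axes"
square = is_good({(0, 0), (0, 1), (1, 0), (1, 1)})  # 4 points > m + n - 1 = 3
```

The four-point "square" has no isolated point at all, so the peeling stalls immediately, matching the cardinality bound k ≤ m + n - 1 of Proposition 1.5.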

2. Graphs, couples and their unions


The sets X and Y are no longer assumed to be finite in what follows, except when this assumption is explicitly stated.

2.1. PROPOSITION. If S is the graph of a function g : E → Y, where E ⊆ X, then S is good. Similarly, if S is the graph of a function h : F → X, where F ⊆ Y, then S is good.

We have a more general result:

2.2. PROPOSITION. If S = G ∪ H, where G is the graph of a function g : E → Y, E ⊆ X, and H is the graph of a function h : F → X\E, F ⊆ Y\g(E), then S is good.

PROOF. For any complex valued function f on S, we define

u(x) = f(x, g(x)) if x ∈ E, and u(x) = 0 if x ∈ X\E,
v(y) = f(h(y), y) if y ∈ F, and v(y) = 0 if y ∈ Y\F,

so that (1) is satisfied.

This suggests the following:

2.3. DEFINITION. Let g be a function defined on a subset E ⊆ X into Y and h be a function defined on a subset F ⊆ Y into X. Let G and H be the graphs of g and h respectively. If

g(E) ∩ F = ∅,   h(F) ∩ E = ∅ ,

then the set S = G ∪ H is called a couple.
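The proof of Proposition 2.2 is constructive and translates directly into code. In the dict-based sketch below (the names and the small example are illustrative), u is read off along the graph G and v along H; values missing from u or v are taken to be 0, exactly as in the proof.

```python
def decompose_couple(f, g, h):
    """Given f on S = G ∪ H (a dict on pairs), the graph g : E -> Y and the
    graph h : F -> X\\E with F ∩ g(E) = ∅, return the decomposition (u, v)."""
    u = {x: f[(x, g[x])] for x in g}     # u(x) = f(x, g(x)) on E, 0 elsewhere
    v = {y: f[(h[y], y)] for y in h}     # v(y) = f(h(y), y) on F, 0 elsewhere
    return u, v

# A small couple: E = {0, 1} with g ≡ 'a'; F = {'b'} with h('b') = 2 ∈ X \ E.
g = {0: 'a', 1: 'a'}
h = {'b': 2}
f = {(0, 'a'): 3.0, (1, 'a'): 5.0, (2, 'b'): 7.0}
u, v = decompose_couple(f, g, h)
ok = all(f[(x, y)] == u.get(x, 0.0) + v.get(y, 0.0) for (x, y) in f)
```

The check `ok` verifies f(x, y) = u(x) + v(y) on all of S; the disjointness conditions of Definition 2.3 are what guarantee that the two defining clauses never clash.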


Each couple is a good set. Let us observe also that not every union of two graphs is a couple; for example, if g and h are onto, then G U H is not a couple. Moreover, a good set need not be a couple, for example the triplet {(0,0), (0, 1), (1,0)} is a good subset of {0, 1} x {0, 1}, which is not a couple. 2.4. We define G = {(x,y) E S : (x,y) is isolated in the vertical direction}, H = {(x,y) E S: (x,y) is isolated in the horizontal direction} . Note that G U H = (G\(G N H)) U H and the latter can be seen as a couple, since 171 (G\(G N H)) N IIIH = (~ and FI2(G\(G N H)) N 172H = ~. Define
S₁ = S\(G ∪ H).

Let G₁, H₁ be obtained from S₁ in the same manner as G and H are obtained from S. Proceeding thus we get

S₂ = S₁\(G₁ ∪ H₁), …, Sₙ₊₁ = Sₙ\(Gₙ ∪ Hₙ), …

We note that Sₙ₊₁ ⊆ Sₙ for all n ∈ ℕ. It is easy to see that each Gᵢ ∪ Hᵢ is a couple, being equal to (Gᵢ\(Gᵢ ∩ Hᵢ)) ∪ Hᵢ. A natural generalisation of Proposition 1.8 is the following:

2.5. PROPOSITION. If ⋂ₙ₌₁^∞ Sₙ = ∅, then S is good. If S is good and finite, then ⋂ₙ₌₁^∞ Sₙ = ∅.

Fig. 2. Couple.

On the supports of stochastic processes of multiplicity one


2.6. DEFINITION. Two couples S₁ = G₁ ∪ H₁, S₂ = G₂ ∪ H₂ are said to be separated if the sets Π₁G₁, Π₁H₁, Π₁G₂, Π₁H₂ are mutually disjoint and the same is true for the sets Π₂G₁, Π₂H₁, Π₂G₂, Π₂H₂. In this case S = (G₁ ∪ H₁) ∪ (G₂ ∪ H₂) is a couple too. More generally, it is clear that:

2.7. PROPOSITION. An arbitrary union of pairwise separated couples is a couple, hence a good set.
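For a finite S, Proposition 2.5 turns the peeling construction of 2.4 into an effective test. The following sketch (Python; the function name and the representation of S as a set of pairs are our own, not the authors') checks goodness by repeatedly removing the points isolated in the vertical or horizontal direction:

```python
from collections import Counter

def is_good_finite(S):
    """Peel off, at each stage, the points of S isolated in the vertical
    direction (no other point of S on their vertical line) or in the
    horizontal direction; by Proposition 2.5 a finite S is good exactly
    when the peeling empties S."""
    S = set(S)
    while S:
        col = Counter(x for x, _ in S)
        row = Counter(y for _, y in S)
        peel = {(x, y) for (x, y) in S if col[x] == 1 or row[y] == 1}
        if not peel:          # no isolated point left, so the peeling stalls
            return False
        S -= peel
    return True

# The triplet mentioned above is good; the vertices of a rectangle are not.
print(is_good_finite({(0, 0), (0, 1), (1, 0)}))          # True
print(is_good_finite({(0, 0), (0, 1), (1, 0), (1, 1)}))  # False
```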

3. Links, linked and uniquely linked sets, loops

3.1. If S is finite and good, then at least one of the sets G or H defined in 2.4 is non-empty. The following example shows that this need not be true if S is infinite. Let X = Y = ℤ and

S = {(n, n−1) : n ∈ ℤ} ∪ {(n, n) : n ∈ ℤ} ⊆ ℤ × ℤ.

No point of S is isolated in either direction. However, S is good. For, let f be any complex valued function on S. We define u(0) = c, where c is an arbitrary constant. This forces v(0) = f(0,0) − c. Having defined v(0), we see that u(1) = f(1,0) − v(0), v(1) = f(1,1) − u(1). Proceeding thus we see that u and v are uniquely determined as soon as we fix the value of u(0). This example suggests a method of describing good subsets of X × Y which is valid also when X or Y or both are infinite.

Fig. 3. Zig-zag set.

3.2. DEFINITION. Consider two arbitrary points (x,y), (z,w) ∈ S ⊆ X × Y (S not necessarily good or finite). We say that (x,y), (z,w) are linked if there exists a sequence {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} of points in S such that:


(i) (x₁,y₁) = (x,y), (xₙ,yₙ) = (z,w);
(ii) for any 1 ≤ i ≤ n−1 exactly one of the following equalities holds:

xᵢ = xᵢ₊₁,  yᵢ = yᵢ₊₁;

(iii) if xᵢ = xᵢ₊₁, then yᵢ₊₁ = yᵢ₊₂, 1 ≤ i ≤ n−2, and if yᵢ = yᵢ₊₁, then xᵢ₊₁ = xᵢ₊₂; equivalently, it is not possible to have xᵢ = xᵢ₊₁ = xᵢ₊₂ or yᵢ = yᵢ₊₁ = yᵢ₊₂ for some 1 ≤ i ≤ n−2.

The sequence {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} is then called a link (of length n) joining (x,y) to (z,w) and we write (x,y)L(z,w).

3.3. It is easy to see that the relation L is reflexive and symmetric. It is also transitive. For let

(x,y)L(z,w) and (z,w)L(e,f).

We show that (x,y)L(e,f). Let {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} be a link joining (x,y) to (z,w) and let {(z₁,w₁), (z₂,w₂), …, (z_m,w_m)} be a link joining (z,w) to (e,f). (Note that (xₙ,yₙ) = (z₁,w₁).) Let 1 ≤ k ≤ m be the largest integer such that (zₖ,wₖ) is one of the elements of the link joining (x,y) to (z,w). If (zₖ,wₖ) = (x,y), then clearly {(zₖ,wₖ), …, (z_m,w_m)} is a link joining (x,y) to (e,f). Assume now that (zₖ,wₖ) = (xᵢ,yᵢ), i > 1. If x_{i−1} = xᵢ ≠ z_{k+1}, then yᵢ = wₖ = w_{k+1}; if y_{i−1} = yᵢ ≠ w_{k+1}, then xᵢ = zₖ = z_{k+1}; in both cases {(x₁,y₁), …, (xᵢ,yᵢ), (z_{k+1},w_{k+1}), …, (z_m,w_m)} is a link joining (x,y) to (e,f). If x_{i−1} = xᵢ = z_{k+1} or y_{i−1} = yᵢ = w_{k+1}, then we simply drop the point (xᵢ,yᵢ) and {(x₁,y₁), …, (x_{i−1},y_{i−1}), (z_{k+1},w_{k+1}), …, (z_m,w_m)} is a link joining (x,y) to (e,f). (Note that (x_{i−1},y_{i−1}) ≠ (z_{k+1},w_{k+1}) by our choice of k.) Thus (x,y)L(e,f), and L is transitive.
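The recursion of 3.1 on the zig-zag set can be carried out mechanically. The sketch below (Python; the helper name and the arbitrary test function f standing in for given data are ours) solves f(x,y) = u(x) + v(y) on a finite window of the set:

```python
def solve_zigzag(f, N, c=0.0):
    """On the zig-zag set {(n, n-1)} ∪ {(n, n)}, 0 <= n <= N, determine
    u and v from u(0) = c as in 3.1: each point of the set forces one
    new value of u or v."""
    u, v = {0: c}, {0: f(0, 0) - c}
    for n in range(1, N + 1):
        u[n] = f(n, n - 1) - v[n - 1]   # the point (n, n-1) forces u(n)
        v[n] = f(n, n) - u[n]           # the point (n, n) then forces v(n)
    return u, v

f = lambda x, y: x * x + 3.0 * y        # arbitrary data on the set
u, v = solve_zigzag(f, 5, c=1.0)
assert all(abs(f(n, n) - u[n] - v[n]) < 1e-9 for n in range(6))
assert all(abs(f(n, n - 1) - u[n] - v[n - 1]) < 1e-9 for n in range(1, 6))
```

As in the text, the whole solution is pinned down by the single choice u(0) = c.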

Fig. 4. Linked components.


3.4. DEFINITION. An equivalence class under the relation L is called a linked component of S. If (x,y) ∈ S, then the equivalence class to which (x,y) belongs is called the linked component of (x,y).

3.5. Let (x₀,y₀) ∈ S ⊆ X × Y. The linked component of (x₀,y₀) is obtained as a union ⋃ₙ₌₁^∞ Qₙ, where

Q₁ = (X × {y₀}) ∩ S, P₁ = Π₁Q₁,
Q₂ = (Π₁⁻¹P₁) ∩ S, P₂ = Π₂Q₂,
Q₃ = (Π₂⁻¹P₂) ∩ S, P₃ = Π₁Q₃,

and so on. If n is odd, we have

Pₙ = Π₁Qₙ, Qₙ₊₁ = (Π₁⁻¹Pₙ) ∩ S,
Pₙ₊₁ = Π₂Qₙ₊₁, Qₙ₊₂ = (Π₂⁻¹Pₙ₊₁) ∩ S, …

A similar description is obtained if we start from the sets

Q̃₁ = ({x₀} × Y) ∩ S, P̃₁ = Π₂Q̃₁.

3.6. Suppose that X and Y are standard Borel spaces and that X × Y is furnished with the product Borel structure. If S ⊆ X × Y is a Borel set, then each linked component of S is a countable union of analytic sets, hence the equivalence relation L decomposes S into analytic sets. We do not know whether the linked components are always Borel, or if the partition into linked components is countably generated.

3.7. DEFINITION. Two points (x,y), (z,w) ∈ S ⊆ X × Y are said to be uniquely linked if there is a unique link joining (x,y) to (z,w).

3.8. THEOREM. Let Q be a linked component of S. Then the following properties are equivalent:
(i) any two points of Q are uniquely linked;
(ii) some two points of Q are uniquely linked;
(iii) for some (x,y) ∈ Q the singleton {(x,y)} is the only link joining (x,y) to itself.

PROOF. (i) ⇒ (ii) is obvious.

(ii) ⇒ (iii). Suppose that there exist (x,y), (z,w) ∈ Q which are uniquely linked. If (x,y) = (z,w), then {(x,y)} is the only link joining (x,y) to itself and there is nothing to prove. Suppose (x,y) ≠ (z,w). In this case {(x,y)} must be the only link joining (x,y) to itself, because if another link Γ joins (x,y) to itself and if Λ is the link joining (x,y) to (z,w), then we can make a new link out of Γ and Λ joining (x,y) to (z,w) (using the proof of the transitivity of the relation L) and different from Λ, contradicting the uniqueness of Λ.


(iii) ⇒ (i). Assume (iii) holds and let (x₀,y₀) ∈ Q be a point in S such that {(x₀,y₀)} is the only link joining (x₀,y₀) to itself. Let (x,y), (z,w) be any two points in Q. If there are two links joining (x,y) to (z,w), then there is a link joining (x,y) to itself other than the singleton {(x,y)}. Let this link be Γ₁ and let Λ₁ be the link joining (x₀,y₀) to (x,y). Then we can generate from Γ₁ and Λ₁ a new link joining (x₀,y₀) to itself. This contradicts (iii).

3.9. DEFINITION. A linked component of S ⊆ X × Y is said to be uniquely linked if any two points in it are uniquely linked. The zig-zag set of Figure 3 and the three point set in the lower left hand side of Figure 4 are uniquely linked.

3.10. DEFINITION. A non-trivial link joining (x,y) to itself is called a loop; by the trivial link joining (x,y) to itself we mean the link consisting of the singleton {(x,y)}. It is clear that a linked component is uniquely linked if it has no loops. The four point set forming the vertices of a rectangle (see Figure 4) is a loop.

3.11. If S is uniquely linked, and if a link {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} joins (x,y) ∈ S to (z,w) ∈ S, then the sequence {x₁, x₂, …, xₙ} can have at most two identical terms and the same holds for the sequence {y₁, y₂, …, yₙ}. Moreover, these terms must be consecutive. For, if {x₁, x₂, …, xₙ} has three identical terms, then they can not all be consecutive and so there exist i < j, j ≠ i + 1, such that xᵢ = xⱼ and xᵢ ≠ xₚ for i < p < j. Hence the sequence {(xᵢ,yᵢ), (xᵢ₊₁,yᵢ₊₁), …, (xⱼ,yⱼ)} is a loop if yᵢ = yⱼ. If yᵢ ≠ yⱼ, then {(xᵢ,yᵢ), …, (xⱼ,yⱼ), (xⱼ,yᵢ)} is a loop. In each case we obtain a contradiction, since S is uniquely linked.

3.12. THEOREM. Assume that S ⊆ X × Y is linked. Then S is good if and only if it is uniquely linked.

PROOF. Assume that S is uniquely linked and let f be a complex valued function on S. Let (x₀,y₀) ∈ S and define u(x₀) = c, where c is a constant. This forces v(y₀) = f(x₀,y₀) − u(x₀).
We will now show that u(x) and v(y) can be defined unambiguously for all (x,y) ∈ S, so that (1) holds. Assume that we have defined u(x) and v(y) for all (x,y) ∈ S which can be joined to (x₀,y₀) by a link of length n. Let (z,w) ∈ S be joined to (x₀,y₀) by a link of length n+1 and let {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ), (xₙ₊₁,yₙ₊₁)} be this link. Since (x₁,y₁) = (x₀,y₀) is joined to (xₙ,yₙ) by a link of length n, by the induction hypothesis u(xₙ) and v(yₙ) are correctly defined. If xₙ = xₙ₊₁, then u(xₙ₊₁) is also defined and v(yₙ₊₁) = f(xₙ₊₁,yₙ₊₁) − u(xₙ₊₁). Since S is uniquely linked, yₙ₊₁ can not occur in {y₁, y₂, …, yₙ}, so that v(yₙ₊₁) is well defined. On the other hand, if yₙ = yₙ₊₁, then v(yₙ₊₁), hence also u(xₙ₊₁), is correctly defined. We set u and v equal to zero on X\Π₁(S) and Y\Π₂(S) respectively and conclude (1).


We note that u and v are uniquely determined on Π₁(S) and Π₂(S) up to an additive constant, since the assignment of a value to u(x₀) completely determines u and v on these sets.

Assume now that the set S, which is good and linked, is not uniquely linked. Then there is a loop {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)}, so that (x₁,y₁) = (xₙ,yₙ) for some n > 1. Without loss of generality we can assume that (x₁,y₁), (x₂,y₂), …, (x_{n−1},y_{n−1}) are all distinct. Every function f on S satisfies (1). Since {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} is a loop, u(x₁) = u(xₙ), v(y₁) = v(yₙ). Assume that x_{n−1} = xₙ, so that u(x_{n−1}) = u(x₁). Since x_{n−1} = xₙ, we must have y_{n−1} = y_{n−2}, so that v(y_{n−1}) = v(y_{n−2}). We have

f(x_{n−1}, y_{n−1}) = u(x₁) + v(y_{n−2}),


in other words, f(x_{n−1},y_{n−1}) is completely determined by the values of u(x₁), f(x₁,y₁), f(x₂,y₂), …, f(x_{n−2},y_{n−2}), which is not possible. Similarly we obtain a contradiction if we assume that y_{n−1} = yₙ. This proves the theorem.

3.13. COROLLARY. A subset S ⊆ X × Y is good if and only if each linked component of S is uniquely linked, and also if and only if every finite subset of S is good.

PROOF. Since S is good if and only if there is no link in S which is a loop, the corollary follows.

3.14. REMARK. Assume that S is good. Then the decomposition f = u + v is unique up to additive functions of x and y respectively which are constant on the linked components. In other words, if f = u + v = u₁ + v₁, then u − u₁ and v − v₁ are constant on the linked components.

3.15. THEOREM. If a subset S ⊆ X × Y is uniquely linked, then S is of the form G ∪ H, where G is the graph of a function g on a subset of X and H is the graph of a function h on a subset of Y.

PROOF. Fix (x₀,y₀) ∈ S. Assume for simplicity that ({x₀} × Y) ∩ S = {(x₀,y₀)}. Let

G = {(x,y) : (x,y) is joined to (x₀,y₀) by a link of even length},
H = {(x,y) : (x,y) is joined to (x₀,y₀) by a link of odd length}.

We note that S = G ∪ H and G ∩ H = ∅, since S is uniquely linked. We shall show that G is the graph of a function g on Π₁G. Let (u,v), (w,z) ∈ G, (u,v) ≠ (w,z). We show that u ≠ w. Let {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)} be a link joining (x₀,y₀) to (w,z). Note that y_{n−1} must be equal to yₙ, since the link is of even length. If u = w, then {(x₁,y₁), (x₂,y₂), …, (xₙ,yₙ)(= (w,z) = (u,z)), (u,v)} is a link of odd length joining (x₀,y₀) to (u,v), contrary to the assumption that (u,v) ∈ G. Thus G is the graph of a function g on Π₁G defined by g(x) = y, if (x,y) ∈ G. Similarly H is the graph of a function h on Π₂H defined by h(y) = x, if (x,y) ∈ H.

We now remove the assumption that ({x₀} × Y) ∩ S = {(x₀,y₀)}. Let G₁ denote all those points (x,y) ∈ S which can be joined to (x₀,y₀) by a link {(x₁,y₁)(= (x₀,y₀)), (x₂,y₂), …, (xₙ,yₙ)} of odd length and such that x₁ = x₂; let G₂ denote all those points (x,y) ∈ S which can be joined to (x₀,y₀) by a link of even length and such that y₁ = y₂. Similarly we define H₁ and H₂. These four sets are mutually disjoint. If G = G₁ ∪ G₂ and H = H₁ ∪ H₂, then S = G ∪ H and as before we can show that G and H are graphs of functions on subsets of X and Y respectively. The theorem is proved.

3.16. COROLLARY. If S ⊆ X × Y is good, then S is a union of two graphs G and H of functions defined on subsets of X and Y respectively.

PROOF. Let S = ⋃_α S_α be the partition of S into uniquely linked components. Note that Π₁S_α ∩ Π₁S_β = ∅, Π₂S_α ∩ Π₂S_β = ∅ if α ≠ β. Since each S_α = G_α ∪ H_α, where G_α is the graph of a function g_α on Π₁G_α and H_α is the graph of a function h_α on Π₂H_α, we see that S = G ∪ H, G = ⋃_α G_α, H = ⋃_α H_α. Moreover, G and H are graphs of functions on Π₁G and Π₂H respectively.

3.17. PROPOSITION. Let Cᵢ = Gᵢ ∪ Hᵢ, i ∈ I, be an indexed family of couples, where the indexing set I is totally ordered, such that for any i ∈ I,

Cᵢ ∩ (Π₁⁻¹Π₁Gⱼ ∪ Π₂⁻¹Π₂Hⱼ) = ∅ for all j < i.

Then ⋃_{i∈I} Cᵢ is a good set.

PROOF. Assume, in order to arrive at a contradiction, that S = ⋃_{i∈I}(Gᵢ ∪ Hᵢ) is not good. Then S admits a loop, say (x₁,y₁), (x₂,y₂), …, (xₙ,yₙ), which is of shortest possible length. Since ⋃_{i∈I} Cᵢ = S and since there are only finitely many points in the loop, there is an index p such that G_p ∪ H_p contains a point from this loop but no Cᵢ, i < p, contains a point of this loop. Since

C_p ∩ (Π₁⁻¹Π₁Gⱼ ∪ Π₂⁻¹Π₂Hⱼ) = ∅

for all j < p, we can replace X and Y by X\⋃_{j<p} Π₁Gⱼ and Y\⋃_{j<p} Π₂Hⱼ. Without loss of generality assume that (x₁,y₁) ∈ G_p. Since G_p is the graph of a function on a subset of X, each point of it is isolated in the vertical direction and so we conclude that x₂ ≠ x₁, y₁ = y₂, x_{n−1} ≠ xₙ, y_{n−1} = y₁. But then (x₂,y₂), (x₃,y₃), …, (x_{n−1},y_{n−1}), (x₂,y₂) is a loop in S of smaller length if x_{n−1} ≠ x₂; otherwise (x₂,y₂), (x₃,y₃), …, (x_{n−1},y_{n−1}) is a loop of smaller length in S. The result follows.

4. Orthogonal decompositions
4.1. DEFINITION. Let (X, B_X), (Y, B_Y) be standard Borel spaces in the sense that each is isomorphic to the unit interval equipped with its Borel σ-algebra.


We say that a probability measure m on (X × Y, B_X ⊗ B_Y) is good if for every complex valued measurable function f on X × Y there exist complex valued measurable functions u on X and v on Y such that

f(x,y) = u(x) + v(y)   m-a.e.   (3)

The set of measure zero where (3) fails to hold may depend on f. Let

L₀²(X × Y, m) = {f ∈ L²(X × Y, m) : E(f) = 0},

where E(f) = ∫_{X×Y} f dm denotes the expected value of f. We say that m is very good if every function f ∈ L₀²(X × Y, m) can be expressed in the form (3) with u ∈ L₀²(X, m₁) and v ∈ L₀²(Y, m₂) satisfying E(u·v̄) = 0, where m₁, m₂ denote the projections of m on X and Y respectively (also called marginal measures).
4.2. In this section we will be concerned with very good measures, although good measures seem relevant for the study of measure preserving automorphisms whose associated unitary operators have multiplicity one. Theorem 4.4 gives a necessary and sufficient condition on m under which it is a very good measure.

4.3. DEFINITION. A measurable subset S ⊆ X × Y is called a measurable couple if S = G ∪ H, where G is the graph of a measurable function g defined on the measurable subset A ⊆ X into Y, H is the graph of a measurable function h defined on the measurable subset B ⊆ Y into X, and

g(A) ∩ B = h(B) ∩ A = ∅.
4.4. THEOREM. A probability measure m on (X × Y, B_X ⊗ B_Y) is very good if and only if it is supported on a measurable couple.

PROOF. Assume that m is supported on a measurable couple S = G ∪ H. Let f ∈ L₀²(X × Y, B_X ⊗ B_Y, m). Without loss of generality assume that m(H) ≠ 0. Then, since m is supported on a measurable couple,

m(H) = m₁(X\A) = m₂(B) ≠ 0.

Since E(f) = 0, we see that

∫_G f dm + ∫_H f dm = 0.

Write ∫_G f dm = a. Define

u(x) = f(x, g(x)) if x ∈ A,  u(x) = −a/m₁(X\A) if x ∈ X\A,
v(y) = f(h(y), y) + a/m₁(X\A) if y ∈ B,  v(y) = 0 if y ∈ Y\B.

(In case G = ∅, u(x) = 0 for all x ∈ X.) We note that (1) holds for all (x,y) ∈ G ∪ H and Eu = Ev = E(u·v̄) = 0.

Assume now that m is very good. Let Π₁, Π₂ denote the projections of X × Y onto X and Y respectively and let B_X, B_Y denote also the σ-algebras Π₁⁻¹(B_X) and Π₂⁻¹(B_Y) respectively. Write

E^X f = E(f | B_X),  E^Y f = E(f | B_Y).

If f ∈ L²(X × Y, m) and E^X f = 0, then f(x,y) = v(y) a.e. To see this, note that if E^X(f) = 0, then E(f) = 0, and since m is very good, we can write f(x,y) = u(x) + v(y) m-a.e. with

E(u) = E(v) = E(u·v̄) = 0.

Since E^X(f) = 0 and u is B_X-measurable,

E(ū·f) = E(ū·E^X(f)) = 0 = E(|u|²) + E(ū·v),

which implies that u(x) = 0 m₁-almost everywhere and then f(x,y) = v(y) m-almost everywhere. So f is B_Y-measurable. Similarly, if g ∈ L²(X × Y, m) satisfies E^Y g = 0, then g(x,y) = u(x) m-almost everywhere.

If f, g ∈ L^∞(X × Y, m) and E^X(f) = E^Y(g) = 0, then f·g = 0 m-a.e. To see this, let h = f·g. Since E^Y g = 0 and f is B_Y-measurable, we have

E(h) = E(f·E^Y(g)) = 0.

Since h is bounded, it is square integrable. Since m is very good we can write h(x,y) = t(x) + s(y) m-a.e. with E(t) = E(s) = E(t·s̄) = 0. Again, since E^X(f) = 0 and t, g are B_X-measurable,

E(t̄·h) = E(t̄·g·E^X f) = 0,

so that E(|t|²) = 0. Similarly E(|s|²) = 0 and h(x,y) = 0 m-a.e. We conclude that f vanishes where g does not and that g vanishes where f does not.

Now take φ ∈ L^∞(Y, m₂) and ψ ∈ L^∞(X, m₁), identified with φ ∘ Π₂ and ψ ∘ Π₁. Letting f = φ − E^X(φ), g = ψ − E^Y(ψ), it follows that there exists a measurable set K in X × Y such that

φ(y) = (E^X φ)(x) for m-almost all (x,y) ∈ K,
ψ(x) = (E^Y ψ)(y) for m-almost all (x,y) ∈ X × Y\K.


Since m|_K is supported on G = {(x,y) : φ(y) = (E^X φ)(x)} and m|_{X×Y\K} is supported on H = {(x,y) : ψ(x) = (E^Y ψ)(y)}, if we choose for φ, ψ one-to-one Borel maps of Y and X onto [0,1], then G and H are measurable graphs of functions defined on subsets of X and Y respectively. Thus m is supported by a union of measurable graphs.

Moreover, it is supported by a couple. Indeed, fix a one-to-one bounded function ψ and consider a sequence of bounded functions φₙ, n ∈ ℕ, which is dense in L²(Y, m₂). Let K be the intersection of the sets Kₙ corresponding to the pairs φₙ, ψ. Each fₙ = φₙ − E^X φₙ satisfies E^X fₙ = 0 and thus is equal m-a.e. to a function of y. Hence Kₙ = {(x,y) : fₙ(y) = 0} (mod m) and K = X × B (mod m) with B = ⋂ₙ₌₁^∞ {y : fₙ(y) = 0}. Since the φₙ, n ∈ ℕ, are dense in L²(Y, m₂),

φ(y) = (E^X φ)(x) for m-almost all (x,y) ∈ X × B

holds for every φ ∈ L²(Y, m₂). Let then φ be the indicator function of X × B. We find 1 = E^X φ m-a.e. on X × B. Let A = {x ∈ X : (E^X φ)(x) = 1}. Since E^X(φ) = 1 on X × B, we see that the part of m on A × Y is concentrated on A × B, whence m(A × (Y\B)) = 0. Since for (x,y) ∈ (X\A) × B, (E^X φ)(x) ≠ 1 = φ(x,y), we see that m((X\A) × B) = 0. Since the restrictions of m to A × B (⊆ K) and (X\A) × (Y\B) (⊆ X × Y\K) are supported on graphs, the theorem follows.

5. Connection with dynamics, twisted joinings


5.1. Let T₁ be a Borel isomorphism from X onto Y and let T₂ be a Borel isomorphism from Y onto X. Define the Borel automorphism T of X × Y by

T(x,y) = (T₂y, T₁x), (x,y) ∈ X × Y.

We assume that T preserves the measure m on B_X ⊗ B_Y and that T² is ergodic. It follows that m₂ ∘ T₁ = m₁ and m₁ ∘ T₂ = m₂. Further T₂ ∘ T₁ : X → X preserves the measure m₁ and T₁ ∘ T₂ : Y → Y preserves the measure m₂. It is obvious that

T²(x,y) = (T₂ ∘ T₁(x), T₁ ∘ T₂(y)), (x,y) ∈ X × Y,

and the automorphisms T, T₁ ∘ T₂, T₂ ∘ T₁ are ergodic on the respective spaces.

5.2. THEOREM. Let (X, B_X) and (Y, B_Y) be standard Borel spaces in the sense that each is Borel isomorphic to the unit interval with its Borel σ-algebra. Let (Ω, B) = (X × Y, B_X ⊗ B_Y) and m be a probability measure on B. Let T₁ : X → Y, T₂ : Y → X be Borel isomorphisms such that T : (x,y) → (T₂y, T₁x) is measure preserving and T² is ergodic. Assume that m is very good. Then m is supported on the graph of a one-to-one measurable function on a subset of X (and hence also on a subset of Y). The automorphisms T², T₁ ∘ T₂ and T₂ ∘ T₁ are isomorphic.


PROOF. Since m is very good, it is supported on a measurable couple, say S = G ∪ H, where G is the graph of a measurable function g defined on a measurable subset A ⊆ X, and H is the graph of a measurable function h defined on a measurable subset B ⊆ Y. Without loss of generality assume that m(G) > 0. Let Pₓ(·), x ∈ X, denote the regular conditional probability measure with respect to the σ-algebra Π₁⁻¹(B_X), also written as B_X (by abuse of notation). For each x ∈ X, Pₓ(·) is a probability measure supported on {x} × Y, such that for any A ∈ B_X ⊗ B_Y, P₍·₎(A) is B_X-measurable and

m(A) = ∫_X Pₓ(A) dm₁(x).

Recalling the construction of Pₓ(·), x ∈ X, it is easy to see that the set

G₁ = {(x,y) : Pₓ{(x,y)} = 1}

is T² invariant. It is also the graph of a measurable function on Π₁G₁. Clearly G ⊆ G₁ and m(G) > 0 by assumption, so, by ergodicity of T², m(G₁) = 1. Moreover m(TG₁) = 1 and TG₁ is the graph of a measurable function on a measurable subset of Y. Clearly m is supported on G₁ ∩ TG₁, the graph of a one-to-one measurable function on a measurable subset of X (hence also the graph of a measurable function on a measurable subset of Y). The projection map Π₁ is a measure preserving isomorphism (measure theoretic) of (X × Y, B_X ⊗ B_Y, m) and (X, B_X, m₁). Further T² = Π₁⁻¹ ∘ T₂ ∘ T₁ ∘ Π₁. Similarly, T² and T₁ ∘ T₂ are isomorphic. The theorem is proved.

6. Application to the problem of simple Lebesgue spectrum


6.1. Now we apply Theorem 5.2 to the problem of Banach (for probability measures) stated in the introduction. Assume that there is a shift invariant probability measure m on the Borel subsets of Ω₀ = ℂ^ℤ such that the coordinate functions on Ω₀ are centered, orthogonal and span L₀²(Ω₀, m). Then every function f ∈ L₀²(Ω₀, m) is an orthogonal sum of functions in L₀²(X, m₁) and L₀²(Y, m₂), where X = ℂ^{2ℤ+1}, Y = ℂ^{2ℤ} and m₁ and m₂ are the marginal measures on X and Y respectively. Indeed, if f ∈ L₀²(Ω₀, m) then it has an expansion f = Σ_{n=−∞}^{∞} cₙXₙ, and we can set u = Σ_{n=−∞}^{∞} c_{2n+1}X_{2n+1}, v = Σ_{n=−∞}^{∞} c_{2n}X_{2n} and note that f = u + v. So m is very good. By Theorem 5.2, m is supported on a set S in Ω₀ on which the projection maps into X and Y are one-to-one and so T², T₁ ∘ T₂, T₂ ∘ T₁ are isomorphic.

6.2. F. Parreau has asked if the σ-algebra generated by Xₙ, n ∈ ℤ, is equal to the Borel σ-algebra of Ω₀ (modulo m-null sets). This is indeed the case, as Theorem 5.2 has the following generalisation (Theorem 6.3), proved by a similar method. Let (Xᵢ, Bᵢ), 1 ≤ i ≤ k, be, as before, standard Borel spaces. Let

Ω₀ = ∏ᵢ₌₁ᵏ Xᵢ,  B = B₁ ⊗ B₂ ⊗ ⋯ ⊗ Bₖ.


Call a probability measure m on B k-very good if every f ∈ L₀²(Ω₀, m) can be written in the form

f(x₁, x₂, …, xₖ) = u₁(x₁) + u₂(x₂) + ⋯ + uₖ(xₖ)

with E(uᵢ) = 0 for all 1 ≤ i ≤ k and E(uᵢ·ūⱼ) = 0 for all i ≠ j. Let Tᵢ : Xᵢ → Xᵢ₊₁, 1 ≤ i ≤ k, be Borel isomorphisms, where Xₖ₊₁ = X₁. Define T on Ω₀ by

T(x₁, x₂, …, xₖ) = (Tₖxₖ, T₁x₁, T₂x₂, …, Tₖ₋₁xₖ₋₁).

6.3. THEOREM. Assume that m is a T-invariant and k-very good probability measure on B such that Tᵏ is ergodic with respect to m. Then m is supported on a set S ⊆ Ω₀ on which the projection maps Π₁, Π₂, …, Πₖ into X₁, X₂, …, Xₖ respectively are one-to-one.

6.4. REMARK. It is also clear that if A and B are disjoint subsets of ℤ, A ∪ B = ℤ, and if X = ∏_{i∈A} Cᵢ, Y = ∏_{i∈B} Cᵢ, where Cᵢ = ℂ for all i ∈ ℤ, then m, if one such exists, is supported on a measurable couple in X × Y. It seems plausible that such an m is supported on a set in Ω₀ on which the projections into the coordinate spaces are one-to-one, in which case the projection of m on any of the coordinate spaces of Ω₀ can not be discrete, or even admit a discrete component.

7. Walsh functions

7.1. Let (Ω, F, P) be a non-atomic probability space. Does there exist a one-to-one and onto measure preserving transformation T : Ω → Ω and an event A ∈ F, P(A) = 1/2, such that if

X(ω) = +1 if ω ∈ A,  X(ω) = −1 if ω ∈ Ω\A,

then the random variables X ∘ Tⁿ, n ∈ ℤ, are pairwise independent and span L₀²(Ω, F, P)?

7.2. Let us reformulate the above question differently, breaking it into two parts. Let Ω̃ = ∏_{k∈ℤ} {−1, +1}ₖ, {−1, +1}ₖ = {−1, +1}, equipped with the usual product topology and the resulting Borel structure B̃; an element ω̃ ∈ Ω̃ is a bilateral sequence {ωₖ}_{k∈ℤ} of +1 and −1. Does there exist a probability measure μ on Ω̃ such that:
(i) the coordinates Xₖ, k ∈ ℤ, are pairwise independent and they span L₀²(Ω̃, B̃, μ)?
(ii) moreover, can we choose the measure μ to be invariant under the left shift in Ω̃?


7.3. The first question has a positive answer, provided by the family of Walsh functions defined below. Expand a real number x ∈ [0,1] in its binary form x = 0.x₁x₂…xₖ…, which is made unique by insisting that if there are two such expansions, we choose the one with infinitely many ones. Define

Rₖ(x) = 2xₖ − 1, k ∈ ℕ;

equivalently, Rₖ(x) = +1 if xₖ = 1 and Rₖ(x) = −1 if xₖ = 0. These are called Rademacher functions. They are independent, and since ∫₀¹ Rₖ(x) dx = 0, they are also orthogonal, but they do not span L₀²[0,1]. However, the collection of all distinct finite products

W_{i₁,i₂,…,iₖ} = R_{i₁} R_{i₂} ⋯ R_{iₖ},  i₁ < i₂ < ⋯ < iₖ,

called the Walsh functions, is mutually orthogonal and spans L₀²[0,1]. Since they assume only two distinct values, they are also pairwise independent.

7.4. There is another way of viewing Walsh functions. Consider Ω̃ as a compact group with coordinatewise multiplication and let h denote the normalised Haar measure on Ω̃. If to each coordinate space {−1, +1} we give the uniform probability distribution, then h is the product of these measures. With respect to the measure h, the coordinate functions Xₖ, k ∈ ℤ, correspond to the Rademacher functions. The finite products X_{i₁} X_{i₂} ⋯ X_{iₖ}, i₁ < i₂ < ⋯ < iₖ, which form the collection of non-trivial continuous characters of Ω̃, correspond to the Walsh functions. Let fₖ, k ∈ ℤ, be an enumeration of the Walsh functions W_{i₁,i₂,…,iₖ} on [0,1]. Write

Φ(x) = {fₖ(x)}_{k∈ℤ}, x ∈ [0,1].

The unit interval is mapped by ~ in a one-to-one Borel manner into t]. Let #w(A) = 2 o ~-I(A), A E ~ , where 2 denotes the Lebesgue measure on [0, 1]. The coordinate functions X~, k E 7/, are pairwise independent and span 5~2(), ~ , #w). This gives the affirmative answer to the first question. We shall call #w the measure induced by an enumeration of Walsh functions. It was pointed out by F. Parreau to one of us that such a #w is not invariant under the shift. In the sequel we will give a description of #w, from which this will follow. 7.5. The second question remains unsolved. A positive answer to it will solve the problem of the simple Lebesgue spectrum affirmatively. At this point we mention that an example (or rather a family of examples) of mixing rank one transformation, due to D. Ornstein (1970), allows us to construct a strictly stationary processes {fk}k~Z such that fh fkfo dm --+ 0 as k ~ oc, while {fk; k E Z} span L2(f2, o~,m). Ornstein's example is deep and has not so far been modified or improved to yield a transformation with simple Lebesgue spectrum.
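The orthogonality and independence properties claimed in 7.3 become finite computations once the indices are truncated. The sketch below (Python; the helper names and the choice of grid are ours) builds Rₖ from binary digits and checks the orthogonality of a few Walsh products exactly, by averaging over midpoints of dyadic intervals on which every function involved is constant:

```python
from itertools import combinations

def rademacher(k, x):
    """R_k(x) = +1 if the k-th binary digit of x is 1, else -1 (k >= 1)."""
    return 1 if int(x * 2 ** k) % 2 == 1 else -1

def walsh(idx, x):
    """W_{i1,...,ik}(x) = R_{i1}(x) * ... * R_{ik}(x)."""
    p = 1
    for i in idx:
        p *= rademacher(i, x)
    return p

# Each function below is constant on dyadic intervals of length 2**-N,
# so the average over the midpoints of those intervals is the exact integral.
N = 6
pts = [(j + 0.5) / 2 ** N for j in range(2 ** N)]
mean = lambda g: sum(g(x) for x in pts) / 2 ** N

indices = [c for r in (1, 2) for c in combinations(range(1, 5), r)]
for a in indices:
    assert mean(lambda x: walsh(a, x)) == 0           # centred
    for b in indices:
        s = mean(lambda x: walsh(a, x) * walsh(b, x))
        assert s == (1 if a == b else 0)              # mutual orthogonality
print("orthogonality of Walsh products verified on the dyadic grid")
```

The check exploits the identity W_A · W_B = W_{A Δ B} implicitly: the product of two distinct Walsh functions is again a non-trivial Walsh function, hence integrates to zero.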


7.6. Let Ω̃₀ be the subset of Ω̃ consisting of those ω ∈ Ω̃ which have only finitely many coordinates equal to −1; it is a countable dense subgroup of Ω̃. The action of Ω̃₀ on Ω̃, ω → ωω₀, ω ∈ Ω̃, ω₀ ∈ Ω̃₀, is uniquely ergodic, the Haar measure h being the only probability measure invariant under the Ω̃₀ action. Other product measures are quasi-invariant and ergodic under this action and there are many other measures with respect to which this action is non-singular and ergodic. The theorem below shows that all these measures are singular to any measure μ for which the coordinate functions are orthogonal and span L₀²(Ω̃, B̃, μ).

7.7. THEOREM. If μ is a probability measure on Ω̃ such that the coordinate functions are pairwise independent and span L₀²(Ω̃, B̃, μ), then there is a Borel set E which supports μ and which is wandering under the Ω̃₀ action, i.e., the sets ω₀E, ω₀ ∈ Ω̃₀, are pairwise disjoint. In the case when μ is given by an enumeration of Walsh functions, μ is the Haar measure on a closed subgroup of Ω̃.

PROOF. We begin with an observation of Robertson (1988). If the coordinate functions Xₖ, k ∈ ℤ, are pairwise independent, then they span L₀²(Ω̃, B̃, μ) if and only if for all i, j,

XᵢXⱼ = Σ_{k=−∞}^{∞} c_k^{i,j} Xₖ  with  Σ_{k=−∞}^{∞} |c_k^{i,j}|² = 1.

Further

c_k^{i,j} = ∫_{Ω̃} XᵢXⱼXₖ dμ → 0

if any one of i, j, k tends to ±∞. (Note that XᵢXⱼ is of absolute value one, hence its L²-norm is 1, and the sum over k of the squares of the c_k^{i,j} is one.) The sum Σ_{k=−∞}^{∞} c_k^{i,j} Xₖ converges in L², hence (by a diagonal method) there exists an increasing sequence Nₗ, l ∈ ℕ, of natural numbers such that for all i, j,

XᵢXⱼ(ω) = lim_{l→∞} Σ_{k=−Nₗ}^{Nₗ} c_k^{i,j} Xₖ(ω)

for almost all ω ∈ Ω̃ with respect to μ. Let


Nl ..

Eid=

co:X/Xj(co)= lira Z
l ~ o o k=-Nt

c~lXk(co)~
J

E=

N
oc<i,j<oo

Eu'

which is a support of #. Let coo E ~0 with - 1 at places il, i2,..., ip and +1 at the remaining places. Take an i : {il,i2,... ,ip} and j E {il,i2,... ,ip}, then for all co6E


XᵢXⱼ(ω) = c_{i₁}^{i,j} X_{i₁}(ω) + ⋯ + c_{i_p}^{i,j} X_{i_p}(ω) + lim_{l→∞} Σ′_{−Nₗ≤k≤Nₗ} c_k^{i,j} Xₖ(ω),

where Σ′ indicates that the terms c_{i₁}^{i,j} X_{i₁}, …, c_{i_p}^{i,j} X_{i_p} are deleted from the sum. Assume that for some ω₀ ∈ Ω̃₀, ω₀E ∩ E ≠ ∅. Then there exists ω = {ωₖ}_{k∈ℤ} ∈ E such that ω₀ω ∈ E. We have
Xᵢ(ω) = ωᵢ, Xᵢ(ω₀ω) = ωᵢ,  Xⱼ(ω) = ωⱼ, Xⱼ(ω₀ω) = −ωⱼ,

so that

ωᵢωⱼ = c_{i₁}^{i,j} ω_{i₁} + ⋯ + c_{i_p}^{i,j} ω_{i_p} + lim_{l→∞} Σ′_{−Nₗ≤k≤Nₗ} c_k^{i,j} ωₖ,

−ωᵢωⱼ = −c_{i₁}^{i,j} ω_{i₁} − ⋯ − c_{i_p}^{i,j} ω_{i_p} + lim_{l→∞} Σ′_{−Nₗ≤k≤Nₗ} c_k^{i,j} ωₖ,

whence

ωᵢωⱼ = c_{i₁}^{i,j} ω_{i₁} + ⋯ + c_{i_p}^{i,j} ω_{i_p}.

This holds for all i ∉ {i₁, i₂, …, i_p}. Letting i → ∞, since c_{i₁}^{i,j}, …, c_{i_p}^{i,j} → 0, the right hand side of the above equality tends to zero, while the left hand side remains one in absolute value. The contradiction shows that ω₀ω ∉ E, whence ω₀E ∩ E = ∅.

Suppose now that μ_W is obtained by an enumeration of Walsh functions. In this case XᵢXⱼ, i ≠ j, is some Xₗ. Write l = g(i,j). Then

XᵢXⱼ = X_{g(i,j)},

so that in the expansion

XᵢXⱼ = Σₖ c_k^{i,j} Xₖ

all c_k^{i,j} = 0 except for k = g(i,j), in which case c_k^{i,j} = 1. Now E_{i,j} = {ω : ωᵢωⱼ = ω_{g(i,j)}}. The sets E_{i,j} are closed subgroups of Ω̃. The same is true for the set E = ⋂_{−∞<i,j<∞} E_{i,j}. The characters X_{i₁} X_{i₂} ⋯ X_{iₖ} of Ω̃ are also characters of E, but they need not be distinct characters. In particular XᵢXⱼ and X_{g(i,j)} agree on E. Further X_{i₁} X_{i₂} ⋯ X_{iₖ} is either equal to some Xₚ or equal to one on E. We have ∫_E X_{i₁} X_{i₂} ⋯ X_{iₖ} dμ_W equal to 0 in the first case and equal to 1 in the second, which also holds if μ_W is replaced by the normalised Haar measure on E. Thus μ_W is the normalised Haar measure on E and the theorem is proved.

Acknowledgements
It is a pleasure to thank F. Parreau, whose suggestions considerably improved the presentation and the results of the paper. A part of this work is jointly with R. C.


Cowsik and included with his permission. This work was done while the first author was a Visiting Fellow at the Centre of Advanced Study in Mathematics, University of Mumbai. He would like to express his sincere thanks for the hospitality received during his stay.

References
Ornstein, D. S. (1970). On the root problem in ergodic theory. In: Proc. Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2, pp. 347-356. University of California Press.
Robertson, J. B. (1988). A two state pairwise independent stationary process for which X1, X3, X5 are dependent. Sankhya: The Indian J. Statist. Ser. A 50, 171-183.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. 2001 Elsevier Science B.V. All rights reserved.


Gaussian Processes: Inequalities, Small Ball Probabilities and Applications

W. V. Li and Q.-M. Shao

1. Introduction

A Gaussian measure $\mu$ on a real separable Banach space $E$, equipped with its Borel $\sigma$-field $\mathscr{B}$ and with norm $\|\cdot\|$, is a Borel probability measure on $(E, \mathscr{B})$ such that the law of each continuous linear functional on $E$ is Gaussian (normal distribution). The small ball probability (or small deviation) for the Gaussian measure studies the behaviour of

$$\log\mu(x : \|x\| \le \varepsilon) \qquad (1.1)$$

as $\varepsilon \to 0$, while the large deviation for the Gaussian measure studies the behaviour of $\log\mu(x : \|x\| \ge a)$ as $a \to \infty$. It is well known that the large deviation result plays a fundamental role in studying the upper limits of Gaussian processes, such as the Strassen type law of the iterated logarithm. The theory of large deviations has been well developed during the last few decades; see, for example, Ledoux and Talagrand (1991), Ledoux (1996) and Bogachev (1998) for Gaussian measures, and Varadhan (1984) and Dembo and Zeitouni (1998) for the general theory of large deviations. However, the complexity of the small ball estimate is well known, and there are only a few Gaussian measures for which the small ball probability can be determined completely. The small ball probability is a key step in studying the lower limits of the Gaussian process. It has been found that the small ball estimate has close connections with various approximation quantities of compact sets and operators, and has a variety of applications in studies of Hausdorff dimension, rates of convergence in Strassen's law of the iterated logarithm, and empirical processes, to mention just a few here. Our aim in writing this exposition is to survey recent developments in the theory of Gaussian processes. In particular, our focus is on inequalities, small
ball probabilities and their wide range of applications. The compromise attempted here is to provide a reasonably detailed view of the ideas and results that have already gained a firm hold, to make the treatment as unified as possible, and to sacrifice some of the details that do not fit the scheme or tend to inflate the survey beyond reasonable limits. The price to pay is that such a selection is inevitably biased. The topics selected in this survey are not exhaustive and actually only reflect the tastes and interests of the authors. We also include a number of new results and simpler proofs, in particular in Section 4. The survey is the first to systematically study the existing techniques and applications, which are spread over various areas. We hope that readers can use the results summarized here in their own work and contribute to this exciting area of research. We must say that we omitted a great deal of small ball problems for other important processes such as Markov processes (in particular stable processes and diffusions with scaling), polygonal processes from partial sums, etc.

Probably the most general formulation of small ball problems is the following. Let $E$ be a Polish space (i.e., a complete, separable metric space) and suppose that $\{\mu_\varepsilon : \varepsilon > 0\}$ is a family of probability measures on $E$ with the property that $\mu_\varepsilon \Rightarrow \mu$ as $\varepsilon \to 0$, i.e., $\mu_\varepsilon$ tends weakly to the measure $\mu$. If, for some bounded, convex set $A \subseteq E$, we have $\mu(A) > 0$ and $\mu_\varepsilon(\varepsilon A) \to 0$ as $\varepsilon \to 0$, then one can reasonably say that, as $\varepsilon \to 0$, the measures $\mu_\varepsilon$ "see" the small event $\varepsilon A$. What is often an important and interesting problem is the determination of just how "small" the event $\varepsilon A$ is. That is, one wants to know the rate at which $\mu_\varepsilon(\varepsilon A)$ tends to 0. In general, a detailed answer to this question is seldom available in the infinite dimensional setting. However, if one only asks about the exponential rate, that is, the rate at which $\log\mu_\varepsilon(\varepsilon A)$ tends to $-\infty$, then one has a much better chance of finding a solution, and one is studying the small ball probabilities of the family $\{\mu_\varepsilon : \varepsilon > 0\}$ associated with the ball-like set $A$. In the case where all measures $\mu_\varepsilon$ are the same, $\mu_\varepsilon = \mu$ and $A = \{x \in E : \|x\| \le 1\}$, we are in the setting of (1.1).
When we compare the above formulation with the general theory of large deviations, see page one of Deuschel and Stroock (1989) for example, it is clear that the small ball probability deals with a sequence of measures below the nontrivial limiting measure $\mu$, while the large deviation is above it. The following well known example helps to see the difference. Let $X_i$, $i \ge 1$, be i.i.d. random variables with $EX_i = 0$, $EX_i^2 = 1$ and $E\exp(t_0|X_1|) < \infty$ for some $t_0 > 0$, and let $S_n = \sum_{i=1}^n X_i$. Then as $n \to \infty$ and $x_n \to \infty$ with $x_n = o(\sqrt{n})$,

$$\log P\Big(\max_{1\le i\le n} |S_i| \ge x_n\sqrt{n}\Big) \sim -\frac{x_n^2}{2} ,$$

and as $n \to \infty$ and $\varepsilon_n \to 0$ with $\sqrt{n}\,\varepsilon_n \to \infty$,

$$\log P\Big(\max_{1\le i\le n} |S_i| \le \varepsilon_n\sqrt{n}\Big) \sim -\frac{\pi^2}{8\varepsilon_n^2} .$$


That is why the small ball probability is sometimes called small deviation. We make no distinction between them. Of course, certain problems can be viewed from both points of view. In particular, the large deviation theory of Donsker and Varadhan for the occupation measure can be used to obtain small ball probabilities when the Markov processes and the norm used have the scaling property. A tip of the iceberg can be seen in Sections 3.3, 7.10 and below. The small ball probability can also be seen in many other topics. To see how various topics are related to small ball estimates, it is instructive to examine the Brownian motion on $\mathbb{R}^1$ under the sup-norm. We have by scaling

$$P\Big(\sup_{0\le t\le 1}|W(t)|\le \varepsilon\Big) = P\Big(\sup_{0\le t\le T}|W(t)|\le 1\Big) = P(\tau \ge T) \qquad (1.2)$$

and

$$\log P\Big(\sup_{0\le t\le 1}|W(t)|\le \varepsilon\Big) = \log P\big(L(T,1)=1,\ L(T,-1)=0\big) \sim -\frac{\pi^2}{8}\cdot\frac{1}{\varepsilon^2} \qquad (1.3)$$

as $\varepsilon \to 0$ and $T = \varepsilon^{-2} \to \infty$. Here

$$\tau = \inf\{s : |W(s)| \ge 1\}$$

is the first exit (or passage) time and

$$L(T,y) = \frac{1}{T}\int_0^T 1_{(-\infty,y]}(W_s)\,ds$$
is a distribution (occupation) function whose density function is called the local time. In (1.2), the first expression is clearly the small ball probability for the Wiener measure, or the small deviation from the "flat" trajectories, or the lower tail estimate for the positive random variable $\sup_{0\le t\le 1}|W(t)|$; the second and third expressions are related to the two-sided boundary crossing probability and the exit or escape time. In (1.3), the second expression can be viewed as a very special case of the asymptotic theory developed by Donsker and Varadhan. The value $\pi^2/8$ is the principal eigenvalue of the Laplacian over the domain $[-1,1]$. We believe that a theory of small ball probabilities should be developed. The topics we cover here for Gaussian processes are part of the general theory. The organization of this paper is as follows. Section 2 summarizes various inequalities for Gaussian measures and Gaussian random variables. The emphasis is on comparison inequalities and correlation inequalities, which play an important role in small ball estimates. In Section 3 we present small ball probabilities in the general setting. The links with metric entropy and Laplace transforms are elaborated. Sections 4 and 5 pay special attention to Gaussian processes with index sets in $\mathbb{R}$ and $\mathbb{R}^d$, respectively. In Section 6, we give exact values of small ball constants for certain special processes. Various applications are discussed in Section 7.
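The probability in (1.2) also admits a classical alternating series expansion, $P(\sup_{0\le t\le 1}|W(t)| \le \varepsilon) = (4/\pi)\sum_{k\ge 0} (-1)^k(2k+1)^{-1}\exp(-(2k+1)^2\pi^2/(8\varepsilon^2))$, whose leading term exhibits the $\pi^2/8$ rate of (1.3). The following short numerical sketch (an illustration, not from the text; standard library only) evaluates the series and the normalized logarithm:

```python
import math

def small_ball_sup_bm(eps, terms=50):
    """P(sup_{0<=t<=1} |W(t)| <= eps) via the classical alternating
    series for two-sided boundary non-crossing of Brownian motion."""
    return (4 / math.pi) * sum(
        (-1) ** k / (2 * k + 1)
        * math.exp(-(2 * k + 1) ** 2 * math.pi ** 2 / (8 * eps ** 2))
        for k in range(terms)
    )

# eps^2 * log P should approach -pi^2/8, about -1.2337, as eps -> 0,
# which is the rate appearing in (1.3).
for eps in (1.0, 0.5, 0.3):
    p = small_ball_sup_bm(eps)
    print(eps, p, eps ** 2 * math.log(p))
```

Already at $\varepsilon = 0.3$ the normalized logarithm is within a few percent of $-\pi^2/8$.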


2. Inequalities for Gaussian random elements

Inequalities are always one of the most important parts of a general theory. In this section, we present some fundamental inequalities for Gaussian measures or Gaussian random variables. The density and distribution function of the standard Gaussian (normal) distribution on the real line $\mathbb{R}$ are

$$\phi(x) = (2\pi)^{-1/2}\exp\{-x^2/2\} \quad\text{and}\quad \Phi(x) = \int_{-\infty}^x \phi(t)\,dt . \qquad (2.1)$$

Let $\gamma_n$ denote the canonical Gaussian measure on $\mathbb{R}^n$ with density function $\phi_n(x) = (2\pi)^{-n/2}\exp(-|x|^2/2)$ with respect to Lebesgue measure, where $|x|$ is the Euclidean norm of $x \in \mathbb{R}^n$. We use $\mu$ to denote a centered Gaussian measure throughout. All results for $\gamma_n$ on $\mathbb{R}^n$ in this paper can be used to determine the appropriate infinite dimensional analogue by a classic approximation argument presented in detail in Ledoux (1996, Chapter 4).
2.1. Isoperimetric inequalities

The following isoperimetric inequality is one of the most important properties of the Gaussian measure. It has played a fundamental role in topics such as integrability and upper tail behavior of Gaussian seminorms, deviation and regularity of Gaussian sample paths, and small ball probabilities.

THEOREM 2.1. For any Borel set $A$ in $\mathbb{R}^n$ and a half space $H = \{x \in \mathbb{R}^n : \langle x,u\rangle \le a\}$ such that $\gamma_n(A) \ge \gamma_n(H) = \Phi(a)$ for some real number $a$ and some unit vector $u \in \mathbb{R}^n$, we have for every $r \ge 0$

$$\gamma_n(A + rU) \ge \gamma_n(H + rU) = \Phi(a + r) ,$$

where $U$ is the unit ball in $\mathbb{R}^n$ and $A + rU = \{a + ru : a \in A,\ u \in U\}$.

The result is due independently to Borell (1975) and Sudakov and Tsirelson (1974). The standard proof is based on the classic isoperimetric inequality on the sphere and the fact that the standard Gaussian distribution on $\mathbb{R}^n$ can be approximated by marginal distributions of uniform laws on spheres in much higher dimensions. The approximation procedure, the so-called Poincare limit, can be found in Ledoux (1996), Chapter 1. A direct proof based on the powerful Gaussian symmetrization techniques is given by Ehrhard (1983). This also led him to a rather complete isoperimetric calculus in Gauss space [see Ehrhard (1984, 1986)]. In particular, he obtained the following remarkable Brunn-Minkowski type inequality with both sets $A$ and $B$ convex.


THEOREM 2.2. (Ehrhard's inequality) For any convex set $A$ and Borel set $B$ of $\mathbb{R}^n$, and $0 < \lambda < 1$,

$$\Phi^{-1}\circ\gamma_n(\lambda A + (1-\lambda)B) \ge \lambda\,\Phi^{-1}\circ\gamma_n(A) + (1-\lambda)\,\Phi^{-1}\circ\gamma_n(B) , \qquad (2.2)$$

where $\lambda A + (1-\lambda)B = \{\lambda a + (1-\lambda)b : a \in A,\ b \in B\}$.

The above case of one convex set and one Borel set is due to Latala (1996). A special case was studied in Kuelbs and Li (1995). Ehrhard's inequality is a delicate result, which implies the isoperimetric inequality for Gaussian measures and has some other interesting consequences as well [see for example Ehrhard (1984), Kwapien (1994) and Kwapien (1993)]. It is still an open problem to prove (2.2) for two arbitrary Borel sets, and the result in $\mathbb{R}$ suffices to settle the conjecture. If the conjecture were true, it would improve upon the more classical so-called log-concavity of Gaussian measures:

$$\log\gamma_n(\lambda A + (1-\lambda)B) \ge \lambda\log\gamma_n(A) + (1-\lambda)\log\gamma_n(B) . \qquad (2.3)$$

A proof of (2.3) may be given using again the Poincare limit on the classical Brunn-Minkowski inequality on $\mathbb{R}^n$; see Ledoux (1991) for details. Talagrand (1992, 1993) has provided very sharp upper and lower estimates for $\gamma_n(A + rU)$ when $r$ is large and $A$ is convex symmetric. In particular, the estimates relate to the small ball problem and its link with metric entropy; see Section 7.3 for some consequences. Other than using addition of sets as enlargement, multiplication of a set can also be considered. The following result is due to Landau and Shepp (1970).

THEOREM 2.3. For any convex set $A$ in $\mathbb{R}^n$ and a half space $H = \{x \in \mathbb{R}^n : \langle x,u\rangle \le a\}$ such that $\gamma_n(A) \ge \gamma_n(H) = \Phi(a)$ for some $a \ge 0$ and some unit vector $u \in \mathbb{R}^n$, one has for every $r \ge 1$

$$\gamma_n(rA) \ge \gamma_n(rH) = \Phi(ra) ,$$


where $rA = \{rx : x \in A\}$.

The proof is based on the Brunn-Minkowski inequality on the sphere, without using the Poincare limit. An application of Theorem 2.3 is the exponential square integrability of the norm of a Gaussian measure. For a symmetric convex set $A$, the following was conjectured by Shepp in 1969 (the so-called S-conjecture) and proved recently by Latala and Oleszkiewicz (1999). Early work and related problems can be found in Kwapien (1993).

THEOREM 2.4. Let $\mu$ be a centered Gaussian measure on a separable Banach space $E$. If $A$ is a symmetric, convex, closed subset of $E$ and $S \subset E$ is a symmetric strip, i.e., $S = \{x \in E : |x^*(x)| \le 1\}$ for some $x^* \in E^*$, the dual space of $E$, such that $\mu(A) = \mu(S)$, then

$$\mu(tA) \ge \mu(tS) \qquad\text{for}\qquad t \ge 1$$


and

$$\mu(tA) \le \mu(tS) \qquad\text{for}\qquad 0 < t \le 1 .$$

The proof uses both Theorems 2.1 and 2.2. A consequence of Theorem 2.4 is the following result, which gives the best constants in the comparison of moments of Gaussian vectors.

THEOREM 2.5. If $\xi_i$ are independent standard normal random variables and $x_i$ are vectors in some separable Banach space $(E, \|\cdot\|)$ such that the series $S = \sum_i x_i\xi_i$ is a.s. convergent, then

$$(E\|S\|^p)^{1/p} \le \frac{a_p}{a_q}\,(E\|S\|^q)^{1/q}$$

for any $p \ge q > 0$, where $a_p = (E|\xi_1|^p)^{1/p} = \sqrt{2}\,\big(\Gamma((p+1)/2)/\Gamma(1/2)\big)^{1/p}$.
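The constant $a_p$ is explicit and easy to evaluate; the following sketch (an illustration, not from the text) checks it against the two values computable by hand, $a_2 = 1$ and $a_1 = \sqrt{2/\pi}$, using $\Gamma(1/2) = \sqrt{\pi}$:

```python
import math

def a_p(p):
    """a_p = (E|xi|^p)^(1/p) for xi ~ N(0,1), using
    E|xi|^p = 2^(p/2) * Gamma((p+1)/2) / Gamma(1/2)."""
    return math.sqrt(2) * (math.gamma((p + 1) / 2) / math.sqrt(math.pi)) ** (1 / p)

print(a_p(1))  # E|xi| = sqrt(2/pi), about 0.7979
print(a_p(2))  # (E xi^2)^(1/2) = 1
```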

2.2. Concentration and deviation inequalities


Here we only summarize some of the key estimates. We refer to Ledoux and Talagrand (1991), Ledoux (1996) and Lifshits (1995) for more details and applications. Let $f$ be a Lipschitz function on $\mathbb{R}^n$ with Lipschitz norm given by $\|f\|_{\mathrm{Lip}} = \sup\{|f(x) - f(y)|/|x-y| : x,y \in \mathbb{R}^n\}$. Denote further by $M_f$ a median of $f$ for $\mu$ and by $E_\mu f = \int f\,d\mu(x)$ the expectation of $f$.
THEOREM 2.6.

$$\mu(|f - M_f| > t) \le \exp\{-t^2/2\|f\|_{\mathrm{Lip}}^2\} \qquad (2.4)$$

and

$$\mu(|f - E_\mu f| > t) \le 2\exp\{-t^2/2\|f\|_{\mathrm{Lip}}^2\} . \qquad (2.5)$$

Another version of the above result can be stated as follows. Let $\{X_t, t \in T\}$ be a centered Gaussian process with

$$d(s,t) = (E|X_s - X_t|^2)^{1/2} , \qquad s,t \in T ,$$

and $\sigma^2 = \sup_{t\in T} E X_t^2$.

THEOREM 2.7. For all $x > 0$, we have

$$P\Big(\sup_{t\in T} X_t - E\sup_{t\in T} X_t \ge x\Big) \le \exp\Big(-\frac{x^2}{2\sigma^2}\Big) .$$

A proof based on log-concavity and a connection to Wills functional are given in Vitale (1996, 1999b).


THEOREM 2.8. Let $N(T,d;\varepsilon)$ denote the minimal number of open balls of radius $\varepsilon$ for the metric $d$ that are necessary to cover $T$. Then

$$P\Big(\sup_{t\in T} X_t \ge x + 6.5\int_0^{\sigma/2} \big(\log N(T,d;\varepsilon)\big)^{1/2}\,d\varepsilon\Big) \le \exp\Big(-\frac{x^2}{2\sigma^2}\Big) .$$

The above result is due to Dudley (1967). Among Fernique type upper bounds, the following inequality due to Berman (1985) gives a sharp bound.

THEOREM 2.9. Let $\{X_t, t \in T\}$, $T \subset \mathbb{R}^d$, be a centered Gaussian process. Let

$$p(\delta) = \sup_{s,t\in T,\ |s-t|\le\delta} d(s,t)$$

and

$$Q(\delta) = \int_0^\infty p(\delta e^{-y^2})\,dy .$$

Then for all $x > 0$

$$P\Big(\sup_{t\in T} X_t \ge x\Big) \le C\big(Q^{-1}(1/x)\big)^{-d}\exp\Big(-\frac{x^2}{2\sigma^2}\Big) ,$$

where $Q^{-1}$ is the inverse function of $Q$ and $C$ is an absolute constant.
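To make the entropy integral of Theorem 2.8 concrete, consider (as a standard illustration, not from the text) Brownian motion on $T = [0,1]$ with $d(s,t) = |s-t|^{1/2}$ and $\sigma = 1$: a $d$-ball of radius $\varepsilon$ is an interval of Euclidean length $2\varepsilon^2$, so $N(T,d;\varepsilon) \approx \lceil 1/(2\varepsilon^2)\rceil$ and the entropy integral converges:

```python
import math

def covering_number(eps):
    """N([0,1], d; eps) for d(s,t) = |s - t|**0.5: a d-ball of radius
    eps is an interval of Euclidean length 2*eps**2."""
    return max(1, math.ceil(1.0 / (2.0 * eps ** 2)))

def dudley_integral(sigma=1.0, steps=20000):
    """Midpoint-rule evaluation of the entropy integral from 0 to
    sigma/2 appearing in Theorem 2.8."""
    h = (sigma / 2) / steps
    return sum(
        math.sqrt(math.log(covering_number((i + 0.5) * h))) * h
        for i in range(steps)
    )

print(dudley_integral())  # finite, since log N grows only logarithmically
```

The integrand behaves like $\sqrt{2\log(1/\varepsilon)}$ near zero, which is integrable, so the bound of Theorem 2.8 is nontrivial here.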

2.3. Comparison inequalities


Ledoux and Talagrand (1991) and Lifshits (1995) have a very nice discussion of comparison inequalities for Gaussian random variables. We list below several main results, which are also useful in the small ball problems [see Li and Shao (1999a), Li (1999b)]. In this subsection, we let $X = (X_1,\ldots,X_n)$ and $Y = (Y_1,\ldots,Y_n)$ be independent centered Gaussian random vectors. The following identity due to Piterbarg (1982) gives a basis for various modifications of comparison inequalities.

THEOREM 2.10. Let $f : \mathbb{R}^n \to \mathbb{R}^1$ be a function with bounded second derivatives. Then

$$Ef(X) - Ef(Y) = \frac{1}{2}\sum_{1\le i,j\le n} \big(EX_iX_j - EY_iY_j\big)\int_0^1 E\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}\Big((1-\alpha)^{1/2}X + \alpha^{1/2}Y\Big)\,d\alpha .$$


From the above identity, one can easily derive the famous Slepian lemma given in Slepian (1962).


THEOREM 2.11. (Slepian's lemma) If $EX_i^2 = EY_i^2$ and $EX_iX_j \le EY_iY_j$ for all $i,j = 1,2,\ldots,n$, then for any $x$,

$$P\Big(\max_{1\le i\le n} X_i \le x\Big) \le P\Big(\max_{1\le i\le n} Y_i \le x\Big) .$$
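Slepian's lemma is easy to check by simulation in the bivariate case, where $P(\max(X_1,X_2) \le 0) = 1/4 + \arcsin(\rho)/(2\pi)$ is known in closed form and is increasing in the correlation $\rho$. The sketch below is an illustration only (the correlation values and sample size are arbitrary choices):

```python
import math
import random

random.seed(1)

def p_max_le_zero(rho, n=200_000):
    """Monte Carlo estimate of P(max(X1, X2) <= 0) for a standard
    bivariate normal pair with correlation rho."""
    s = (1 - rho ** 2) ** 0.5
    hits = 0
    for _ in range(n):
        z1 = random.gauss(0, 1)
        z2 = rho * z1 + s * random.gauss(0, 1)
        hits += max(z1, z2) <= 0
    return hits / n

p_low, p_high = p_max_le_zero(0.0), p_max_le_zero(0.8)
print(p_low, p_high)                                # p_high > p_low, as Slepian predicts
print(0.25, 0.25 + math.asin(0.8) / (2 * math.pi))  # exact values for comparison
```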

Other interesting and useful extensions of Slepian's inequality, involving min-max, etc., can be found in Gordon (1985). The next result is due to Fernique (1975) and requires no condition on the diagonal. Some elaborated variants are given in Vitale (1999c).

THEOREM 2.12. If $E(X_i - X_j)^2 \ge E(Y_i - Y_j)^2$ for $1 \le i,j \le n$, then

$$E\max_{1\le i\le n} X_i \ge E\max_{1\le i\le n} Y_i$$

and

$$Ef\Big(\max_{i,j}(X_i - X_j)\Big) \ge Ef\Big(\max_{i,j}(Y_i - Y_j)\Big)$$
for every non-negative convex increasing function $f$ on $\mathbb{R}_+$.

We end this subsection with Anderson's inequality given in Anderson (1955), while the second inequality below is due to Shao (1999).

THEOREM 2.13. Let $\Sigma_X$ and $\Sigma_Y$ be the covariance matrices of $X$ and $Y$, respectively. If $\Sigma_X - \Sigma_Y$ is positive semi-definite, then for any $a \in \mathbb{R}^n$, any convex symmetric set $C$ in $\mathbb{R}^n$, and any arbitrary set $A$ in $\mathbb{R}^n$,

$$P(X \in C) \le P(Y \in C) ,$$

$$P(X \in A) \ge \Big(\frac{\det(\Sigma_Y)}{\det(\Sigma_X)}\Big)^{1/2} P(Y \in A)$$

and

$$P(X + ra \in C) \le P(X \in C) ;$$

moreover, $P(X + ra \in C)$ is a monotone decreasing function of $r$, $0 < r < 1$.

2.4. Correlation inequalities


The Gaussian correlation conjecture states that for any two symmetric convex sets $A$ and $B$ in a separable Banach space $E$ and for any centered Gaussian measure $\mu$ on $E$,

$$\mu(A \cap B) \ge \mu(A)\mu(B) . \qquad (2.6)$$


For the early history of the conjecture we refer to Das Gupta et al. (1972), Tong (1980) and Schechtman et al. (1998). An equivalent formulation of the conjecture is as follows: if $(X_1,\ldots,X_n)$ is a centered Gaussian random vector, then

$$P\Big(\max_{1\le i\le n}|X_i| \le 1\Big) \ge P\Big(\max_{1\le i\le k}|X_i| \le 1\Big)\,P\Big(\max_{k+1\le i\le n}|X_i| \le 1\Big) \qquad (2.7)$$

for each $1 \le k < n$. Khatri (1967) and Sidak (1967, 1968) have shown that (2.7) is true for $k = 1$. That is,

$$P\Big(\max_{1\le i\le n}|X_i| \le x\Big) \ge P(|X_1| \le x)\,P\Big(\max_{2\le i\le n}|X_i| \le x\Big) . \qquad (2.8)$$
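A quick Monte Carlo sanity check of (2.8) (an illustration, not from the text; the dimension $n = 3$, the correlation and the sample size are arbitrary choices):

```python
import random

random.seed(0)
n_samples = 200_000
rho = 0.8  # correlation of X1 with each of X2, X3 (arbitrary choice)
x = 1.0
s = (1 - rho ** 2) ** 0.5

lhs = rhs1 = rhs2 = 0
for _ in range(n_samples):
    z0 = random.gauss(0, 1)
    # (X1, X2, X3) jointly Gaussian through the common factor z0
    x1 = z0
    x2 = rho * z0 + s * random.gauss(0, 1)
    x3 = rho * z0 + s * random.gauss(0, 1)
    lhs += max(abs(x1), abs(x2), abs(x3)) <= x
    rhs1 += abs(x1) <= x
    rhs2 += max(abs(x2), abs(x3)) <= x

lhs /= n_samples
rhs1 /= n_samples
rhs2 /= n_samples
print(lhs, rhs1 * rhs2)  # (2.8) predicts lhs >= rhs1 * rhs2
```

With positively correlated coordinates, the gap between the two sides is substantial, which is exactly why (2.8) is such an effective lower bound tool.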

The Khatri-Sidak inequality has become one of the most powerful tools for lower bound estimates of small ball probabilities; see Section 3.4. The inequality (2.8) was extended to elliptically contoured distributions in Das Gupta et al. (1972). The original proofs of Khatri and Sidak are very lengthy. Simpler proofs are given in Jogdeo (1970) and Schechtman et al. (1998). Here we give an alternative proof. We only need to show that for any symmetric and convex set $A$ in $\mathbb{R}^{n-1}$,

$$f(x)/g(x) := P\big(|X_1| \le x,\ (X_2,\ldots,X_n) \in A\big)\,/\,P(|X_1| \le x)$$

is a monotone decreasing function of $x$, $x > 0$. Let $\phi(x_1,x_2,\ldots,x_n)$ be the joint density function of $X_1,X_2,\ldots,X_n$, and let $\phi_1(x)$ be the density function of $X_1$. It suffices to show that for $x \ge 0$,

$$g(x)f'(x) - f(x)g'(x) \le 0 . \qquad (2.9)$$

Let $y = (x_2,\ldots,x_n)$ and $Y = (X_2,\ldots,X_n)$. Noting that

$$f'(x) = 2\int_{y\in A} \phi(x,y)\,dy = 2\phi_1(x)\,P(Y \in A \mid X_1 = x) \qquad (2.10)$$

and $g'(x) = 2\phi_1(x)$, we have

$$g(x)f'(x) - f(x)g'(x) = 2\phi_1(x)\big(P(|X_1| \le x)\,P(Y \in A \mid X_1 = x) - P(|X_1| \le x,\ Y \in A)\big)$$
$$= 2\phi_1(x)\,P(|X_1| \le x)\big(P(Y \in A \mid X_1 = x) - P(Y \in A \mid |X_1| \le x)\big) \le 0$$

by Anderson's inequality, as desired. It is also known that the Gaussian correlation conjecture is true in other special cases. Pitt (1977) showed that (2.7) holds for $n = 4$ and $k = 2$. The recent paper Schechtman et al. (1998) sheds new light on the conjecture. They show that the conjecture is true whenever the sets are symmetric ellipsoids or the sets are


not too large. Harge (1998) proves that (2.6) holds if one set is a symmetric ellipsoid and the other is simply symmetric convex. Vitale (1999) proves that (2.6) holds for two classes of sets: Schur cylinders and barycentrically ordered sets. Shao (1999) shows that

$$P\Big(\max_{1\le i\le n}|X_i| \le 1\Big) \ge 2^{-\min(k,\,n-k)}\,P\Big(\max_{1\le i\le k}|X_i| \le 1\Big)\,P\Big(\max_{k+1\le i\le n}|X_i| \le 1\Big) . \qquad (2.11)$$

Recently, Li (1999) presented a weak form of the correlation conjecture, which is a useful tool to prove the existence of small ball constants; see Section 3.3. The varying parameter $\lambda$ plays a fundamental role in most of the applications we know so far.

THEOREM 2.14. Let $\mu$ be a centered Gaussian measure on a separable Banach space $E$. Then for any $0 < \lambda < 1$ and any symmetric, convex sets $A$ and $B$ in $E$,

$$\mu(A \cap B)\,\mu\big(\lambda^2 A + (1-\lambda^2)B\big) \ge \mu(\lambda A)\,\mu\big((1-\lambda^2)^{1/2}B\big) .$$

In particular,

$$\mu(A \cap B) \ge \mu(\lambda A)\,\mu\big((1-\lambda^2)^{1/2}B\big) \qquad (2.12)$$

and

$$P(X \in A,\ Y \in B) \ge P(X \in \lambda A)\,P\big(Y \in (1-\lambda^2)^{1/2}B\big) \qquad (2.13)$$

for any centered jointly Gaussian vectors $X$ and $Y$ in $E$.

The proof follows along the arguments of Proposition 3 in Schechtman et al. (1998), where the case $\lambda = 1/\sqrt{2}$ was proved. Here we present a simple proof of (2.13) given in Li and Shao (1999b). Let $a = (1-\lambda^2)^{1/2}/\lambda$ and let $(X^*, Y^*)$ be an independent copy of $(X,Y)$. It is easy to see that $X - aX^*$ and $Y + Y^*/a$ are independent. Thus, by Anderson's inequality,

$$P(X \in A,\ Y \in B) \ge P(X - aX^* \in A,\ Y + Y^*/a \in B)$$
$$= P(X - aX^* \in A)\,P(Y + Y^*/a \in B) = P(X \in \lambda A)\,P\big(Y \in (1-\lambda^2)^{1/2}B\big) ,$$

as desired. The main difference between the Khatri-Sidak inequality and Theorem 2.14 in applications to small ball probabilities is that the former only provides the rate (up to a constant) while the latter can preserve the rate together with the constant. For various other approaches related to the Gaussian correlation conjecture, see Hu (1997), Hitczenko et al. (1998), Szarek and Werner (1999) and Lewis and Pritchard (1999).


3. Small ball probabilities in general setting


In this section, we present some fundamental results in the general setting for the small ball probabilities of Gaussian processes and Gaussian measures. Throughout, we use the following notations. Let $E^*$ be the topological dual of $E$ with norm $\|\cdot\|$, and let $X$ be a centered $E$-valued Gaussian random vector with law $\mu = \mathscr{L}(X)$. It is well known that there is a unique Hilbert space $H_\mu \subseteq E$ (also called the reproducing kernel Hilbert space generated by $\mu$) such that $\mu$ is determined by considering the pair $(E, H_\mu)$ as an abstract Wiener space [see Gross (1970)]. The Hilbert space $H_\mu$ can be described as the completion of the range of the mapping $S : E^* \to E$ defined by the Bochner integral

$$Sf = \int_E x f(x)\,d\mu(x) , \qquad f \in E^* ,$$

where the completion is with respect to the inner product norm

$$\langle Sf, Sg\rangle_\mu = \int_E f(x)g(x)\,d\mu(x) , \qquad f,g \in E^* .$$


We use $\|\cdot\|_\mu$ to denote the inner product norm induced on $H_\mu$; for well known properties and various relationships between $\mu$, $H_\mu$ and $E$, see Lemma 2.1 in Kuelbs (1976). One of the most important facts is that the unit ball $K_\mu = \{x \in H_\mu : \|x\|_\mu \le 1\}$ of $H_\mu$ is always compact. Finally, in order to compare asymptotic rates, we write $f(x) \preceq g(x)$ as $x \to a$ if $\limsup_{x\to a} f(x)/g(x) < \infty$, and $f(x) \approx g(x)$ as $x \to a$ if $f(x) \preceq g(x)$ and $g(x) \preceq f(x)$.

3.1. Measure of shifted small balls


We first recall Anderson's inequality given in Theorem 2.13, which plays an important role in the estimation of small ball probabilities. For every convex symmetric subset $A$ of $E$ and every $x \in E$,

$$\mu(A + x) \le \mu(A) . \qquad (3.1)$$

Note that (3.1) is also an easy consequence of the log-concavity of Gaussian measures given in (2.3), by replacing $A$ with $A + x$ and $B$ with $A - x$ and taking $\lambda = 1/2$. Next we have the following well known facts about the shift of symmetric convex sets [see, for example, Dudley et al. (1979), de Acosta (1983)].

THEOREM 3.1. For any $f \in H_\mu$ and $r > 0$,

$$\exp\{-\|f\|_\mu^2/2\}\,\mu(x : \|x\| \le r) \le \mu(x : \|x - f\| \le r) \le \mu(x : \|x\| \le r) . \qquad (3.2)$$


Furthermore,

$$\mu(x : \|x - f\| \le \varepsilon) \sim \exp\{-\|f\|_\mu^2/2\}\cdot\mu(x : \|x\| \le \varepsilon) \quad\text{as } \varepsilon \to 0 . \qquad (3.3)$$
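As a concrete illustration (a standard example, not taken from the text): for Wiener measure $\mu$ on $C_0[0,1]$ and the drift $f(t) = t$, one has $f \in H_\mu$ with $\|f\|_\mu^2 = \int_0^1 (f'(t))^2\,dt = 1$, so (3.3) reads

```latex
\mu\bigl(x : \|x - f\|_\infty \le \varepsilon\bigr)
  \sim e^{-1/2}\,\mu\bigl(x : \|x\|_\infty \le \varepsilon\bigr)
  \qquad \text{as } \varepsilon \to 0 :
```

shifting the small ball by an element of the Cameron-Martin space costs only the fixed factor $\exp\{-\|f\|_\mu^2/2\}$, not a change in the rate.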

The upper bound follows from Anderson's inequality (3.1). The lower bound follows from the Cameron-Martin formula

$$\mu(A - f) = \int_A \exp\Big\{-\frac{1}{2}\|f\|_\mu^2 - \langle x,f\rangle_\mu\Big\}\,d\mu(x) \qquad (3.4)$$

for Borel subsets $A$ of $E$ and $f \in H_\mu$, together with Holder's inequality and the symmetry of $\langle x,f\rangle_\mu$, on $A = \{x : \|x\| \le r\}$. Note that $\langle x,f\rangle_\mu$ can be defined as the stochastic inner product for $\mu$ almost all $x$ in $E$. A particularly nice proof of (3.4) is contained in Proposition 2.1 of de Acosta (1983).

Refinements of (3.2) are the following inequalities, which play important roles in the study of the functional form of Chung's law of the iterated logarithm; see Section 7.4 for more details. They extend the approach of Grill (1991) for Brownian motion, and are given in Kuelbs et al. (1994). First we need some additional notation. Let

$$I(x) = \begin{cases} \|x\|_\mu^2/2 & x \in H_\mu \\ +\infty & \text{otherwise} , \end{cases}$$

which is the $I$-function of large deviations for $\mu$. Furthermore, defining

$$I(f,r) = \inf_{\|f-x\|<r} I(x) ,$$

we see $I(f,r) < \infty$ for all $f \in \bar{E}_\mu$, the support of $\mu$ in $E$. It is also the case that all of the properties established for the function $I(x,r)$ in Lemma 1 of Grill (1991), when $\mu$ is Wiener measure on $C_0[0,1]$, have analogues for general $\mu$. In particular, if $f \in E$ and $r > 0$, then there is a unique element, call it $h_{f,r}$, such that $\|h_{f,r} - f\| \le r$ and $I(f,r) = I(h_{f,r})$. The following result is given in Kuelbs et al. (1994).

THEOREM 3.2. For all $f \in \bar{E}_\mu$, $r > 0$, and $h = h_{f,\delta r}$,

$$\mu(x : \|x-f\| \le r) \le \exp\Big\{-\sup_{\delta>0}\big((\delta-1)\delta^{-1}\langle f,h\rangle_\mu + (2-\delta)\delta^{-1}I(h)\big)\Big\}\,\mu(x : \|x\| \le r) ,$$

and for $0 < \delta < 1$,

$$\mu(x : \|x-f\| \le r) \ge \exp\{-I(h)\}\,\mu(x : \|x\| \le (1-\delta)r) .$$

In particular, for all $f \in \bar{E}_\mu$,

$$\mu(x : \|x-f\| \le r) \le \exp\{-I(h_{f,r})\}\,\mu(x : \|x\| \le r)$$


and for all $f \in H_\mu$,

$$\exp\{-I(f)\}\cdot\mu(x : \|x\| \le r) \le \mu(x : \|x-f\| \le r) \le \exp\{-I(h_{f,r})\}\,\mu(x : \|x\| \le r) .$$


We thus see that the small ball probabilities of shifted balls can be handled by (3.3) if $f \in H_\mu$, and by the above theorem if $f \notin H_\mu$. Note that the estimates we have in this section can be used to give the convergence rate and constant in the functional form of Chung's LIL (see Section 7.4), which only depend on the shift being in $H_\mu$. So we can also answer the similar problem for points outside $H_\mu$ by Theorem 3.2. Other related shift inequalities for Gaussian measures are presented in Kuelbs and Li (1998).

3.2. Precise links with metric entropy


Let $\mu$ denote a centered Gaussian measure on a real separable Banach space $E$ with norm $\|\cdot\|$ and dual $E^*$. Consider the small ball probability

$$\phi(\varepsilon) = -\log\mu(x : \|x\| \le \varepsilon) \qquad (3.5)$$

as $\varepsilon \to 0$. The complexity of $\phi(\varepsilon)$ is well known, and there are only a few Gaussian measures for which $\phi(\varepsilon)$ has been determined completely as $\varepsilon \to 0$. Kuelbs and Li (1993a) discovered a precise link between the function $\phi(\varepsilon)$ and the metric entropy of the unit ball $K_\mu$ of the Hilbert space $H_\mu$ generated by $\mu$. We recall first that if $(E,d)$ is any metric space and $A$ is a compact subset of $(E,d)$, then the $d$-metric entropy of $A$ is denoted by $H(A,\varepsilon) = \log N(A,\varepsilon)$, where $N(A,\varepsilon)$ is the minimum covering number defined by

$$N(A,\varepsilon) = \min\Big\{n \ge 1 : \exists\, x_1,\ldots,x_n \in A \text{ such that } \bigcup_{j=1}^n B_\varepsilon(x_j) \supseteq A\Big\} ,$$

where $B_\varepsilon(a) = \{x : d(x,a) < \varepsilon\}$ is the open ball of radius $\varepsilon$ centered at $a$. Since the unit ball $K_\mu = \{x \in H_\mu : \|x\|_\mu \le 1\}$ of $H_\mu$ is always compact, $K_\mu$ has finite metric entropy. Now we can state the precise links between the small ball function $\phi(\varepsilon)$ given in (3.5) and the metric entropy function $H(K_\mu, \varepsilon)$.
THEOREM 3.3. Let $f(x)$ and $g(x)$ be regularly varying functions at 0, and let $J(x)$ be a slowly varying function at infinity such that $J(x) \sim J(x^p)$ as $x \to \infty$ for each $p > 0$.

(I) We have $H\big(K_\mu, \varepsilon/\sqrt{2\phi(\varepsilon)}\big) \preceq \phi(2\varepsilon)$. In particular, if $\phi(\varepsilon) \approx \phi(2\varepsilon)$ and $\phi(\varepsilon) \preceq \varepsilon^{-\alpha}J(\varepsilon^{-1})$, where $\alpha > 0$, then

$$H(K_\mu, \varepsilon) \preceq \varepsilon^{-2\alpha/(2+\alpha)} J(1/\varepsilon)^{2/(2+\alpha)} . \qquad (3.6)$$

Especially, (3.6) holds whenever $\phi(\varepsilon) \approx \varepsilon^{-\alpha}J(\varepsilon^{-1})$.

(II) If $\phi(\varepsilon) \succeq f(\varepsilon)$, then $H\big(K_\mu, \varepsilon/\sqrt{f(\varepsilon)}\big) \succeq f(\varepsilon)$. In particular, if $f(\varepsilon) = \varepsilon^{-\alpha}J(\varepsilon^{-1})$ with $\alpha > 0$, then

$$H(K_\mu, \varepsilon) \succeq \varepsilon^{-2\alpha/(2+\alpha)} J(1/\varepsilon)^{2/(2+\alpha)} .$$

(III) If $H(K_\mu, \varepsilon) \ge g(\varepsilon)$, then $\phi(\varepsilon) \succeq g\big(\varepsilon/\sqrt{\phi(\varepsilon)}\big)$. In particular, if $g(\varepsilon) = \varepsilon^{-\alpha}J(1/\varepsilon)$, where $0 < \alpha < 2$, then

$$\phi(\varepsilon) \succeq \varepsilon^{-2\alpha/(2-\alpha)} \big(J(1/\varepsilon)\big)^{2/(2-\alpha)} .$$

(IV) If $H(K_\mu, \varepsilon) \preceq \varepsilon^{-\alpha}J(1/\varepsilon)$, $0 < \alpha < 2$, then for $\varepsilon$ small

$$\phi(\varepsilon) \preceq \varepsilon^{-2\alpha/(2-\alpha)} \big(J(1/\varepsilon)\big)^{2/(2-\alpha)} .$$

As a simple consequence, it is easy to see that for $\alpha > 0$ and $\beta \in \mathbb{R}$, $\phi(\varepsilon) \approx \varepsilon^{-\alpha}(\log 1/\varepsilon)^\beta$ iff

$$H(K_\mu, \varepsilon) \approx \varepsilon^{-2\alpha/(2+\alpha)} (\log 1/\varepsilon)^{2\beta/(2+\alpha)} . \qquad (3.7)$$
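For instance (a standard consequence, recorded here as an illustration): for Brownian motion under the sup-norm, (1.3) gives $\phi(\varepsilon) \sim (\pi^2/8)\varepsilon^{-2}$, so $\alpha = 2$ and $\beta = 0$ in (3.7), and

```latex
H(K_\mu, \varepsilon) \approx \varepsilon^{-2\alpha/(2+\alpha)} = \varepsilon^{-1} ,
```

which recovers the classical sup-norm entropy rate of the Cameron-Martin unit ball $\{f : f(0) = 0,\ \int_0^1 (f'(t))^2\,dt \le 1\}$.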

To fully understand this basic result, we would like to make the following remarks. First, since it is known from Goodman (1990) that $H(K_\mu,\varepsilon) = o(\varepsilon^{-2})$ regardless of the Gaussian measure $\mu$, the restriction on $\alpha$ in parts (III) and (IV) of the theorem is natural. Second, we see clearly from (I) and (II) of Theorem 3.3 that in almost all cases of interest, small ball probabilities provide sharp estimates on the metric entropy. This approach has been applied successfully to various problems of estimating metric entropy; see Section 7.6 for more details. Third, the proofs of (I), (II) and (III), given essentially in Kuelbs and Li (1993a), are based on the relations

$$H(\lambda K_\mu, 2\varepsilon) \le \lambda^2/2 - \log\mu(B_\varepsilon(0)) \qquad (3.8)$$

and

$$H(\lambda K_\mu, \varepsilon) + \log\mu(B_{2\varepsilon}(0)) \ge \log\Phi(\lambda + \eta_\varepsilon) \qquad (3.9)$$

for all $\lambda > 0$ and $\varepsilon > 0$, where $\Phi(t) = (2\pi)^{-1/2}\int_{-\infty}^t \exp\{-u^2/2\}\,du$ and $\Phi(\eta_\varepsilon) = \mu(B_\varepsilon(0))$. In fact, (3.8) follows easily from (3.2), and (3.9) is a consequence of the isoperimetric inequality for Gaussian measures, which states that $\mu(A + \lambda K_\mu) \ge \Phi(\lambda + \eta)$ for any $\lambda > 0$ and any Borel set $A$ with $\mu(A) \ge \Phi(\eta)$; see Theorem 2.1. The proof of (IV) of Theorem 3.3, given by Li and Linde (1999), is based on (3.9) with an iteration procedure, and on a new connection between small ball probabilities and the $l$-approximation numbers given in Section 3.5. Fourth, the recent establishment of (IV) in Theorem 3.3 allows applications of powerful tools and deep results from functional analysis to the estimation of small ball probabilities. The following is a very special case of Theorem 5.2 in Li and Linde (1999), which is a simple consequence of (IV) in Theorem 3.3 for linear transformations of a given Gaussian process. It is worthwhile to stress that the result below, along with many other consequences of (IV) in Theorem 3.3, has no purely probabilistic proof to date.


THEOREM 3.4. Let $Y = (Y(t))_{t\in[0,1]}$ be a centered Gaussian process with continuous sample path and assume that

$$\log P\Big(\sup_{0\le t\le 1} |Y(t)| \le \varepsilon\Big) \succeq -\varepsilon^{-\alpha}\log(1/\varepsilon)$$

for $\alpha > 0$. If

$$X(t) = \int_0^1 K(t,s)Y(s)\,ds \qquad (3.10)$$

with the kernel $K(t,s)$ satisfying the Holder condition

$$\int_0^1 |K(t,s) - K(t',s)|\,ds \le c\,|t - t'|^\lambda , \qquad t, t' \in [0,1] , \qquad (3.11)$$

for some $\lambda \in (0,1]$ and some $c > 0$, then

$$\log P\Big(\sup_{0\le t\le 1} |X(t)| \le \varepsilon\Big) \succeq -\varepsilon^{-\alpha/(\alpha\lambda+1)}\log(1/\varepsilon) .$$

Some applications of Theorem 3.4 to integrated Gaussian processes are detailed in Sections 4.4 and 6.3. Note that the integration kernel $K(t,s) = 1_{(0,t)}(s)$ satisfies the Holder condition (3.11) with $\lambda = 1$. So if $Y(t)$ in Theorem 3.4 is a fractional Brownian motion (see Section 4.3) and $X(t)$ in Theorem 3.4 is the integrated fractional Brownian motion (see Section 4.4), then the lower bound given in Theorem 3.4 is sharp, by observing (4.14) and (4.15). Other significant applications of (IV) in Theorem 3.3 are mentioned in Section 5.2 on Brownian sheets. Finally, in the theory of small ball estimates for Gaussian measures, (IV) of Theorem 3.3, together with basic techniques of estimating entropy numbers as demonstrated in Li and Linde (1999), is one of the most general and powerful among all the existing methods of estimating the small ball lower bound. Another commonly used general lower bound estimate on the supremum of Gaussian processes is presented in Sections 3.4 and 4.1.

3.3. Exponential Tauberian theorem


Let $V$ be a positive random variable. Then the following exponential Tauberian theorem connects the small ball type behaviour of $V$ near zero with the asymptotic Laplace transform of the random variable $V$.

THEOREM 3.5. For $\alpha > 0$ and $\beta \in \mathbb{R}$,

$$\log P(V \le \varepsilon) \sim -C_V\,\varepsilon^{-\alpha}|\log\varepsilon|^{\beta} \quad\text{as } \varepsilon \to 0^+$$

if and only if

$$\log E\exp(-\lambda V) \sim -(1+\alpha)\,\alpha^{-\alpha/(1+\alpha)}\,C_V^{1/(1+\alpha)}\,\lambda^{\alpha/(1+\alpha)}(\log\lambda)^{\beta/(1+\alpha)} \quad\text{as } \lambda \to \infty .$$

A slightly more general formulation of the above result is given in Theorem 4.12.9 of Bingham et al. (1987), and is called de Bruijn's exponential Tauberian theorem. Note that one direction between the two quantities is easy and follows from

$$P(V \le \varepsilon) = P(-\lambda V \ge -\lambda\varepsilon) \le \exp(\lambda\varepsilon)\,E\exp(-\lambda V) ,$$

which is just Chebyshev's inequality. Next we give two typical applications. Let $X_0(t) = W(t)$ and
$$X_m(t) = \int_0^t X_{m-1}(s)\,ds , \qquad t \ge 0 , \quad m \ge 1 ,$$

which is the $m$th integrated Brownian motion or the $m$-fold primitive. Note that using integration by parts we also have the representation

$$X_m(t) = \frac{1}{m!}\int_0^t (t-s)^m\,dW(s) , \qquad m \ge 0 . \qquad (3.12)$$

The exact Laplace transform E exp(−λ ∫_0^1 X_m²(t) dt) is computed in Chen and Li (1999), and one can find from the exact Laplace transform, for each integer m ≥ 0,

lim_{λ→∞} λ^{−1/(2m+2)} log E exp(−λ ∫_0^1 X_m²(t) dt) = −2^{−(2m+1)/(2m+2)} (sin(π/(2m+2)))^{−1} .  (3.13)

Then by the Tauberian theorem, (3.13) implies

log P(∫_0^1 X_m²(t) dt ≤ ε²) ~ −2^{−1}(2m + 1)((2m + 2) sin(π/(2m+2)))^{−(2m+2)/(2m+1)} ε^{−2/(2m+1)} .  (3.14)
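De Bruijn's theorem can be checked numerically in the case m = 0, where the Laplace transform is available in closed form: E exp(−λ ∫_0^1 W²(t) dt) = (cosh √(2λ))^{−1/2}. The sketch below (an illustration of ours, not part of the original text) compares its logarithm with the asymptote −(λ/2)^{1/2} predicted by Theorem 3.5 with α = 1, β = 0 and C_V = 1/8 from (3.14):

```python
import math

def log_laplace_int_w2(lam):
    """log E exp(-lam * int_0^1 W(t)^2 dt) = -(1/2) log cosh(sqrt(2*lam)),
    computed in an overflow-safe way."""
    x = math.sqrt(2.0 * lam)
    # log cosh(x) = x - log 2 + log(1 + exp(-2x)) for x >= 0
    logcosh = x - math.log(2.0) + math.log1p(math.exp(-2.0 * x))
    return -0.5 * logcosh

# Theorem 3.5 with alpha = 1, beta = 0, C_V = 1/8 predicts
# log E exp(-lam V) ~ -2 * (1/8)^(1/2) * lam^(1/2) = -sqrt(lam/2).
for lam in [1e2, 1e4, 1e6]:
    ratio = log_laplace_int_w2(lam) / (-math.sqrt(lam / 2.0))
    print(f"lambda = {lam:.0e}, ratio to asymptote = {ratio:.5f}")
```

The ratio tends to 1 as λ grows, consistent with the equivalence in Theorem 3.5.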

For other applications of this type going the other way, see Section 7.10. Our second application is for sums of independent random variables, and it is an easy consequence of the Tauberian theorem.

COROLLARY 3.1. If V_i, 1 ≤ i ≤ m, are independent nonnegative random variables such that

lim_{ε→0} ε^γ log P(V_i ≤ ε) = −d_i ,  1 ≤ i ≤ m ,

for 0 < γ < ∞, then

lim_{ε→0} ε^γ log P(Σ_{i=1}^m V_i ≤ ε) = −(Σ_{i=1}^m d_i^{1/(1+γ)})^{1+γ} .
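The limiting constant in Corollary 3.1 is explicit. The following sketch (illustrative only; the function name is ours) evaluates it and checks two easy consistency cases: a single summand keeps its own constant, and m copies with a common constant d yield m^{1+γ} d.

```python
def combined_small_ball_rate(ds, gamma):
    """Limit constant in Corollary 3.1:
    lim eps^gamma log P(V_1 + ... + V_m <= eps) = -(sum_i d_i^{1/(1+gamma)})^{1+gamma}."""
    p = 1.0 / (1.0 + gamma)
    return sum(d ** p for d in ds) ** (1.0 + gamma)

# One summand: the rate is unchanged (d = 1/8 is the constant from (3.14), m = 0).
print(round(combined_small_ball_rate([0.125], 1.0), 12))         # 0.125
# Two independent copies with d = 1/8 and gamma = 1: 2^(1+1) * 1/8 = 0.5.
print(round(combined_small_ball_rate([0.125, 0.125], 1.0), 12))  # 0.5
```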

Now we consider the sum of two centered Gaussian random vectors X and Y in a separable Banach space E with norm ‖·‖.

THEOREM 3.6. If X and Y are independent and

lim_{ε→0} ε^γ log P(‖X‖ ≤ ε) = −C_X ,  lim_{ε→0} ε^γ log P(‖Y‖ ≤ ε) = −C_Y  (3.15)

with 0 < γ < ∞ and 0 < C_X, C_Y < ∞, then

limsup_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε) ≤ −max(C_X, C_Y) ,

liminf_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε) ≥ −(C_X^{1/(1+γ)} + C_Y^{1/(1+γ)})^{1+γ} .
The upper bound follows from P(‖X + Y‖ ≤ ε) ≤ min(P(‖X‖ ≤ ε), P(‖Y‖ ≤ ε)) by Anderson's inequality and the independence assumption. The lower bound follows from the triangle inequality ‖X + Y‖ ≤ ‖X‖ + ‖Y‖ and Corollary 3.1. Note that both the upper and lower constants given above are not sharp in the case X = Y in law. It seems a very challenging problem to find the precise constant for lim_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε), which we conjecture to exist, in terms of C_X, C_Y, ‖·‖, γ and possibly the covariance structures of X and Y. What happens for the sums if X and Y are not necessarily independent but have different small ball rates? This is given recently in Li (1999a) as an application of Theorem 2.14.

THEOREM 3.7. For any jointly Gaussian random vectors X and Y such that (3.15) holds with 0 < γ < ∞, 0 < C_X < ∞ and C_Y = 0, we have

lim_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε) = −C_X .

The proof is so easy now that we have to present it, keeping in mind the very simple argument for (2.12) or (2.13). For the lower bound, we have by the inequality (2.13), with any 0 < δ < 1, 0 < λ < 1,

P(‖X + Y‖ ≤ ε) ≥ P(‖X‖ ≤ (1 − δ)ε, ‖Y‖ ≤ δε) ≥ P(‖X‖ ≤ λ(1 − δ)ε) · P(‖Y‖ ≤ (1 − λ²)^{1/2} δε) .

Thus

liminf_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε) ≥ −(λ(1 − δ))^{−γ} C_X


and the lower bound follows by taking δ → 0 and λ → 1. For the upper bound, we have again by the inequality (2.13), with any 0 < δ < 1, 0 < λ < 1,

P(‖X‖ ≤ (λ(1 − δ))^{−1} ε) ≥ P(‖X + Y‖ ≤ ε/λ, ‖Y‖ ≤ δ(λ(1 − δ))^{−1} ε)
≥ P(‖X + Y‖ ≤ ε) × P(‖Y‖ ≤ (1 − λ²)^{1/2} δ(λ(1 − δ))^{−1} ε) .

Thus

limsup_{ε→0} ε^γ log P(‖X + Y‖ ≤ ε) ≤ −(λ(1 − δ))^γ C_X ,

and the upper bound follows by taking δ → 0 and λ → 1. As a direct consequence of Theorem 3.7, we see easily that under the sup-norm or L_p-norm, Brownian motion and the Brownian bridge have exactly the same small ball behavior at the log level, and so do Brownian sheets and various tied down Brownian sheets, including the Kiefer process; see Sections 5.2, 6.2 and 7.2.

3.4. Lower bound on supremum under entropy conditions


A Gaussian process X = (X_t)_{t∈T} with index set T is a random process such that each finite linear combination Σ_i α_i X_{t_i}, α_i ∈ ℝ, t_i ∈ T, is a real valued Gaussian variable. We always assume it is separable. For a detailed discussion related to separability, we refer to Section 2.2 of Ledoux and Talagrand (1991). The distribution of the Gaussian process X is therefore completely determined by its covariance structure E X_s X_t, s, t ∈ T. Assume (X_t)_{t∈T} is a centered Gaussian process with entropy number N(T, d; ε), the minimal number of balls of radius ε > 0, under the Dudley metric

d(s, t) = (E|X_s − X_t|²)^{1/2} ,  s, t ∈ T ,

that are necessary to cover T. Then a commonly used general lower bound estimate on the supremum was established in Talagrand (1993), and the following nice formulation was given in Ledoux (1996, p. 257).

THEOREM 3.8. Assume that there is a nonnegative function ψ on ℝ_+ such that N(T, d; ε) ≤ ψ(ε) and such that c_1 ψ(ε) ≤ ψ(ε/2) ≤ c_2 ψ(ε) for some constants 1 < c_1 ≤ c_2 < ∞. Then, for some K > 0 and every ε > 0 we have

log P(sup_{s,t∈T} |X_s − X_t| ≤ ε) ≥ −Kψ(ε) .

In particular, log P(sup_{t∈T} |X_t| ≤ ε) ≥ −Kψ(ε) .


The proof of this theorem is based on the Khatri–Šidák correlation inequality given in (2.8) and standard chaining arguments usual for the estimation of large ball probabilities via N(T, d; ·); [see e.g. Ledoux (1996)]. A similar idea of proof was also used in Shao (1993) and Kuelbs et al. (1995) for some special Gaussian processes. Here is an outline of the method; a similar argument is given at the end of Section 4.1. Let (X_t)_{t∈T} be a centered Gaussian process. Then, by the Khatri–Šidák inequality,

P(sup_{t∈A∪{t_0}} |X_t| ≤ x) ≥ P(|X_{t_0}| ≤ x) P(sup_{t∈A} |X_t| ≤ x)

for every A ⊂ T, t_0 ∈ T and x > 0. If there are a countable set T_c and a Gaussian process Y on T_c such that

sup_{t∈T} |X_t| ≤ sup_{t∈T_c} |Y_t|  a.s.,

then we have

P(sup_{t∈T} |X_t| ≤ x) ≥ ∏_{t∈T_c} P(|Y_t| ≤ x) .

Since Y_t is a normal random variable for each t ∈ T_c, the right hand side above can be easily estimated. So, the key step in estimating the lower bound of P(sup_{t∈T} |X_t| ≤ x) is to find the countable set T_c and the Gaussian process Y. Although Theorem 3.8 is relatively easy to use, it does not always provide sharp lower estimates even when N(T, d; ε) can be estimated sharply. The simplest example is X_t = tξ for t ∈ T = [0, 1], where ξ denotes a standard normal random variable. In this case,

P(sup_{t∈T} |X_t| ≤ ε) = P(|ξ| ≤ ε) ~ (2/π)^{1/2} ε ,

but Theorem 3.8 produces an exponential lower bound exp(−c/ε) for the above probability. More interesting examples are the integrated fractional Brownian motion given in Section 4.4 and the fractional integrated Brownian motion W_β given in Section 6.3. We know, as applications of Theorem 3.4 and a special upper bound estimate, that

lim_{ε→0} ε^{2/β} log P(sup_{0≤t≤1} |W_β(t)| ≤ ε) = −k_β  (3.16)

with 0 < k_β < ∞, β > 0; [see Li and Linde (1998)]. But for β > 2, Theorem 3.8 only implies a lower bound of order ε^{−1} for the log of the probability. When β = 3,

W_3(t) = ∫_0^t (t − s) dW_s = ∫_0^t W(s) ds ,  t ≥ 0 ,


is the integrated Brownian motion, and the sharp lower estimate of order ε^{−2/3} was first obtained in Khoshnevisan and Shi (1998a) by using special local time techniques. The following example in Lifshits (1999) suggests that stationarity plays a big role in the upper estimates of Theorems 4.5 and 4.6, and that the L_2-norm entropy N(T, d; ·) is not an appropriate tool for the upper bound.

EXAMPLE. Let α > 0, and let {ξ_i} be i.i.d. standard normal random variables. Define φ(t) = 1 − |2t − 1| for t ∈ [0, 1]. Let {u} denote the fractional part of the real number u. Put

X_t = ξ_0 t + Σ_{i=1}^∞ 2^{−αi/2} ξ_i φ({2^i t})  for t ∈ [0, 1] .

It is easy to see that E(X_t − X_s)² ≥ c|t − s|^α for all s, t ∈ [0, 1], where c > 0 is a constant. However, we have

log P(sup_{0≤t≤1} |X_t| ≤ ε) ≈ −log²(1/ε)  (3.17)

as ε → 0. To see the lower bound, we have

P(sup_{0≤t≤1} |X_t| ≤ ε) ≥ P(Σ_{i=0}^∞ 2^{−αi/2} |ξ_i| ≤ ε)
≥ P(|ξ_i| ≤ ε 2^{αi/4}(1 − 2^{−α/4}), i = 0, 1, …)
= ∏_{i=0}^∞ P(|ξ_1| ≤ ε 2^{αi/4}(1 − 2^{−α/4}))
≥ exp(−K_2 log²(1/ε))

for some positive constant K_2. The upper bound can be proven as follows:

P(sup_{0≤t≤1} |X_t| ≤ ε) ≤ P(max_{k≥2} |X(2^{−k})| ≤ ε)
≤ P(max_{k≥2} |ξ_0 2^{−k} + Σ_{i=1}^{k−1} ξ_i 2^{−αi/2} φ({2^i 2^{−k}})| ≤ ε)
≤ ∏_{k≥2} P(|ξ_0| ≤ 2ε 2^{αk/2})
≤ exp(−K_1 log²(1/ε))

for some positive constant K_1.


For the upper bound estimates, there is no general probabilistic method available in the spirit of Theorem 3.8 at this time. Various special techniques based on Anderson's inequality, Slepian's lemma, the exponential Chebyshev inequality, iteration procedures, etc., are used in Pitt (1978), Shao (1993), Monrad and Rootzén (1995), Kuelbs and Li (1995), Stolz (1996), Dunker et al. (1998), Li (1999b), Li and Shao (1999a), Dunker et al. (1999) and the references therein. See Sections 4.2 and 5.2 for more information.

3.5. Connections with l-approximation numbers


The small ball behaviour of a Gaussian process is also closely connected with the speed of approximation by "finite rank" processes. For a centered Gaussian random variable X on E, the n-th l-approximation number of X is defined by

l_n(X) = inf { E ‖ Σ_{j≥n} ξ_j x_j ‖ : X =_d Σ_{j=1}^∞ ξ_j x_j, x_j ∈ E } ,  (3.18)

where the ξ_j are i.i.d. standard normal and the inf is taken over all possible series representations for X. One may consider l_n(X) as a measure of a specific orthogonal approximation of X by random vectors of rank n. Note that l_n(X) → 0 as n → ∞ if X has bounded sample paths. Other equivalent definitions and some well known properties of l_n(X) can be found in Li and Linde (1999) and Pisier (1989). The following results show the connections between the small ball probability of X and its l-approximation numbers l_n(X).

THEOREM 3.9. Let α > 0 and β ∈ ℝ.
(a) If

l_n(X) ⪯ n^{−1/α} (1 + log n)^β ,  (3.19)

then

−log P(‖X‖ ≤ ε) ⪯ ε^{−α} (log 1/ε)^{αβ} .  (3.20)

(b) Conversely, if (3.20) holds, then

l_n(X) ⪯ n^{−1/α} (1 + log n)^{β+1} .  (3.21)

Moreover, if E is K-convex (e.g. L_p, 1 < p < ∞), i.e. E does not contain l_1^n's uniformly [see Theorem 2.4 in Pisier (1989)], then (3.19) holds, and thus (3.19) and (3.20) are equivalent in this case.
(c) If

−log P(‖X‖ ≤ 2ε) ≈ −log P(‖X‖ ≤ ε) ≈ ε^{−α} (log 1/ε)^β ,

then

l_n(X) ⪰ n^{−1/α} (1 + log n)^{β/α − 1/α} .

(d) If E is K-convex and

l_n(X) ≈ n^{−1/α} (1 + log n)^β ,

then

−log P(‖X‖ ≤ ε) ≈ ε^{−α} (log 1/ε)^{αβ} .

Parts (a), (b) and (c) of Theorem 3.9 are given by Li and Linde (1999), and part (d) is a very nice observation of Ingo Steinwart. There are several natural and important open questions. Does l_n(X) ⪰ n^{−1/α} imply a lower estimate for −log P(‖X‖ ≤ ε)? What is the optimal power of the log-term in (3.21)? Recently it was shown in Dunker et al. (1999) that under the sup-norm

l_n(B_{d,α}) ⪯ n^{−α/2} (1 + log n)^{d(α+1)/2 − α/2}  (3.22)

for the fractional Brownian sheets B_{d,α}(t), d ≥ 1 and 0 < α < 2; see Section 5.2 for the definition. Hence the best known lower bound (5.8) for B_{d,α} under the sup-norm over [0, 1]^d follows from (3.22) and part (a) of Theorem 3.9. On the other hand, the correct rates of small ball probabilities for Brownian sheets B_{d,1}, d ≥ 3, are still unknown under the sup-norm. See Section 5.3. This suggests that l_n(X) may be easier to work with. In fact, finding an upper bound for l_n(X) is relatively easy, since we only need to find one good series expansion for X. But it may not be sharp even for the standard Brownian motion W(t) = B_{1,1}(t), 0 ≤ t ≤ 1, under the sup-norm, since l_n(W) ≈ n^{−1/2}(1 + log n)^{1/2} and log P(sup_{0≤t≤1} |W(t)| ≤ ε) ~ −(π²/8)ε^{−2}. Consequently, we also see that (3.20) does not imply (3.21) with log-power β in general. At least β + 1/2 is needed. Finally, we mention that the links to approximation theory are not restricted to metric entropy and the l-approximation numbers. The small ball probability of X is also related to many other approximation quantities, such as the Gelfand numbers, Kolmogorov numbers and volume numbers of the compact linear operator from H_μ to E associated with X, although the links are not as precise as those with the entropy numbers; [see Li and Linde (1999) and Pisier (1989)].
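The sup-norm rate for W quoted above can be made completely explicit. The sketch below (an illustration of ours, not from the text) evaluates the classical reflection-principle series P(sup_{0≤t≤1} |W(t)| ≤ ε) = (4/π) Σ_{k≥0} ((−1)^k/(2k+1)) exp(−π²(2k+1)²/(8ε²)) and confirms that its logarithm behaves like −(π²/8)ε^{−2}:

```python
import math

def small_ball_sup_bm(eps, terms=50):
    """P(sup_{0<=t<=1} |W(t)| <= eps) via the reflection-principle series."""
    s = 0.0
    for k in range(terms):
        s += (-1) ** k / (2 * k + 1) * math.exp(
            -math.pi ** 2 * (2 * k + 1) ** 2 / (8.0 * eps ** 2))
    return 4.0 / math.pi * s

# log P(sup |W| <= eps) ~ -(pi^2/8) eps^{-2} as eps -> 0:
for eps in [0.5, 0.2, 0.1]:
    lhs = math.log(small_ball_sup_bm(eps))
    rhs = -(math.pi ** 2 / 8.0) / eps ** 2
    print(f"eps = {eps}, ratio = {lhs / rhs:.4f}")
```

The ratio approaches 1 as ε decreases, matching the quoted asymptotic.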

3.6. A connection between small ball probabilities

Let X and Y be any two centered Gaussian random vectors in a separable Banach space E with norm ‖·‖. We use |·|_{μ(X)} to denote the inner product norm induced on H_{μ(X)} by the Gaussian measure μ(X) = L(X), the law of X. The following relation, discovered recently in Chen and Li (1999), can be used to estimate small ball probabilities under any norm via a relatively easier L_2-norm estimate.

THEOREM 3.10. For any λ > 0 and ε > 0,

P(‖Y‖ ≤ ε) ≥ P(‖X‖ ≤ λε) · E exp{−2^{−1} λ² |Y|²_{μ(X)}} .  (3.23)

In particular, for any λ > 0, ε > 0 and δ > 0,

P(‖Y‖ ≤ ε) ≥ exp{−λ²δ²/2} · P(‖X‖ ≤ λε) P(|Y|_{μ(X)} ≤ δ) .

Note that we need Y ∈ H_{μ(X)} ⊂ E almost surely. Otherwise, for Y ∉ H_{μ(X)}, |Y|_{μ(X)} = ∞ and the result is trivial. Thus the result can also be stated as follows. Let (H, |·|_H) be a Hilbert space and let Y be a Gaussian vector in H. Then for any linear operator L : H → E and the Gaussian vector X in E with covariance operator LL*,

P(‖LY‖ ≤ ε) ≥ P(‖X‖ ≤ λε) · E exp{−2^{−1} λ² |Y|²_H}

for any λ > 0 and ε > 0. The proof of Theorem 3.10 is very simple and based on both directions of the well known shift inequalities (3.2). Without loss of generality, assume X and Y are independent. Then

P(‖Y‖ ≤ ε) ≥ P(‖X − λY‖ ≤ λε) ≥ P(‖X‖ ≤ λε) · E exp{−2^{−1} λ² |Y|²_{μ(X)}} .

To see the power of Theorem 3.10, we state and prove the following special case of a result for X_m(t) given in Chen and Li (1999), where X_m(t) is defined in (3.12). The particular case m = 1, the so called integrated Brownian motion, was studied in Khoshnevisan and Shi (1998a) using local time techniques.

THEOREM 3.11. We have

lim_{ε→0} ε^{2/3} log P(sup_{0≤t≤1} |∫_0^t W(s) ds| ≤ ε) = −κ  (3.24)

with

3/8 ≤ κ ≤ (2π)^{2/3} · 3/8 .  (3.25)

The existence of the limit is by subadditivity; see Section 6.3. The lower bound for κ in (3.25) follows from

P(sup_{0≤t≤1} |∫_0^t W(s) ds| ≤ ε) ≤ P(∫_0^1 |∫_0^t W(s) ds|² dt ≤ ε²)

and the L_2 estimate given in (3.14). The upper bound for κ in (3.25) follows from Theorem 3.10, the L_2 estimate given in (3.14) and the well known estimate log P(sup_{0≤t≤1} |W(t)| ≤ ε) ~ −(π²/8)ε^{−2}. To be more precise, take Y(t) = ∫_0^t W(s) ds and X(t) = W(t); then

P(sup_{0≤t≤1} |∫_0^t W(s) ds| ≤ ε) ≥ P(sup_{0≤t≤1} |W(t)| ≤ λε) · E exp{−2^{−1} λ² ∫_0^1 W²(s) ds} .


Taking λ = (π²/2)^{1/3} ε^{−2/3}, the bound follows. It is of interest to note that both bounds rely on easier L_2 estimates, and the constant bounds for κ are the sharpest known.
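The optimization behind this choice of λ is elementary and can be checked numerically. Inserting the asymptotics log P(sup_{0≤t≤1}|W(t)| ≤ λε) ≈ −(π²/8)(λε)^{−2} and log E exp{−2^{−1}λ² ∫_0^1 W²} ≈ −λ/2, and writing λ = c ε^{−2/3}, the upper constant becomes inf_{c>0} [(π²/8) c^{−2} + c/2]. The sketch below (ours, illustrative only) verifies that this infimum equals (3/8)(2π)^{2/3}:

```python
import math

def objective(c):
    # (pi^2/8) c^{-2}: sup-norm small ball of W;  c/2: Laplace transform of int W^2
    return (math.pi ** 2 / 8.0) / c ** 2 + c / 2.0

# crude grid search over c > 0
best = min(objective(0.001 * i) for i in range(1, 10000))
closed_form = (3.0 / 8.0) * (2.0 * math.pi) ** (2.0 / 3.0)
argmin = (math.pi ** 2 / 2.0) ** (1.0 / 3.0)  # calculus: c* = (pi^2/2)^(1/3)
print(round(best, 4), round(closed_form, 4), round(argmin, 4))
```

Numerically 3/8 = 0.375 ≤ κ ≤ 1.2769, the two bounds in (3.25).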

4. Gaussian processes with index set T ⊂ ℝ

This section focuses on small ball probabilities of Gaussian processes with index set T ⊂ ℝ. For the sake of easy presentation, we assume T = [0, 1]. Some of the results are covered by the general approaches of the last section. But we still state them in different forms for comparison and historical purposes.

4.1. Lower bounds


We first present a general result on the lower bound.

THEOREM 4.1. Let {X_t, t ∈ [0, 1]} be a centered Gaussian process with X(0) = 0. Assume that there is a function σ²(h) such that

E(X_t − X_s)² ≤ σ²(|t − s|)  for all 0 ≤ s, t ≤ 1 ,  (4.1)

and that there are 0 < c_1 ≤ c_2 < 1 such that c_1 σ(2h ∧ 1) ≤ σ(h) ≤ c_2 σ(2h ∧ 1) for 0 ≤ h ≤ 1. Then, there is a positive and finite constant K_1 depending only on c_1 and c_2 such that

P(sup_{0≤t≤1} |X(t)| ≤ σ(ε)) ≥ exp(−K_1/ε) .  (4.2)

The above result was given in Csörgő and Shao (1994) and Kuelbs et al. (1995). It can also be derived from the Talagrand lower bound in Section 3.4. Its detailed proof is similar to the outline we give after Theorem 4.3. The next result is intuitively appealing. It says that the small ball probability P(sup_{0≤t≤1} |X_t| ≤ σ(ε)) is determined by P(max_{1≤i≤1/(δε)} |X(iδε)| ≤ σ(ε)) as long as δ is sufficiently small. We refer to Shao (1999) for a proof.

THEOREM 4.2. Under the conditions of Theorem 4.1, there are positive constants K_1 and θ depending only on c_1, c_2 such that for all 0 < δ < 1, 0 < ε < 1,

P(sup_{0≤t≤1} |X(t)| ≤ (1 + δ)σ(ε)) ≥ exp(−K_1 δ^θ/ε) P(max_{0≤i≤1/(δε)} |X(iδε)| ≤ σ(ε)) .  (4.3)

Our next result is a generalization of Theorem 4.1, which may be useful particularly for a differentiable Gaussian process. An application is given in Section 4.4 for integrated fractional Brownian motion.


THEOREM 4.3. Let {X_t, t ∈ [0, 1]} be a centered Gaussian process with X(0) = 0. Assume that there is a function σ²(h) such that

E(X(t + h) + X(t − h) − 2X(t))² ≤ σ²(h)  for all 0 ≤ h ≤ 1/2, h ≤ t ≤ 1 − h .  (4.4)

Assume that there are 0 < c_1 ≤ c_2 < 1 such that c_1 σ(2h ∧ 1) ≤ σ(h) ≤ c_2 σ(2h ∧ 1) for 0 ≤ h ≤ 1. Then, there is a positive and finite constant K_1 depending only on c_1 and c_2 such that

P(sup_{0≤t≤1} |X(t)| ≤ σ(ε)) ≥ exp(−K_1/ε) .  (4.5)

The idea of the proof has been explained in Section 3.4. To illustrate the idea more precisely, it is worthwhile to outline the proof as follows. The assumptions already imply that X is almost surely continuous. Therefore,

sup_{0≤t≤1} |X(t)| ≤ |X(1)| + Σ_{k=1}^∞ max_{1≤i<2^k} |X((i + 1)2^{−k}) + X((i − 1)2^{−k}) − 2X(i2^{−k})|  a.s.

Without loss of generality, assume 0 < ε < 1. Let n_0 be an integer such that

2^{−n_0} ≤ ε < 2^{−n_0+1}

and define

ε_k = σ(ε) 1.5^{−|n_0−k|}/K ,  k = 1, 2, … ,

where K is a constant. It is easy to see that

Σ_{k=1}^∞ ε_k ≤ σ(ε)/2

provided that K is sufficiently large. Hence, by the Khatri–Šidák inequality,

P(sup_{0≤t≤1} |X(t)| ≤ σ(ε))
≥ P(|X(1)| ≤ σ(ε)/2, max_{1≤i<2^k} |X((i + 1)2^{−k}) + X((i − 1)2^{−k}) − 2X(i2^{−k})| ≤ ε_k, k = 1, 2, …)
≥ P(|X(1)| ≤ σ(ε)/2) ∏_{k=1}^∞ ∏_{1≤i<2^k} P(|X((i + 1)2^{−k}) + X((i − 1)2^{−k}) − 2X(i2^{−k})| ≤ ε_k) .

A direct argument then gives (4.5).


4.2. Upper bounds

The upper bound of small ball probabilities is much more challenging than the lower bound. This can be seen easily from the precise links with the metric entropy given in Section 3.2. The upper bound of small ball probabilities gives the lower estimate of the metric entropy and vice versa. The lower estimates for metric entropy are frequently obtained by a volume comparison, i.e. for suitable finite dimensional projections, the total volume of the covering balls is at least the volume of the set being covered. As a result, when the volumes of finite dimensional projections of K_μ do not compare well with the volumes of the same finite dimensional projections of the unit ball of E, sharp lower estimates for metric entropy (upper bounds for small ball probabilities) are much harder to obtain. Some examples are given in Section 7.6. We start with the following general result. Although it is not as general as Theorem 4.1, it does cover many special cases known so far. Note further that the example given in Section 3.4 shows that L_2-norm entropy is not an appropriate tool for the upper bound.

THEOREM 4.4. Let {X_t, t ∈ [0, 1]} be a centered Gaussian process. Then for all 0 < a ≤ 1/2 and ε > 0,

P(sup_{0≤t≤1} |X_t| ≤ ε) ≤ exp( −(Σ_{2≤i≤1/a} Eξ_i²)² / (16 Σ_{2≤i,j≤1/a} (E(ξ_iξ_j))²) )  (4.6)

provided that

a Σ_{2≤i≤1/a} Eξ_i² ≥ 32 ε² ,  (4.7)

where ξ_i = X(ia) − X((i − 1)a) or ξ_i = X(ia) + X((i − 2)a) − 2X((i − 1)a).

As a consequence of the above result, we have

THEOREM 4.5. Let {X_t, t ∈ [0, 1]} be a centered Gaussian process with stationary increments and X_0 = 0. Put

σ²(|t − s|) = E|X_t − X_s|² ,  s, t ∈ [0, 1] .

Assume that there are 1 < c_1 ≤ c_2 < 2 such that

c_1 σ(h) ≤ σ(2h) ≤ c_2 σ(h)  for 0 < h < 1/2 .  (4.8)

Then there exists a positive and finite constant K_2 such that for all 0 < ε < 1,

P(sup_{0≤t≤1} |X_t| ≤ σ(ε)) ≤ exp(−K_2/ε)  (4.9)

if one of the following conditions is satisfied.


(i) σ² is concave on (0, 1);
(ii) There is c_0 > 0 such that

(σ²(a))‴ ≤ c_0 a^{−3} σ²(a)  for 0 < a < 1/2 .

When (i) is satisfied, the result is due to Shao (1993). The original proof is lengthy; a short proof based on Slepian's inequality was given in Kuelbs et al. (1995). Here we use Theorem 4.4. Let a = εA, where A ≥ 2 will be specified later. Without loss of generality, assume 0 < a < 1/4. Define ξ_i = X(ia) − X((i − 1)a). It is easy to see that

a Σ_{2≤i≤1/a} Eξ_i² ≥ (1 − 2a)σ²(a) ≥ 32σ²(ε) .

Noting that E(ξ_iξ_j) ≤ 0 for i < j, we have

Σ_{2≤i,j≤1/a} (E(ξ_iξ_j))² ≤ σ²(a) Σ_{2≤i,j≤1/a} |E(ξ_iξ_j)| ≤ σ⁴(a)/a .  (4.10)

Now (4.9) follows from Theorem 4.4. When (ii) is satisfied, let a = εA, where A ≥ 2, and let η_i = X(ia) + X((i − 2)a) − 2X((i − 1)a). Noting that

E(η_3 η_i) = 2^{−1}(4σ²((i − 2)a) + 4σ²((i − 4)a) − 6σ²((i − 3)a) − σ²((i − 1)a) − σ²((i − 5)a))

for i ≥ 6, we have by the Taylor expansion

Σ_{6≤i≤1/a} |E(η_3 η_i)|² ⪯ Σ_{6≤i≤1/a} (a³ σ²(ia)/(ia)³)² ⪯ Σ_{6≤i≤1/a} (σ²(a)/i)² ≤ K σ⁴(a) ,

using σ²(ia) ⪯ i² σ²(a), which follows from (4.8).

Similarly, one can see that (4.7) is satisfied as long as A is large enough. Hence (4.9) holds, by Theorem 4.4. We now turn to the proof of Theorem 4.4, which is indeed a consequence of the following lemma.

LEMMA 4.1. For any centered Gaussian sequence {ξ_i} and any 0 < x < Σ_{i≤n} Eξ_i², we have

P(Σ_{i≤n} ξ_i² ≤ x) ≤ exp( −(Σ_{i≤n} Eξ_i² − x)² / (4 Σ_{i,j≤n} (E(ξ_iξ_j))²) ) .  (4.11)

The proof of Lemma 4.1 given here is of independent interest. It is easy to see that there exists a sequence of independent mean zero normal random variables η_i such that

Σ_{i=1}^n ξ_i² =_d Σ_{i=1}^n η_i² .  (4.12)

Let λ = (Σ_{i≤n} Eη_i² − x)/(2 Σ_{i≤n} (Eη_i²)²). Then for any 0 < x < Σ_{i=1}^n Eη_i²,

P(Σ_{i≤n} η_i² ≤ x) ≤ exp(λx) E exp(−λ Σ_{i≤n} η_i²)
= exp(λx) ∏_{i≤n} (1 + 2λEη_i²)^{−1/2}
≤ exp( λx − λ Σ_{i≤n} Eη_i² + λ² Σ_{i≤n} (Eη_i²)² )
= exp( −(Σ_{i≤n} Eη_i² − x)² / (4 Σ_{i≤n} (Eη_i²)²) ) .

Note further that Σ_{i≤n} Eη_i² = Σ_{i≤n} Eξ_i² and, since for jointly Gaussian ξ_i, ξ_j one has

E(ξ_i² ξ_j²) = (Eξ_i²)(Eξ_j²) + 2(E(ξ_iξ_j))² ,

comparing the variances of the two sides of (4.12) gives Σ_{i≤n} (Eη_i²)² = Σ_{i,j≤n} (E(ξ_iξ_j))². The lemma follows from the above inequalities.

4.3. Fractional Brownian motions


A centered Gaussian process X = {X_t, t ∈ [0, 1]} is called a fractional Brownian motion of order α ∈ (0, 2), denoted by X ∈ fBm_α, if X_0 = 0 and

E|X_t − X_s|² = |t − s|^α  for all 0 ≤ s, t ≤ 1 .  (4.13)

When α = 1, it is the ordinary Brownian motion. The name fractional Brownian motion was first introduced in Mandelbrot and Van Ness (1968), but the sample path properties of these processes were already studied by Kolmogorov in the 1940s. The study of fractional Brownian motion was motivated by natural time series in economics, fluctuations in solids, hydrology and, more recently, by new problems in mathematical finance and telecommunication networks.
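Definition (4.13) together with X_0 = 0 forces the covariance E X_s X_t = (s^α + t^α − |t − s|^α)/2, which can be used directly to simulate fBm on a finite grid. The sketch below (an illustration; the function name is ours, not from the text) samples paths by a Cholesky factorization of this covariance matrix:

```python
import numpy as np

def fbm_paths(alpha, n=200, n_paths=5, seed=0):
    """Sample fBm of order alpha in (0, 2) on the grid t = 1/n, ..., 1
    via the covariance E[X_s X_t] = (s^a + t^a - |t - s|^a) / 2."""
    t = np.arange(1, n + 1) / n
    s, u = np.meshgrid(t, t, indexing="ij")
    cov = 0.5 * (s ** alpha + u ** alpha - np.abs(s - u) ** alpha)
    chol = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # jitter for stability
    rng = np.random.default_rng(seed)
    return t, chol @ rng.standard_normal((n, n_paths))

# alpha = 1 recovers Brownian motion: the covariance is min(s, t).
t, paths = fbm_paths(1.0, n=100)
```

Increment variances E(X_t − X_s)² = |t − s|^α can then be checked empirically; for α < 1 the sampled paths are rougher than Brownian motion, for α > 1 smoother.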


It is easy to see that the assumption of Theorem 4.1 and condition (ii) of Theorem 4.5 are satisfied for fractional Brownian motions. Hence we have the following sharp bound on the small ball probabilities, due to Monrad and Rootzén (1995) and Shao (1993).

THEOREM 4.6. Let X ∈ fBm_α, α ∈ (0, 2). Then there exist 0 < K_1 ≤ K_2 < ∞ depending only on α such that for all 0 < ε < 1,

−K_2 ε^{−2/α} ≤ log P(sup_{0≤t≤1} |X_t| ≤ ε) ≤ −K_1 ε^{−2/α} .  (4.14)

Other small ball probabilities for fractional Brownian motions under non-uniform norms, such as the Hölder norm and the Sobolev norm, are discussed in Baldi and Roynette (1992) and Kuelbs and Li (1993b). The first result below is given in Kuelbs et al. (1995) and the second in Li and Shao (1999a), which includes the L_p-norm, p ≥ 1. One may refer to Stolz (1993) for a universal approach to different norms based on the Schauder decomposition; a detailed discussion of the approach is given in Ledoux (1996). The third result below is given in Stolz (1996). Related results for increments can be found in Zhang (1996b). In Section 6.3, the existence of the small ball constants for the results below is indicated.

THEOREM 4.7. Let X ∈ fBm_α, α ∈ (0, 2), and let 0 ≤ β < α/2. Then there exist 0 < K_1 ≤ K_2 < ∞ depending only on α and β such that for all 0 < ε ≤ 1,

−K_2 ε^{−2/(α−2β)} ≤ log P(sup_{s,t∈[0,1]} |X_t − X_s|/|t − s|^β ≤ ε) ≤ −K_1 ε^{−2/(α−2β)} .

THEOREM 4.8. Let X ∈ fBm_α, α ∈ (0, 2), and let p > 0, 0 ≤ q < 1 + pα/2, q ≠ 1. Then there are 0 < K_1 ≤ K_2 < ∞ depending only on α, p and q such that for all 0 < ε ≤ 1,

−K_2 ε^{−θ} ≤ log P( ∫_0^1 ∫_0^1 |X_t − X_s|^p / |t − s|^q dt ds ≤ ε^p ) ≤ −K_1 ε^{−θ} ,

where θ = 1/(α/2 − max(0, (q − 1)/p)).

THEOREM 4.9. Let X ∈ fBm_α, α ∈ (0, 2). For 0 < 1/p < β < 1/2 and 1 < q < ∞,

log P(‖X‖_{β,p,q} ≤ ε) ≈ −ε^{−2/(α−2β)} ,

where the Besov norm is

‖f‖_{β,p,q} = ‖f‖_p + ( ∫_0^1 (ω_p(t, f)/t^β)^q dt/t )^{1/q}

with

ω_p(t, f) = sup_{|h|≤t} ( ∫_{I_h} |f(x − h) − f(x)|^p dx )^{1/p}

and I_h = {x ∈ [0, 1] : x − h ∈ [0, 1]}.

4.4. Integrated fractional Brownian motions

Consider the integrated fractional Brownian motion

Y(t) = ∫_0^t X_u du ,

where X is the fractional Brownian motion of order α ∈ (0, 2). The following result is a very special case of what is given in Li and Linde (1999); see the remarks at the end of Section 3.2. In fact, the so called small ball constant exists; see Section 6.3 for details. But the proofs given in Li and Linde (1999) are not probabilistic, in particular the lower bound. Here we give a direct, purely probabilistic proof based on Theorems 4.3 and 4.4.

THEOREM 4.10. There exist 0 < K_1 ≤ K_2 < ∞ depending only on α such that for all 0 < ε < 1,

−K_2 ε^{−2/(2+α)} ≤ log P(sup_{t∈[0,1]} |Y(t)| ≤ ε) ≤ −K_1 ε^{−2/(2+α)} .  (4.15)

We only provide an outline of the proof. For 0 < h < 1/4 and h ≤ s ≤ t ≤ 1 − h, we can write

Y(t + h) + Y(t − h) − 2Y(t) = ∫_0^h (X(t + u) − X(t − u)) du

and

Y(s + h) + Y(s − h) − 2Y(s) = ∫_0^h (X(s + v) − X(s − v)) dv .

Note that

|E((Y(t + h) + Y(t − h) − 2Y(t))(Y(s + h) + Y(s − h) − 2Y(s)))|
= |∫_0^h ∫_0^h E((X(t + u) − X(t − u))(X(s + v) − X(s − v))) du dv|
= 2^{−1} |∫_0^h ∫_0^h (|t − s + u + v|^α + |t − s − u − v|^α − |t − s + u − v|^α − |t − s − u + v|^α) du dv|
≤ K(|t − s| + h)^{α−2} h⁴ .

Hence, applying Theorems 4.3 and 4.4 yields the result.

5. Gaussian processes with index set T ⊂ ℝ^d, d ≥ 2

There are two versions of the extension of fractional Brownian motion to a d-dimensional index space. One is the so called Lévy fractional Brownian motion of order α ∈ (0, 2), defined by

X_0 = 0 ,  E X_t = 0 ,  E(X_t − X_s)² = |t − s|^α  for s, t ∈ [0, 1]^d .

The other is the so called fractional Brownian sheet of order α ∈ (0, 2), whose covariance satisfies

E(X_t X_s) = ∏_{j=1}^d 2^{−1}(s_j^α + t_j^α − |s_j − t_j|^α)  for s, t ∈ [0, 1]^d .

The classical Brownian sheet corresponds to α = 1.

5.1. Lévy's fractional Brownian motions


Following Shao and Wang (1995) and Talagrand (1993, 1995), we have sharp bounds on the small ball probability for Lévy fractional Brownian motions.

THEOREM 5.1. Let {X_t, t ∈ [0, 1]^d} be a Lévy fractional Brownian motion of order α ∈ (0, 2). Then there exist 0 < K_1 ≤ K_2 < ∞ depending only on α and d such that for all 0 < ε ≤ 1,

−K_2 ε^{−2d/α} ≤ log P(sup_{t∈[0,1]^d} |X_t| ≤ ε) ≤ −K_1 ε^{−2d/α} .  (5.1)

Here is a proof of the upper bound. For i = (i_1, …, i_d), write t_i = iε^{2/α}. Clearly,

P(sup_{t∈[0,1]^d} |X(t)| ≤ ε) ≤ P(max_{1≤i≤ε^{−2/α}} |X(t_i)| ≤ ε)

and for 1 ≤ j ≤ ε^{−2/α},

P(max_{1≤i≤ε^{−2/α}} |X(t_i)| ≤ ε) = E( 1{max_{1≤i≤ε^{−2/α}, i≠j} |X(t_i)| ≤ ε} P(|X(t_j)| ≤ ε | X(t_i), 1 ≤ i ≤ ε^{−2/α}, i ≠ j) ) .

In terms of Lemma 7.1 of Pitt (1978), there is a positive constant C = C(α, d) such that

Var(X(t_j) | X(t_i), 1 ≤ i ≤ ε^{−2/α}, i ≠ j) ≥ Var(X(t_j) | X(s) : |s − t_j| ≥ ε^{2/α}) = Cε² .

Hence,

P(|X(t_j)| ≤ ε | X(t_i), 1 ≤ i ≤ ε^{−2/α}, i ≠ j) ≤ P(|ξ| ≤ 1/√C) < 1 ,

where ξ is a standard normal random variable, and

P(max_{1≤i≤ε^{−2/α}} |X(t_i)| ≤ ε) ≤ P(|ξ| ≤ 1/√C) P(max_{1≤i≤ε^{−2/α}, i≠j} |X(t_i)| ≤ ε) .

We thus have the upper bound by recurrence. The lower bound is a consequence of the following general result, which can be derived from Theorem 3.8 [see also Shao and Wang (1995)].

THEOREM 5.2. Let {X_t, t ∈ [0, 1]^d} be a centered Gaussian process with X_0 = 0. Assume that

E|X_t − X_s|² ≤ σ²(|t − s|)  for all s, t ∈ [0, 1]^d ,  (5.2)

and that there are 0 < c_1 ≤ c_2 < 1 such that c_1 σ(2h ∧ 1) ≤ σ(h) ≤ c_2 σ(2h ∧ 1) for 0 ≤ h ≤ 1. Then, there is a positive and finite constant K_1 depending only on c_1 and c_2 such that for all ε > 0,

log P(sup_{t∈[0,1]^d} |X(t)| ≤ σ(ε)) ≥ −K_1/ε^d .  (5.3)

An upper bound for a general class of stationary Gaussian processes was given by Tsyrelson [see Lifshits and Tsyrelson (1986)] in terms of the spectral density. In particular, if X is a homogeneous process on ℝ^d with spectral density f satisfying f(u) ≈ |u|^{−d−α} as u → ∞, then

log P(sup_{t∈[0,1]^d} |X_t| ≤ ε) ≤ −Kε^{−2d/α} .

5.2. Brownian sheets

We first state a precise result of Csáki (1982) on the small ball probability of the Brownian sheet under the L_2-norm.

THEOREM 5.3. Let {X_t, t ∈ [0, 1]^d} be the Brownian sheet. Then

log P(∫_{[0,1]^d} |X_t|² dt ≤ ε²) ~ −c_d ε^{−2} |log ε|^{2d−2} ,  (5.4)

where c_d = 2^{2d−5}/(π^{2(d−1)} ((d − 1)!)²); in particular c_1 = 1/8. Various non-Brownian multiparameter generalizations of the above result are given in Li (1992a). Next, we consider the case d = 2.

THEOREM 5.4. Let {X_t, t ∈ [0, 1]²} be a Brownian sheet. Then there exist 0 < K_1 ≤ K_2 < ∞ such that for all 0 < ε ≤ 1,

−K_2 ε^{−2} log³(1/ε) ≤ log P(sup_{t∈[0,1]²} |X_t| ≤ ε) ≤ −K_1 ε^{−2} log³(1/ε) .  (5.5)

The lower bound is due to Lifshits [see Lifshits and Tsyrelson (1986)] and Bass (1988), and the upper bound to Talagrand (1994). A simplified proof of the upper bound can be found in Dunker (1998). The following small ball probabilities under the mixed sup-L_2 norm and L_2-sup norm may be of some interest.

THEOREM 5.5. Let {X_t, t ∈ [0, 1]²} be a Brownian sheet. Then there exist 0 < K_1 ≤ K_2 < ∞ such that for all 0 < ε ≤ 1,

−K_2 ε^{−1} log²(1/ε) ≤ log P(sup_{t_1∈[0,1]} ∫_0^1 |X(t_1, t_2)|² dt_2 ≤ ε) ≤ −K_1 ε^{−1} log²(1/ε)  (5.6)

and

−K_2 ε^{−1} log³(1/ε) ≤ log P(∫_0^1 sup_{t_1∈[0,1]} |X(t_1, t_2)|² dt_2 ≤ ε) ≤ −K_1 ε^{−1} log³(1/ε) .  (5.7)
The upper bound of (5.6) follows from (5.4), and the lower bound is given in Horvfith and Shao (1999). The lower bound of (5.7) is from (5.5) and the upper bound can be shown with modification of the arguments used in the proof of the upper bound of (5.5) given in Talagrand (1994). For d > 3, the situation becomes much more difficult as the combinatorial arguments used for d = 2 fail and there is still a gap between the upper and lower bounds.

566

W. V. Li and Q.-M. Shao

THEOREM 5.6. Let d _> 3, and {Xt, t E [0, 1]d} be the Brownian sheet. Then there exist 0 < K1,K2 < OO such that V 0 < e _< 1

~K2~2 1ON 2d 1(1/~) < l o g P ( sup Ix, I _<


\tc[o,1]d

_< -Kig-Rlog<2d-2)(l/g) (5.8)

The upper bound above follows from (5.4), and the lower bound was recently proved by Dunker et al. (1998) [a slightly weaker lower bound is given in Belinsky (1998)]. It should be pointed out that the proofs of the lower bound in Dunker et al. (1998) and Belinsky (1998) are based on approximation theory and part (IV) of Theorem 3.3, and hence are not probabilistic. Another way to obtain the lower bound in (5.8) is to use the estimates on the l-approximation numbers l_n(X) (a probabilistic concept) given in (3.22) and part (a) of Theorem 3.9. But both the proof of (3.22) on l_n(X) and that of part (a) of Theorem 3.9 use various approximation concepts and hence are also not probabilistic. It would be interesting to find a purely probabilistic proof of the lower bound in (5.8). The only known probabilistic proof of a lower bound with d > 2 is presented in Bass (1988), which gives 3d − 3 for the power of the log-term. Similar to Theorem 5.6, Dunker (1999) obtained the following upper and lower bounds for the fractional Brownian sheet using the methods detailed above.

THEOREM 5.7. Let {X_t, t ∈ [0, 1]^d} be the fractional Brownian sheet of order α ∈ (0, 2). Then there exist 0 < K_1, K_2 < ∞ such that for all 0 < ε < 1,

−K_2 ε^{−2/α} log^{(1+α)d/α−1}(1/ε) ≤ log P(sup_{t∈[0,1]^d} |X_t| ≤ ε) ≤ −K_1 ε^{−2/α} log^{(1+α)d/α−2}(1/ε) .  (5.9)

6. The small ball constants

So far we have been mainly interested in the asymptotic order (up to a constant factor) of the small ball rate function ~b(e) given in (3.5). In this section, we will present results in which the exact constants are known or known to exist, and we call them small ball constants. Keep in mind that results of this type (even just the existence) play a more important role in applications of small ball estimates as can be seen in Section 7. In the Hilbert space 12, the full asymptotic formula is known. And with the help of a comparison result, most small ball probabilities under the L2-norm can thus be treated at least in principle, and in particular when the Karhunen-Loeve expansion for a given Gaussian process can be found in some reasonable form. This is the case for Brownian motion and Brownian sheets, etc. Other exact values of small ball constants are known only with a pure analytic


representation. It is no surprise that most of them are related to Brownian motion in one way or another. The most elusive small ball constants are those shown to exist and they may even connect with each other. It is challenging to show the existence and to find those unknown but existing ones at least in terms of a pure analytic representation. We only present some basic tools and results here. Throughout this section, we use

\[ \|f\|_p = \begin{cases} \left(\int_0^1 |f(t)|^p\,dt\right)^{1/p}, & 1\le p<\infty,\\[4pt] \sup_{0\le t\le 1}|f(t)|, & p=\infty, \end{cases} \]

to denote the L_p-norm on C[0,1], 1 <= p <= infinity.

6.1. Exact estimates in Hilbert space


Consider a continuous Gaussian process {X(t) : a <= t <= b} with mean zero and covariance function sigma(s,t) = E X(s)X(t) for s, t in [a, b]. We are interested in the exact asymptotic behaviour of

\[ P\Big(\int_a^b X^2(t)\,dt \le \varepsilon^2\Big) \]

as epsilon -> 0. By the well known Karhunen-Loeve expansion, we have in distribution

\[ \int_a^b X^2(t)\,dt = \sum_{n\ge1} \lambda_n \xi_n^2 , \]

where lambda_n > 0 for n >= 1, with sum_{n>=1} lambda_n < infinity, are the eigenvalues of the equation

\[ \lambda f(t) = \int_a^b \sigma(s,t)\,f(s)\,ds , \qquad a \le t \le b . \]

Thus the problem reduces to finding the asymptotic behavior of

\[ P\Big(\sum_{n=1}^\infty \lambda_n \xi_n^2 \le \varepsilon^2\Big) \]

as epsilon -> 0, where {xi_n} are i.i.d. N(0,1) random variables. Theoretically, the problem has been solved by Sytaya (1974). Namely

THEOREM 6.1. If lambda_n > 0 and sum_{n>=1} lambda_n < +infinity, then as epsilon -> 0

\[ P\Big(\sum_{n\ge1}\lambda_n\xi_n^2 \le \varepsilon^2\Big) \sim \Big(4\pi\gamma_\varepsilon^2\sum_{n\ge1}\frac{\lambda_n^2}{(1+2\lambda_n\gamma_\varepsilon)^2}\Big)^{-1/2} \exp\Big(\gamma_\varepsilon\varepsilon^2 - \frac12\sum_{n\ge1}\log(1+2\lambda_n\gamma_\varepsilon)\Big) , \]

568

W. V. Li and Q.-M. Shao

where gamma_epsilon = gamma(epsilon) is uniquely determined, for epsilon > 0 small enough, by the equation

\[ \varepsilon^2 = \sum_{n\ge1}\frac{\lambda_n}{1+2\lambda_n\gamma_\varepsilon} . \]
Note that the given asymptotic behaviour is still an implicit expression that is highly inconvenient for concrete computations and applications. This is primarily due to the series form of the asymptotic and the implicit relation between epsilon and gamma_epsilon in Theorem 6.1. A number of papers, Dudley et al. (1979), Ibragimov (1982), Zolotarev (1986), Dembo et al. (1995), Dunker et al. (1998), have been devoted to finding the asymptotic behaviour of P(sum_{n>=1} lambda_n xi_n^2 <= epsilon^2) as epsilon -> 0, or sharp estimates at the log level for particular sequences lambda_n, after the work of Sytaya, because of the difficulties in applying Theorem 6.1. Most of the results of these papers involve difficult calculations that often depend heavily on special properties of the sequence lambda_n. Nevertheless, the problem is considered completely solved when the eigenvalues lambda_n can be found explicitly. When the eigenvalues lambda_n cannot be found explicitly, the following comparison principle given by Li (1992a) provides a very useful computational tool.
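The reduction above is easy to exercise numerically: for Brownian motion on [0,1] the Karhunen-Loeve eigenvalues are lambda_n = ((n - 1/2) pi)^{-2}, and P(sum_n lambda_n xi_n^2 <= epsilon^2) = P(int_0^1 W^2(t) dt <= epsilon^2) can be estimated by truncating the series and sampling. A minimal sketch (the truncation level, sample size and test radius epsilon = 0.35 are illustrative choices, not from the text); at log level, -epsilon^2 log P should approach the rate constant 1/8 as epsilon -> 0:

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms, n_samples = 200, 50_000

# Karhunen-Loeve eigenvalues of Brownian motion on [0, 1]
lam = 1.0 / (((np.arange(1, n_terms + 1) - 0.5) * np.pi) ** 2)

# samples of the truncated series sum_n lam_n * xi_n^2, the law of int_0^1 W^2(t) dt
q = (rng.standard_normal((n_samples, n_terms)) ** 2) @ lam

eps = 0.35
p_hat = float((q <= eps ** 2).mean())
rate_hat = float(-eps ** 2 * np.log(p_hat))   # rough estimate of the rate constant 1/8
```

At this moderate epsilon the estimate is still noticeably off 1/8, which is precisely the slow convergence that makes the refined expansions above valuable.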
THEOREM 6.2. If sum_{n=1}^infinity |1 - a_n/b_n| < infinity, then as epsilon -> 0

\[ P\Big(\sum_{n=1}^\infty a_n\xi_n^2 \le \varepsilon^2\Big) \sim \prod_{n=1}^\infty (b_n/a_n)^{1/2}\; P\Big(\sum_{n=1}^\infty b_n\xi_n^2 \le \varepsilon^2\Big) , \]

where a_n, b_n are positive and sum_{n>=1} a_n < infinity, sum_{n>=1} b_n < infinity. Furthermore, if a_n >= b_n for n large, then P(sum_{n=1}^infinity a_n xi_n^2 <= epsilon^2) and P(sum_{n=1}^infinity b_n xi_n^2 <= epsilon^2) have the same order of magnitude as epsilon -> 0 if and only if sum_{n=1}^infinity |1 - a_n/b_n| < infinity.

The following simple example demonstrates a way of using the comparison theorem; more examples can be found in Li (1992a, b). Let {B(t) : 0 <= t <= 1} be the Brownian bridge and consider weighted L_2-norms of B(t).

PROPOSITION 6.1. For alpha > 0 and beta = 1 - (alpha+1)^{-1} < 1,

\[ P\Big(\int_0^1 B^2(t^\alpha)\,dt \le \varepsilon^2\Big) \sim c_\alpha\, \varepsilon^{(1-\alpha)/(2(\alpha+1))} \exp\Big(-\frac{\alpha}{2(\alpha+1)^2}\,\varepsilon^{-2}\Big) \quad\text{as } \varepsilon\to0 , \]

where c_alpha is a positive constant. To see this, note that by the Karhunen-Loeve expansion the eigenvalues are the solutions of

\[ J_\beta\big(2(\alpha+1)^{-1}\sqrt{\alpha/\lambda}\big) = 0 , \]

where J_nu(x) is the Bessel function of order nu. Hence by the asymptotic formula for the zeros of the Bessel function, we have


\[ \lambda_n \sim \frac{4\alpha}{(\alpha+1)^2\pi^2}\Big(n+\frac{2\beta-1}{4}\Big)^{-2} =: \tilde\lambda_n , \qquad \sum_{n\ge1}|1-\lambda_n/\tilde\lambda_n| < \infty . \]

Thus by Theorem 6.2 and Theorem 6.1, we obtain as epsilon -> 0

\[ P\Big(\int_0^1 B^2(t^\alpha)\,dt \le \varepsilon^2\Big) = P\Big(\sum_{n\ge1}\lambda_n\xi_n^2 \le \varepsilon^2\Big) \sim c\,P\Big(\sum_{n\ge1}\tilde\lambda_n\xi_n^2 \le \varepsilon^2\Big) \sim c_\alpha\,\varepsilon^{(1-\alpha)/(2(\alpha+1))}\exp\Big(-\frac{\alpha}{2(\alpha+1)^2}\,\varepsilon^{-2}\Big) . \]

Next we mention that [see Li (1992a)], for any positive integer N,

\[ \log P\Big(\sum_{n\ge N}\lambda_n\xi_n^2 \le \varepsilon^2\Big) \sim \log P\Big(\sum_{n\ge 1}\lambda_n\xi_n^2 \le \varepsilon^2\Big) \quad\text{as } \varepsilon\to0 , \]

which shows that the small ball rate function does not change at the logarithmic level if we delete a finite number of the terms. Finally, we mention that for any (even moving) shifts and any radius (going to zero or infinity) in this l_2 setting, exact asymptotic behaviours similar to Theorem 6.1 are studied in Li and Linde (1993) and Kuelbs et al. (1994).

6.2. Exact value of small ball constants


Let {W(t); 0 <= t <= 1} be the standard Brownian motion and {B(t); 0 <= t <= 1} be a standard Brownian bridge, which can be realized as {W(t) - tW(1); 0 <= t <= 1}. First we present the exact value of the small ball constants for W(t) and B(t) under the L_p-norm, 1 <= p <= infinity. Its generalization and extension to other related processes are given in Theorem 6.4.

THEOREM 6.3. For any 1 <= p <= infinity

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P(\|W(t)\|_p \le \varepsilon) = \lim_{\varepsilon\to0}\varepsilon^2\log P(\|B(t)\|_p \le \varepsilon) = -\kappa_p , \]  (6.1)

where

\[ \kappa_p = 2^{2/p}\,p\,\big(\lambda_1(p)/(2+p)\big)^{(2+p)/p} \]  (6.2)

and

\[ \lambda_1(p) = \inf\Big\{\int_{-\infty}^\infty |x|^p\phi^2(x)\,dx + \frac12\int_{-\infty}^\infty(\phi'(x))^2\,dx\Big\} > 0 , \]  (6.3)

where the infimum is taken over all phi in L^2(-infinity, infinity) such that int_{-infinity}^infinity phi^2(x) dx = 1. The cases p = 2 and p = infinity, with kappa_2 = 1/8 and kappa_infinity = pi^2/8, are also well known, and the exact distributions in terms of infinite series are known; see Smirnov (1937), Chung (1948) and Doob (1949). The only other case for which the exact distribution is known, in terms of a Laplace transform, is p = 1, given in Kac (1946). Namely, for lambda > 0

\[ E\exp\Big\{-\lambda\int_0^1|W(s)|\,ds\Big\} = \sum_{j=1}^\infty \Theta_j \exp\{-\delta_j\lambda^{2/3}\} , \]  (6.4)

where delta_1, delta_2, ... are the positive roots of the derivative of

\[ P(y) = 3^{-1}(2y)^{1/2}\Big(J_{-1/3}\big(3^{-1}(2y)^{3/2}\big)+J_{1/3}\big(3^{-1}(2y)^{3/2}\big)\Big) , \]

J_nu(x) are the Bessel functions of order nu, and Theta_j = (3 delta_j)^{-1}(1 + 3 int_0^{delta_j} P(y) dy). The extension of (6.4) to values of lambda < 0 remains open as far as we know. By using the exponential Tauberian theorem given as Theorem 3.5, we obtain from (6.4) that kappa_1 = (4/27) delta_1^3, where delta_1 is the smallest positive root of the derivative of P(y).

Now, from the asymptotic point of view for the Laplace transform, it was shown in Kac (1951), using the Feynman-Kac formula and the eigenfunction expansion, that

\[ \lim_{t\to\infty}\frac1t\log E\exp\Big\{-\int_0^t|W(s)|^p\,ds\Big\} = -\lambda_1(p) \]  (6.5)

and lambda_1(p) is the smallest eigenvalue of the operator

\[ A f = -\frac12 f''(x) + |x|^p f(x) \]  (6.6)

on L^2(-infinity, infinity). Thus from (6.6) and the classical variational expression for eigenvalues, we obtain (6.3). A different and extremely powerful approach was given in Donsker and Varadhan (1975a), so that the direct relation between (6.5) and (6.3),

\[ \lim_{t\to\infty}\frac1t\log E\exp\Big\{-\int_0^t|W(s)|^p\,ds\Big\} = -\inf\Big\{\int_{-\infty}^\infty|x|^p\phi^2(x)\,dx + \frac12\int_{-\infty}^\infty(\phi'(x))^2\,dx\Big\} , \]  (6.7)

holds as a very special case of their general theory on occupation measures for Markov processes. Both approaches work for more general functions V(x) than the one used here, V(x) = |x|^p, 1 <= p < infinity, and thus the statement for W in Theorem 6.3 also holds for 0 < p < 1. On the other hand, from the small ball probability, or small deviation, point of view, Borovkov and Mogulskii (1991) obtained

\[ P(\|W\|_p \le \varepsilon) \sim c_1(p)\,\varepsilon\,\exp\{-\kappa_p\varepsilon^{-2}\} \]

by using a method similar to that of Kac (1951), but with a more detailed analysis of the polynomial term. Unfortunately, they were not aware of the variational expression (6.3) for lambda_1(p), and the polynomial factor epsilon is missing in their original statement due to an algebraic error. Theorem 6.3 was first formulated explicitly in this way as a lemma in Li (1999c). For the Brownian motion part, it follows from (6.5) or (6.7), which by Brownian scaling reads

\[ \lim_{\lambda\to\infty}\lambda^{-2/(2+p)}\log E\exp\Big\{-\lambda\int_0^1|W(s)|^p\,ds\Big\} = -\lambda_1(p) , \]

together with the exponential Tauberian theorem given as Theorem 3.5 with alpha = 2/p. The result for the Brownian bridge follows from Theorem 3.7; see Li (1999c) for a traditional argument.

Next we mention the following far-reaching generalization of the basic Theorem 6.3.

THEOREM 6.4. Let rho : [0, infinity) -> [0, infinity] be a Lebesgue measurable function satisfying the following conditions: (i) rho(t) is bounded or non-increasing on [0, a] for some a > 0; (ii) rho(t) t^{(2+p)/p} is bounded or non-decreasing on [T, infinity) for some T with a <= T < infinity; (iii) rho(t) is bounded on [a, T] and rho(t)^{2p/(2+p)} is Riemann integrable on [0, infinity). Then for 1 <= p <= infinity

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P\Big(\Big(\int_0^\infty|\rho(t)W(t)|^p\,dt\Big)^{1/p}\le\varepsilon\Big) = -\kappa_p\Big(\int_0^\infty\rho(t)^{2p/(2+p)}\,dt\Big)^{(2+p)/p} , \]  (6.8)

where kappa_p is given in (6.2).
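The constants lambda_1(p) and kappa_p in (6.2), (6.3) and (6.6) can be approximated by discretizing the operator A on a large finite interval with Dirichlet boundary conditions; for p = 2 the operator is the harmonic oscillator, with lambda_1(2) = 1/sqrt(2) and hence kappa_2 = 1/8. A minimal numerical sketch (the grid size and the cutoff L are ad hoc choices):

```python
import numpy as np

def lambda1(p, L=8.0, n=1200):
    # smallest eigenvalue of A f = -(1/2) f'' + |x|^p f on [-L, L], Dirichlet boundary,
    # via a standard second-order finite-difference discretization
    x = np.linspace(-L, L, n)
    h = x[1] - x[0]
    main = 1.0 / h ** 2 + np.abs(x) ** p
    off = -0.5 / h ** 2 * np.ones(n - 1)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return float(np.linalg.eigvalsh(A)[0])

lam2 = lambda1(2.0)                                              # harmonic oscillator: 1/sqrt(2)
kappa2 = 2 ** (2 / 2) * 2 * (lam2 / (2 + 2)) ** ((2 + 2) / 2)    # (6.2) with p = 2, gives 1/8
```

The same routine gives lambda_1(p), and hence kappa_p, for any p >= 1, which is convenient since no closed form is known outside p = 1, 2 and infinity.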

Before we give some interesting examples, a few historical remarks are needed. In the case of the sup-norm (p = infinity) over a finite interval [0, T], similar results were given in Mogulskii (1974) under the condition that rho(t) is bounded, in Berthet and Shi (1998) under the condition that rho(t) is nonincreasing, and in Li (1999b) in the critical case int rho^2(t) dt = infinity. In the case of the sup-norm (p = infinity) over the infinite interval [0, infinity), the results were treated in Li (1999a) as an application of Theorem 2.14. The proof of Theorem 6.4 is given in Li (1999c), together with connections to Gaussian Markov processes. For related results and connections to Volterra operators, see Lifshits and Linde (1999). Below we present some interesting examples of Theorem 6.4.

EXAMPLE 1. Consider X_1(t) = t^{-alpha} W(t) on the interval [0, 1] for alpha < (2+p)/(2p), p >= 1. Then Theorem 6.4 together with a simple calculation implies

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P\Big(\Big(\int_0^1|t^{-\alpha}W(t)|^p\,dt\Big)^{1/p}\le\varepsilon\Big) = -\kappa_p\Big(\frac{2+p}{2+p-2\alpha p}\Big)^{(2+p)/p} . \]  (6.9)
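The constant in (6.9) is just kappa_p times the weight integral of Theorem 6.4 evaluated at rho(t) = t^{-alpha} on [0,1], whose closed form (2+p)/(2+p-2 alpha p) is easy to confirm numerically despite the integrable singularity at 0. A small sketch (the values alpha = 0.3, p = 2 are illustrative):

```python
def weight_integral(alpha, p, n=1_000_000):
    # midpoint rule for int_0^1 t^{-2*alpha*p/(2+p)} dt (integrable singularity at 0)
    beta = 2.0 * alpha * p / (2.0 + p)
    h = 1.0 / n
    return sum(((i + 0.5) * h) ** (-beta) for i in range(n)) * h

alpha, p = 0.3, 2
closed_form = (2 + p) / (2 + p - 2 * alpha * p)   # = 4 / 2.8 here
approx = weight_integral(alpha, p)
```

Raising this integral to the power (2+p)/p and multiplying by kappa_p reproduces the right-hand side of (6.9).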

In Section 7.10 we will see the implication of this result in terms of the asymptotic Laplace transform.

EXAMPLE 2. Let U(t) be the stationary Gaussian Markov process, i.e., the Ornstein-Uhlenbeck process with E U(s)U(t) = sigma^2 e^{-theta|t-s|} for theta > 0 and any s, t in [a, b], -infinity < a < b < infinity. Then we have, for 1 <= p <= infinity,

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P(\|U(t)\|_p \le \varepsilon) = -2\sigma^2\theta\,(b-a)^{(2+p)/p}\,\kappa_p . \]

In the case p = 2, the above result and its refinement are given in Li (1992a) by using the Karhunen-Loeve expansion and the comparison Theorem 6.2. Other interesting examples that are consequences of Theorem 6.4 can be found in Li (1999c).

Next we mention the corresponding results in higher dimensions under the sup-norm. Let {W_d(t); t >= 0} be a standard d-dimensional Brownian motion and {B_d(t); 0 <= t <= 1} be a standard d-dimensional Brownian bridge, d >= 1. We use the convention 1/infinity = 0 and denote by ||.||_(d) the usual Euclidean norm in R^d.

THEOREM 6.5. Let g : (0, infinity) -> (0, infinity] satisfy the conditions: (i) inf_{0<t<infinity} g(t) > 0, or g(t) is nondecreasing in a neighborhood of 0; (ii) inf_{0<t<infinity} t^{-1} g(t) > 0, or t^{-1} g(t) is nonincreasing for t sufficiently large. Then

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P\Big(\sup_{0<t<\infty}\frac{\|W_d(t)\|_{(d)}}{g(t)}\le\varepsilon\Big) = -\frac{j_{(d-2)/2}^2}{2}\int_0^\infty g^{-2}(t)\,dt , \]

where j_{(d-2)/2} is the smallest positive root of the Bessel function J_{(d-2)/2}; in particular, j_{-1/2} = pi/2.

THEOREM 6.6. Assume that inf_{a<=t<=b} g(t) > 0 for all 0 < a <= b < 1. If inf_{0<=t<=1} g(t) > 0, or both g(t) and g(1-t) are nondecreasing in a neighborhood of 0, then

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P\Big(\sup_{0<t<1}\frac{\|B_d(t)\|_{(d)}}{g(t)}\le\varepsilon\Big) = -\frac{j_{(d-2)/2}^2}{2}\int_0^1 g^{-2}(t)\,dt , \]

where j_{(d-2)/2} is defined in Theorem 6.5.

Both results above, which extend earlier work of Mogulskii (1974) and Berthet and Shi (1998) for the sup-norm over a finite interval, are given in Li (1999a) as applications of Theorem 2.14. For some applications of Theorem 6.6 to weighted empirical processes, we refer to Csaki (1994).
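The Bessel quantities above are easy to evaluate; a minimal sketch using a power-series evaluation of J_nu and bisection (the scan range and term count are ad hoc choices) recovers j_{-1/2} = pi/2, so that j_{-1/2}^2/2 = pi^2/8, the one-dimensional small ball constant:

```python
import math

def bessel_j(nu, x, terms=60):
    # power series J_nu(x) = sum_m (-1)^m / (m! * Gamma(m+nu+1)) * (x/2)^(2m+nu);
    # adequate for moderate x
    return sum(
        (-1.0) ** m / (math.factorial(m) * math.gamma(m + nu + 1)) * (x / 2.0) ** (2 * m + nu)
        for m in range(terms)
    )

def smallest_positive_root(nu, hi=6.0, steps=600):
    f = lambda x: bessel_j(nu, x)
    xs = [hi * (i + 1) / steps for i in range(steps)]
    for a, b in zip(xs, xs[1:]):
        if f(a) * f(b) < 0:
            for _ in range(200):        # bisection refinement
                m = 0.5 * (a + b)
                if f(a) * f(m) <= 0:
                    b = m
                else:
                    a = m
            return 0.5 * (a + b)
    raise ValueError("no sign change found")

j_minus_half = smallest_positive_root(-0.5)   # d = 1: expect pi/2
j_plus_half = smallest_positive_root(0.5)     # d = 3: expect pi
```

The case d = 3 uses J_{1/2}(x), proportional to sin(x)/x, whose smallest positive root is pi, so the three-dimensional constant is pi^2/2.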

6.3. Existence of small ball constants


As we have seen in the previous section, there are relatively few cases in which the exact small ball constants can be given explicitly or represented analytically. A natural question, and the next best thing one can hope for, is to show the existence of the small ball constants. As is well known, proving the existence or finding the exact value of various constants has played an important part in the history of mathematics. The most fruitful benefit lies in the methods developed along the way towards the existence or the exact value of an interesting constant, which are the heart of the matter in many problems. Another benefit, as we can see from (6.10) below, is that related constants can be represented in terms of a few basic constants which are unknown but proven to exist. Small ball constants play important roles in problems such as the integral test for lower limits [see Talagrand (1996)] and various functional LIL results; see de Acosta (1983), Kuelbs et al. (1994), Kuelbs and Li (2000).

We start with the work of de Acosta (1983) on the existence of the small ball constants for finite-dimensional vector-valued Brownian motion under the sup-type norm. To be more precise, let E be a finite-dimensional Banach space with norm Q(.) and let mu be a centered Gaussian measure on E. Let {W_E(t) : t >= 0} be an E-valued mu-Brownian motion; that is, {W_E(t) : t >= 0} is an E-valued stochastic process with stationary independent increments, W_E(0) = 0, continuous paths, and L(W_E(1)) = mu.

THEOREM 6.7. The limit

\[ \lim_{\varepsilon\to0}\varepsilon^2\log P\Big(\sup_{0\le t\le 1} Q(W_E(t)) \le \varepsilon\Big) = -C_{\mu,Q} \]

exists and 0 < C_{mu,Q} < infinity.

Note that when mu is the canonical Gaussian measure on E = R^d, d >= 1, we have C_{mu,Q} = j_{(d-2)/2}^2/2 from Theorem 6.5 when Q(.) = ||.||_(d) is the usual Euclidean norm, and C_{mu,Q} = d j_{-1/2}^2/2 = d(pi^2/8) when Q(x) = max_{1<=i<=d}|x_i| for x = (x_1, ..., x_d) in R^d. The method of proof is formulated as a scaling argument (essentially the well-known subadditivity argument for this problem) together with an upper bound on the probability.
Next we mention the result for the standard Brownian motion on R under Hoelder norms, given in Kuelbs and Li (1993b).

THEOREM 6.8. For any 0 < beta < 1/2 the limit

\[ \lim_{\varepsilon\to0}\varepsilon^{2/(1-2\beta)}\log P\Big(\sup_{0\le s,t\le 1}\frac{|W(t)-W(s)|}{|t-s|^\beta}\le\varepsilon\Big) = -c_\beta \]

exists and 0 < c_beta < infinity.


Note that no value of c_beta is known, though reasonable upper and lower bounds are given in Kuelbs and Li (1993b). The existence part of the proof is similar to the one used in de Acosta (1983). In Khoshnevisan and Shi (1998a), the existence of the small ball constant for integrated Brownian motion is shown by using the subadditivity argument with an upper bound on the probability. Related existence results for fractionally integrated fractional Brownian motion are given in Li and Linde (1998, 1999) under the sup-norm.

For the remainder of this section, we focus on the existence of the small ball constants for the fractional Brownian motion B_alpha(t) under the sup-norm, due independently to Li and Linde (1998) and Shao (1999). The definition of fractional Brownian motion can be found in Section 4.4. The following statement from Li and Linde (1998) also provides the relation with the constant for the self-similar Gaussian process W_beta(t), beta > 0, given in (6.12).

THEOREM 6.9. We have

\[ \lim_{\varepsilon\to0}\varepsilon^{2/\alpha}\log P\Big(\sup_{0\le t\le 1}|B_\alpha(t)|\le\varepsilon\Big) = \lim_{\varepsilon\to0}\varepsilon^{2/\alpha}\log P\Big(\sup_{0\le t\le 1}|W_\alpha(t)|\le\sqrt{a_\alpha}\,\varepsilon\Big) = -C_\alpha , \]  (6.10)

where 0 < C_alpha < infinity,

\[ a_\alpha = \alpha^{-1} + \int_{-\infty}^0\big((1-s)^{(\alpha-1)/2}-(-s)^{(\alpha-1)/2}\big)^2\,ds \]  (6.11)

and

\[ W_\alpha(t) = \int_0^t (t-s)^{(\alpha-1)/2}\,dW(s) . \]  (6.12)
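The normalizing constant a_alpha in (6.11) is straightforward to evaluate numerically; note that for alpha = 1 the integrand vanishes and a_1 = 1, recovering B_1 = W. A rough sketch (the truncation point and grid size are ad hoc choices):

```python
def a_alpha(alpha, n=400_000, cut=200.0):
    # a_alpha = 1/alpha + int_0^inf ((1+u)^((alpha-1)/2) - u^((alpha-1)/2))^2 du
    # (substituting u = -s in (6.11)); midpoint rule on [0, cut], tail neglected
    e = (alpha - 1.0) / 2.0
    h = cut / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        d = (1.0 + u) ** e - u ** e
        total += d * d
    return 1.0 / alpha + total * h

a_1 = a_alpha(1.0)     # integrand is identically zero here, so a_1 = 1 exactly
a_half = a_alpha(0.5)  # 1/alpha = 2 plus a positive integral
```

The tail beyond the cutoff decays like u^(alpha-3), so for the values shown its neglect is harmless.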

Explicit upper and lower bounds for C_alpha in the range 0 < alpha < 1 were proved in Shao (1996). The existence of the constant in (6.10) was explicitly asked for in Talagrand (1996), in connection with the integral test established in that paper. The constants C_alpha in (6.10) also play the role of principal eigenvalues of certain operators on the proper domain. Note that (6.10) can be rewritten as

\[ \lim_{t\to\infty} t^{-1}\log P(\tau > t) = -C_\alpha , \]  (6.13)

where the exit time tau = inf{t : B_alpha(t) not in [-1, 1]}. In the Brownian motion case, alpha = 1, the constant in (6.13) is the principal eigenvalue of the Laplacian on the domain [-1, 1]. For other results related to (6.10), see Theorem 6.10 and (7.36). Next we discuss the three and a half different proofs available for the existence of the small ball constants for B_alpha(t) under the sup-norm.


The proof given in Li and Linde (1998) is based on the following useful representation, valid for alpha != 1 [see Mandelbrot and Van Ness (1968)]:

\[ B_\alpha(t) = a_\alpha^{-1/2}\big(W_\alpha(t)+Z_\alpha(t)\big), \qquad 0\le t\le 1 , \]  (6.14)

where a_alpha is given in (6.11), W_alpha(t) is given in (6.12) and

\[ Z_\alpha(t) = \int_{-\infty}^0\big\{(t-s)^{(\alpha-1)/2}-(-s)^{(\alpha-1)/2}\big\}\,dW(s) . \]

Furthermore, W_alpha(t) is independent of Z_alpha(t). Observe that the centered Gaussian process W_beta(t) is defined for all beta > 0 as a fractional Wiener integral, and that W_3(t) is the integrated Brownian motion mentioned in Section 3.6. The existence of the constants for W_beta(t), beta > 0, under the sup-norm follows by using the subadditivity argument with an upper bound on the probability.

The proof given in Shao (1999) is based on the following correlation inequality: there exists d_alpha > 0 such that

\[ P\Big(\sup_{0\le t\le a}|B_\alpha(t)|\le x,\ \sup_{a\le t\le b}|B_\alpha(t)-B_\alpha(a)|\le y\Big) \ge d_\alpha\,P\Big(\sup_{0\le t\le a}|B_\alpha(t)|\le x\Big)\,P\Big(\sup_{a\le t\le b}|B_\alpha(t)-B_\alpha(a)|\le y\Big) \]

for any 0 < a < b, x > 0 and y > 0. The existence part follows from a modified scaling argument with a lower bound on the probability. This approach was first used in an early version of Li and Shao (1999); see the 'half' proof below for more details.

The third proof is in Li (2000) and is based on the representation (6.14) and the Gaussian correlation inequality given in Theorem 2.14. The existence part follows from a refined scaling lemma that allows error terms. Various modifications of this approach are most fruitful, since the techniques can be used to show the existence of the limit

\[ \lim_{\varepsilon\to0}\varepsilon^{\gamma}\log P(\|X\|\le\varepsilon) = -K(\|\cdot\|,X) \]

for various self-similar Gaussian processes X, such as B_alpha(t) and W_beta(t), under norms ||.|| such as the sup-norm, L_p-norm and Hoelder norm, with 0 < K(||.||, X) < infinity and suitable 0 < gamma < infinity. It seems that the method also works for the Sobolev-type norm given in Theorem 4.8 and the Besov norm given in Theorem 4.9, but the details still need to be checked. It should be emphasized that the refined scaling lemma formulated in Li (2000) for the existence of a constant is weaker than all the competing subadditive-type results we examined. Furthermore, the estimates used are lower bounds on the probability, rather than the upper bounds required by the subadditivity argument mentioned earlier.

Now we turn to the 'half' proof given in Li and Shao (1999b). It asserts the existence of the constants under the following weaker Gaussian correlation conjecture:

\[ \mu(A_1\cap A_2) \ge \alpha^2 \quad\text{for any } \mu(A_i)=\alpha,\ i=1,2 . \]

Our early version of the paper in 1997 used the Gaussian correlation conjecture (2.6). We hope the 'half' proof sheds light on the conjecture and points out new directions for useful partial results.

Finally, we mention the following result, given in Kuelbs and Li (2000) as a consequence of (7.36). Note in particular that the constant C_alpha plays an important role here.

THEOREM 6.10. Let rho : [0, 1] -> [0, infinity) be a bounded function such that rho(t)^{2/alpha} is Riemann integrable on [0, 1]. Then

\[ \lim_{\varepsilon\to0}\varepsilon^{2/\alpha}\log P\Big(\sup_{0\le t\le 1}|\rho(t)B_\alpha(t)|\le\varepsilon\Big) = -C_\alpha\int_0^1\rho(t)^{2/\alpha}\,dt , \]

where C_alpha is the small ball constant given in (6.10). Small ball estimates for the weighted L_p-norm of B_alpha(t), similar to Theorem 6.3, can also be obtained using the techniques in Li (1999c, 2000).

7. Applications of small ball probabilities


We have presented some direct consequences and implications of small ball probabilities in earlier sections. In this section we point out further applications, to demonstrate the usefulness of this wide class of probability estimates. Many of the tools and much of the inspiration for small ball probabilities come from these applications. For example, the study of the rate of convergence in Strassen's functional LIL (Section 7.3) led to the discovery of the precise links with metric entropy; the need to complete the link forced the use of the l-approximation numbers and other approximation quantities. Further, the tools and ideas developed for these applications have also been used in studying other related problems. Finally, we stress that we limit ourselves to Gaussian and related processes in all the applications mentioned in this section; there are many more applications to other processes, as indicated in the Introduction. In particular, one can use strong approximation theorems [see Csorgo and Revesz (1981)] to extend the results to partial sum processes.

7.1. Chung's laws of the iterated logarithm


Let {W(t), 0 <= t <= 1} be a Brownian motion. It is well known that, by the Levy (1937) law of the iterated logarithm,

\[ \limsup_{h\to0}\,\sup_{0<s\le h}\frac{|W(t+s)-W(t)|}{(2h\log\log(1/h))^{1/2}} = 1 \quad\text{a.s.} \]  (7.1)

for every 0 <= t < 1, and furthermore, by the modulus of continuity,

\[ \limsup_{h\to0}\,\sup_{0<s\le h}\,\sup_{0\le t\le 1-h}\frac{|W(t+s)-W(t)|}{(2h\log(1/h))^{1/2}} = 1 \quad\text{a.s.} \]  (7.2)

On the other hand, Chung's LIL, Chung (1948), gives the lower limit of the convergence rate:

\[ \liminf_{h\to0}\,\sup_{0<s\le h}\frac{|W(t+s)-W(t)|}{(h/\log\log(1/h))^{1/2}} = \frac{\pi}{\sqrt8} \quad\text{a.s.} \]  (7.3)

for every 0 <= t < 1, while Csorgo and Revesz (1981) obtain the modulus of non-differentiability:

\[ \liminf_{h\to0}\,\sup_{0\le t\le 1-h}\,\sup_{0<s\le h}\frac{|W(t+s)-W(t)|}{(h/\log(1/h))^{1/2}} = \frac{\pi}{\sqrt8} \quad\text{a.s.} \]  (7.4)
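The small ball probability behind (7.3) and (7.4) has a classical closed form, P(sup_{0<=t<=1}|W(t)| <= epsilon) = (4/pi) sum_{k>=0} (-1)^k (2k+1)^{-1} exp(-(2k+1)^2 pi^2/(8 epsilon^2)), and its log-level behaviour -(pi^2/8) epsilon^{-2} can be checked directly; a short sketch:

```python
import math

def p_sup_w(eps, terms=100):
    # P(sup_{0<=t<=1} |W(t)| <= eps), classical reflection-principle series
    return (4.0 / math.pi) * sum(
        (-1.0) ** k / (2 * k + 1) * math.exp(-((2 * k + 1) ** 2) * math.pi ** 2 / (8.0 * eps ** 2))
        for k in range(terms)
    )

# -eps^2 * log P tends to pi^2/8, the constant behind Chung's LIL (7.3)
rates = {eps: -eps ** 2 * math.log(p_sup_w(eps)) for eps in (0.5, 0.25, 0.1)}
```

For small epsilon only the k = 0 term matters, which is why the log-level asymptotics are so clean here.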

We refer to Csorgo and Revesz (1981) for more general results on the increments of Brownian motion. For a general centered Gaussian process {X(t), t in [0,1]} with stationary increments, the law of the iterated logarithm remains true under certain regularity conditions. One can refer to the insightful review of Bingham (1986), to Nisio (1967), Marcus (1968) and Lai (1973) for real-valued Gaussian processes, and to Csorgo and Csaki (1992), Csorgo and Shao (1993) for l^p-valued Gaussian processes. A typical result is

\[ \limsup_{h\to0}\,\sup_{0<s\le h}\frac{|X(t+s)-X(t)|}{\sigma(h)(2\log\log(1/h))^{1/2}} = 1 \quad\text{a.s.} \]  (7.5)

and

\[ \limsup_{h\to0}\,\sup_{0<s\le h}\,\sup_{0\le t\le 1-h}\frac{|X(t+s)-X(t)|}{\sigma(h)(2\log(1/h))^{1/2}} = 1 \quad\text{a.s.} \]  (7.6)

where sigma(h) = (E(X(t+h) - X(t))^2)^{1/2}. In particular, (7.5) and (7.6) hold for the fractional Brownian motions of order alpha in (0, 2). On the other hand, we are interested in Chung-type LILs such as (7.3) and (7.4). It is well known that a key step in establishing a Chung-type law of the iterated logarithm is the small ball probability. The following general result gives a precise implication of the small ball probability for Chung's LIL.

THEOREM 7.1. Let X = {X(s), s in [0,1]^d} be a centered Gaussian process with stationary increments, that is,

\[ \sigma^2(|t-s|) = E|X(t)-X(s)|^2 . \]  (7.7)

Assume that X(0) = 0, that sigma(x)/x^tau is non-decreasing on [0,1] for some tau > 0, and that there is 0 < theta < 2 such that

\[ \sigma(2h) \le \theta\,\sigma(h) \quad\text{for all } 0<h<1/2 . \]  (7.8)

If there exist 0 < c_1 <= c_2 < infinity such that

\[ \exp\{-c_2(h/x)^d\} \le P\Big(\sup_{s\in[0,h]^d}|X(s)|\le\sigma(x)\Big) \le \exp\{-c_1(h/x)^d\} \]  (7.9)

for some 0 < h_0 < 1 and for any 0 < x <= h_0 h <= h_0^2, then

\[ \liminf_{h\to0}\frac{\sup_{s\in[0,h]^d}|X(s)|}{\sigma\big(h(c_1/\log\log(1/h))^{1/d}\big)} \ge 1 \quad\text{a.s.} \]  (7.10)

and

\[ \liminf_{h\to0}\frac{\sup_{s\in[0,h]^d}|X(s)|}{\sigma\big(h(c_2/\log\log(1/h))^{1/d}\big)} \le 1 \quad\text{a.s.} \]  (7.11)

Note that (7.10) follows from the right-hand side of (7.9) and the subsequence method. To prove (7.11), let

\[ M(h) = \sup_{s\in[0,h]^d}|X(s)| . \]

For arbitrary 0 < epsilon < 1, put s_k = exp(-k^{1+epsilon}), d_k = exp(k^{1+epsilon} + k^{epsilon}), and

\[ a_k = \sigma\big(s_k(c_2/\log\log(1/s_k))^{1/d}\big) . \]  (7.12)

It suffices to show that

\[ \liminf_{k\to\infty} M(s_k)/a_k \le 1+\varepsilon \quad\text{a.s.} \]  (7.13)

To prove (7.13), we use the spectral representation of X, as Monrad and Rootzen (1995) did. In what follows <s, lambda> denotes sum_{i=1}^d s_i lambda_i. It is known from Yaglom (1957) that EX(s)X(t) has a unique Fourier representation of the form

\[ E\{X(s)X(t)\} = \int_{R^d}(e^{i\langle s,\lambda\rangle}-1)(e^{-i\langle t,\lambda\rangle}-1)\,\Delta(d\lambda) + \langle s,Bt\rangle . \]  (7.14)

Here B = (b_ij) is a positive semidefinite matrix and Delta(d lambda) is a nonnegative measure on R^d - {0} satisfying

\[ \int_{R^d}\frac{\|\lambda\|^2}{1+\|\lambda\|^2}\,\Delta(d\lambda) < \infty . \]

Moreover, there exist a centered, complex-valued Gaussian random measure W(d lambda) and a Gaussian random vector Y which is independent of W such that

\[ X(s) = \int_{R^d}(e^{i\langle s,\lambda\rangle}-1)\,W(d\lambda) + \langle Y,s\rangle . \]  (7.15)


The measures W and Delta are related by the identity E{W(A) conj(W(B))} = Delta(A intersect B) for all Borel sets A and B in R^d. Furthermore, W(-A) = conj(W(A)). It follows from (7.14) and (7.7) that

\[ \sigma^2(|t-s|) = 2\int_{R^d}\big(1-\cos\langle t-s,\lambda\rangle\big)\,\Delta(d\lambda) + \langle t-s,B(t-s)\rangle . \]

In particular, for 0 < h < 1 and for every i = 1, 2, ..., d,

\[ \sigma^2(h) = 2\int_{R^d}(1-\cos(h\lambda_i))\,\Delta(d\lambda) + h^2b_{ii} \ge 2\int_{R^d}(1-\cos(h\lambda_i))\,\Delta(d\lambda) . \]  (7.16)

For 0 < h < 1 and 1 <= i <= d we have

\[ \int_{|\lambda_i|\ge1/h}\Delta(d\lambda) \le \frac{1}{1-\sin1}\int_{|\lambda_i|\ge1/h}\Big(1-\frac{\sin(h\lambda_i)}{h\lambda_i}\Big)\Delta(d\lambda) = \frac{1}{(1-\sin1)h}\int_0^h\!\int_{|\lambda_i|\ge1/h}(1-\cos(u\lambda_i))\,\Delta(d\lambda)\,du \le 4\sigma^2(h) , \]

and hence

\[ \int_{\|\lambda\|\ge1/h}\Delta(d\lambda) \le 4d\,\sigma^2(dh) \le 4d^3\sigma^2(h) . \]  (7.17)

Similarly, by (7.16),

\[ \int_{\|\lambda\|\le1/h}\|\lambda\|^2\,\Delta(d\lambda) \le h^{-2}\sum_{i=1}^d\int_{|\lambda_i|\le1/h}(h\lambda_i)^2\,\Delta(d\lambda) \le 4h^{-2}\sum_{i=1}^d\int_{|\lambda_i|\le1/h}(1-\cos(h\lambda_i))\,\Delta(d\lambda) \le 4d^2h^{-2}\sigma^2(h) . \]  (7.18)

Define, for k = 1, 2, ... and 0 <= s <= 1,

\[ X_k(s) = \int_{\|\lambda\|\in(d_{k-1},d_k]}(e^{i\langle s,\lambda\rangle}-1)\,W(d\lambda), \qquad \widetilde X_k(s) = \int_{\|\lambda\|\notin(d_{k-1},d_k]}(e^{i\langle s,\lambda\rangle}-1)\,W(d\lambda) . \]

Clearly,

\[ \liminf_{k\to\infty}\frac{M(s_k)}{a_k} \le \liminf_{k\to\infty}\frac{\sup_{t\in[0,s_k]^d}|X_k(t)|}{a_k} + \limsup_{k\to\infty}\frac{\sup_{t\in[0,s_k]^d}|\widetilde X_k(t)|}{a_k} + \limsup_{k\to\infty}\frac{s_k\|Y\|}{a_k} . \]  (7.19)

Since sigma(x)/x^tau is non-decreasing, it is easy to see that

\[ \limsup_{k\to\infty} s_k\|Y\|/a_k = 0 \quad\text{a.s.} \]  (7.20)

By the Anderson (1955) inequality,

\[ P\Big(\sup_{t\in[0,s_k]^d}|X_k(t)|\le(1+\varepsilon)a_k\Big) \ge P\big(M(s_k)\le(1+\varepsilon)a_k\big) . \]

Therefore, by (7.9) and (7.12),

\[ \sum_{k=1}^\infty P\Big(\sup_{t\in[0,s_k]^d}|X_k(t)|\le(1+\varepsilon)a_k\Big) \ge \sum_{k=1}^\infty P\big(M(s_k)\le(1+\varepsilon)a_k\big) \ge \sum_{k=1}^\infty\exp\{-(1+\varepsilon)^{-d}\log\log(1/s_k)\} = \infty . \]  (7.21)

Since {sup_{t in [0,s_k]^d}|X_k(t)|, k >= 1} are independent, by the Borel-Cantelli lemma it follows from (7.21) that

\[ \liminf_{k\to\infty}\,\sup_{t\in[0,s_k]^d}|X_k(t)|/a_k \le 1+\varepsilon \quad\text{a.s.} \]  (7.22)

We next estimate the second term on the right-hand side of (7.19). From (7.17), (7.18) and (7.12) we obtain, for 0 <= s <= s_k,

\[ \operatorname{Var}(\widetilde X_k(s)) = 2\int_{\|\lambda\|\notin(d_{k-1},d_k]}\big(1-\cos\langle s,\lambda\rangle\big)\,\Delta(d\lambda) \le 4d^2 s_k^2 d_{k-1}^2\,\sigma^2(1/d_{k-1}) + 16d^3\,\sigma^2(1/d_k) \le 16d^4 e^{-2\tau\varepsilon k^{\varepsilon}}\sigma^2(s_k) \]

for all k large. Therefore

\[ \operatorname{Var}(\widetilde X_k(t)-\widetilde X_k(s)) \le \bar\sigma^2(h) \quad\text{for every } 0\le s,t\le s_k,\ |s-t|\le h\le s_k , \]  (7.23)

where bar-sigma^2(h) = min(sigma^2(h), 16 d^4 exp(-2 tau epsilon k^epsilon) sigma^2(s_k)). Applying an inequality of Fernique (1975) and the Borel-Cantelli lemma yields

\[ \limsup_{k\to\infty}\,\sup_{0\le s\le s_k}|\widetilde X_k(s)|/a_k = 0 \quad\text{a.s.} \]  (7.24)

This proves (7.13), by (7.19), (7.20), (7.22) and (7.24), as desired.

For the Levy fractional Brownian motion we have:

THEOREM 7.2. Let {X(s), s in [0,1]^d} be a Levy fractional Brownian motion of order alpha in (0, 2). Then

\[ \liminf_{h\to0}\frac{(\log\log(1/h))^{\alpha/(2d)}}{h^{\alpha/2}}\,\sup_{s\in[0,h]^d}|X(s)| = c \quad\text{a.s.} \]  (7.25)

for some 0 < c < infinity.

The result (7.25) follows immediately from Theorems 5.1 and 7.1 and the zero-one law of Pitt and Tran (1979). In general, it is not too difficult to derive a Chung-type law of the iterated logarithm once the small ball probabilities are available. Below is a direct consequence of Theorem 5.5 on the Brownian sheet. Other liminf-type LILs for the Brownian sheet can be found in Zhang (1996a).

THEOREM 7.3. Let

{X_t, t in [0,1]^2} be a Brownian sheet. Then

\[ \liminf_{h\to0}\frac{(\log\log(1/h))^{1/2}}{h\,(\log\log\log(1/h))^{3/2}}\,\sup_{t\in[0,h]^2}|X_t| = c \quad\text{a.s.} \]

for some constant 0 < c < infinity.

We end this subsection with the following integral test of Talagrand (1996).

THEOREM 7.4. Let {X(t), t >= 0} be a fractional Brownian motion of order alpha in (0, 2), and let a(t) be a nondecreasing function with a(t) >= 1 and a(t)/t^{alpha/2} bounded. Then

\[ P\Big(\sup_{s\in[0,t]}|X_s|\le a(t)\ \text{i.o.}\Big) \]

is equal to zero or one according as the integral

\[ \int_0^\infty a(t)^{-2/\alpha}\,\psi\big(a(t)t^{-\alpha/2}\big)\,dt \]

converges or diverges, where psi(h) = P(sup_{s in [0,1]} |X_s| <= h).


The above result for the Brownian motion case, alpha = 1, dates back to Chung (1948).

7.2. Lower limits for empirical processes


Let u_1, u_2, ... be independent uniform (0,1) random variables. We define the empirical process U_n by

\[ U_n(s) = n^{-1/2}\sum_{i=1}^n\big(1_{\{u_i\le s\}}-s\big) \quad\text{for } 0\le s\le 1 , \]

and the partial sum process by

\[ \mathcal X_n(s) = \sum_{i=1}^n\big(1_{\{u_i\le s\}}-s\big) \quad\text{for } 0\le s\le 1 . \]

We let K = {K(s,t), s >= 0, t >= 0} denote a Kiefer process; that is, K is a Gaussian process with mean zero and

\[ E\{K(s_1,t_1)K(s_2,t_2)\} = (s_1\wedge s_2)\big((t_1\wedge t_2)-t_1t_2\big) \]

for s_1, s_2, t_1, t_2 >= 0. It is easy to see that

\[ \{K(s,t),\ s\ge0,\ t\ge0\} \overset{d}{=} \{B(s,t)-tB(s,1),\ s\ge0,\ t\ge0\} , \]

where B is the Brownian sheet.
where B is the Brownian sheet. By the celebrated strong approximation theorem of Komlds-Major-Tusnfidy (1975), there exists a Kiefer process d (on a possibly expanded probability space) such that max sup ]~/'i(S)- Y(i,s)] = O(n 1/2(log n) 2)
l <_i<n0_<s<l

a.s.

(7.26)

Hence many strong limit theorems for the empirical process follow from the corresponding results for the Kiefer process. We give below several Chung-type LIL results for the related empirical processes, based on the small ball probabilities presented in previous sections. We refer to Shorack and Wellner (1986) and Csorgo and Horvath (1993) for the general theory of empirical processes. The first result below is given in Mogulskii (1980) and the second in Horvath and Shao (1999). Other related work can be found in Shi (1991), Csaki (1994) and the references therein.

THEOREM 7.5. Let U_n(t) be the empirical process defined above. Then

\[ \liminf_{n\to\infty}(\log\log n)^{1/2}\,\sup_{0\le t\le 1}|U_n(t)| = \frac{\pi}{\sqrt8} \quad\text{a.s.} \]

and

\[ \liminf_{n\to\infty}(\log\log n)\int_0^1 U_n^2(t)\,dt = \frac18 \quad\text{a.s.} \]

THEOREM 7.6. Let X_n(t) be the partial sum process defined above. Then

\[ \liminf_{n\to\infty}\frac{(\log\log n)^{1/2}}{\sqrt n\,(\log\log\log n)^{3/2}}\,\max_{1\le i\le n}\,\sup_{0\le s\le 1}|\mathcal X_i(s)| = c_1 \quad\text{a.s.} \]

and

\[ \liminf_{n\to\infty}\frac{\log\log n}{n\,(\log\log\log n)^2}\,\max_{1\le i\le n}\int_0^1\mathcal X_i^2(s)\,ds = c_2 \quad\text{a.s.} \]

where c_1 and c_2 are positive finite constants.

7.3. Rates of convergence of Strassen's functional LIL


Recall that {W(t) : t >= 0} is the standard Brownian motion. Then H_W, a subset of C[0,1] and the reproducing kernel Hilbert space generated by the Wiener measure mu = L(W), is the Hilbert space of absolutely continuous functions whose unit ball is the set

\[ K_W = \Big\{f(t)=\int_0^t f'(s)\,ds,\ 0\le t\le 1 : \int_0^1|f'(s)|^2\,ds\le1\Big\} . \]  (7.27)

Here the inner product norm of H_W is given by

\[ |f|_W = \Big(\int_0^1|f'(s)|^2\,ds\Big)^{1/2}, \qquad f\in H_W . \]  (7.28)

Let ||f||_infinity denote the usual sup-norm on C[0,1]. If

\[ \eta_n(t) = W(nt)/(2n\log\log n)^{1/2}, \qquad 0\le t\le 1 , \]

then Strassen's functional LIL can be considered to consist of two parts:

\[ \lim_{n\to\infty}\,\inf_{f\in K_W}\|\eta_n-f\|_\infty = 0 \]  (7.29)

and

\[ \liminf_{n\to\infty}\|\eta_n-f\|_\infty = 0 \quad\text{for all } f\in K_W . \]  (7.30)

It is natural to ask for rates of convergence in (7.29) and (7.30). Bolthausen (1978) was the first to examine the rate of convergence in (7.29), and the problem was further investigated by Grill (1987) and Goodman and Kuelbs (1991). Finally, Grill (1992) and Talagrand (1992) have shown that

\[ 0 < \limsup_{n\to\infty}(\log\log n)^{2/3}\,\inf_{f\in K_W}\|\eta_n-f\|_\infty < \infty \quad\text{a.s.,} \]  (7.31)

where the lower bound also follows from a result of Goodman and Kuelbs (1991).


When examining a more general formulation of the problem for an arbitrary Gaussian measure mu, the rate is in fact determined by the small ball probability in (3.5). One motivation for proving results about Gaussian random vectors is that, once this is accomplished, one can translate, in some fashion or other, the Gaussian result to a variety of non-Gaussian situations. For example, this is the approach of Strassen (1964), which produced the functional law of the iterated logarithm for Brownian motion and then, via Skorohod embedding, for polygonal processes. Goodman and Kuelbs (1989) also contains polygonal process results obtained by a suitable strong approximation of the analogous Gaussian results.

Next we formulate the results in terms of Gaussian random samples. Let X, X_1, X_2, ... denote an i.i.d. sequence of centered Gaussian random vectors in a real separable Banach space E with norm ||.||. Let the small ball function phi(epsilon) be given as in (3.5). Since the exact computation of phi is in practice impossible, we state the following result, due to Talagrand (1993), requiring only either an upper or a lower bound for phi. Consider two continuous and non-increasing functions phi_1 >= phi >= phi_2, and assume that for some constants c_i > 1, phi_i(epsilon/c_i) >= 2 phi_i(epsilon) for all epsilon > 0 small, i = 1, 2.

THEOREM 7.7. Let epsilon_{n,i} be the unique root of the equation

\[ \phi_i(\varepsilon)/\varepsilon = (2\log n)^2/\sigma , \qquad i=1,2 , \]

for n large enough, where sigma^2 = sup_{||h||_{E*}<=1} E h^2(X). Then, for some constants C_1 and C_2,

\[ \limsup_{n\to\infty}\varepsilon_{n,1}^{-1}\,\inf_{f\in K_\mu}\big\|X_n/\sqrt{2\log n}-f\big\| \le C_1 < \infty \quad\text{a.s.} \]

and

\[ \limsup_{n\to\infty}\varepsilon_{n,2}^{-1}\,\inf_{f\in K_\mu}\big\|X_n/\sqrt{2\log n}-f\big\| \ge C_2 > 0 \quad\text{a.s.} \]

Note that the probability estimates needed for the proof of Theorem 7.7 can be viewed as a type of refined large deviation estimate (for enlarged balls), in which the small ball rate function phi plays a crucial role in the second-order term [see Chapter 7 in Ledoux (1996)]. The result (7.31) follows from Theorem 7.7 by a scaling argument along a suitable exponential subsequence; see Goodman and Kuelbs (1991) for details. The rate of convergence in (7.30) for a general Gaussian measure mu is also determined by the small ball probability in (3.5) and can be viewed as a Chung-type functional law of the iterated logarithm; see the next section for details.

7.4. Rates of convergence of Chung type functional LIL


The rate of convergence in (7.30) can also be seen as a functional form of Chung's law of the iterated logarithm given in Csáki (1980), and in more refined form in de Acosta (1983), which implies for each f in C[0, 1] that with probability one

Gaussian processes: Inequalities, small ball probabilities and applications

liminf_{n→∞} log n ‖η_n − f‖_∞ = (π/4)(1 − |f|_W²)^{-1/2} if |f|_W < 1, and = +∞ otherwise.   (7.32)

Note that if f = 0 then (7.32) is just Chung's law of the iterated logarithm given in (7.1), with a time inversion. The proof of (7.32) is based on the small ball probability

log P( sup_{0≤t≤1} |W(t)| ≤ ε ) ~ -(π²/8) ε^{-2},

the shifting results given in Section 3.1 and variations of well-known techniques in iterated logarithm proofs. The precise rates for various classes of functions f with |f|_W = 1 can be found in Csáki (1980, 1989), Grill (1991), Kuelbs et al. (1994) and Gorn and Lifshits (1999). When examining more general formulations for Gaussian samples, the rate is clearly determined by the small ball probability in (3.5) and the function f ∈ K_μ. The following result is given in Kuelbs et al. (1994). We use the same notation as in the previous section.

THEOREM 7.8. Assume that for ε > 0 sufficiently small the function φ satisfies both of the following:

ε^{-q} ≤ φ(ε) ≤ ε^{-p} for some p > q > 0,

and for each θ ∈ (0, 1) there is a ρ > 0 such that

φ(θε) − φ(ε) ≥ ρ φ(ε)

for all ε > 0 sufficiently small.
If f ∈ K_μ and d_n is the unique solution of the equation

log n (1 − I(f, d_n)) = φ((2 log n)^{1/2} d_n),

where I(f, δ) is defined in Section 3.1, then with probability one

1 ≤ liminf_{n→∞} d_n^{-1} ‖X_n/(2 log n)^{1/2} − f‖ ≤ 2.

It is still an open question to find the exact constant in the case ‖f‖_μ = 1.
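The small ball probability behind these proofs is explicit enough to check numerically. The sketch below (a numerical illustration added here, not part of the original argument) evaluates the classical alternating series for P(sup_{0≤t≤1}|W(t)| ≤ ε) and confirms that ε² log P approaches -π²/8 as ε → 0.

```python
import math

def small_ball_sup_prob(eps, terms=200):
    """Classical series for P(sup_{0<=t<=1} |W(t)| <= eps):
    (4/pi) * sum_{k>=0} (-1)^k/(2k+1) * exp(-(2k+1)^2 * pi^2/(8*eps^2))."""
    s = 0.0
    for k in range(terms):
        s += (-1) ** k / (2 * k + 1) * math.exp(
            -(2 * k + 1) ** 2 * math.pi ** 2 / (8 * eps ** 2)
        )
    return 4.0 / math.pi * s

# eps^2 * log P should approach -pi^2/8 = -1.2337... as eps -> 0
for eps in (0.5, 0.2, 0.1):
    p = small_ball_sup_prob(eps)
    print(eps, p, eps ** 2 * math.log(p))
```

Only the first few terms of the series contribute for small ε, so the truncation at 200 terms is harmless.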

7.5. A Wichura type functional law of the iterated logarithm


There are various motivations for extending classical results for Brownian motion to the fractional Brownian motion B_α(t), 0 < α < 2. It is not only the importance of these processes, but also the need to find proofs that rely upon general principles at a more fundamental level, by moving away from crucial properties (such as the Markov property) of Brownian motion. Below we mention an extension of a classical result for Brownian motion to fractional Brownian motion by using purely Gaussian techniques. Let

M_α(t) = sup_{0≤s≤t} |B_α(s)|,  t ≥ 0,   (7.33)

and

H_n(t) = M_α(nt)/(C_α n^α / log log n)^{1/2},  t ≥ 0,   (7.34)

where C_α is given in (6.10). In the Brownian motion case, i.e. α = 1, it is well known that C₁ = π²/8. Denote by 𝒜 the nondecreasing functions f: [0,∞) → [0,∞] with f(0) = 0 which are right continuous except possibly at zero. Let

K_α = { f ∈ 𝒜 : ∫_0^∞ (f(t))^{-2/α} dt ≤ 1 }   (7.35)

for α ∈ (0, 2), and topologize 𝒜 with the topology of weak convergence, i.e. pointwise convergence at all continuity points of the limit. When α = 1, Wichura (1973) proved, in an unpublished paper, that the sequence {H_n(t)} has the deterministic limit set K₁ with probability one, yielding a functional analogue of Chung's LIL for the maximum of the absolute value of Brownian motion. Wichura obtained his result for Brownian motion as a special case of a result for the radial part of a Bessel diffusion. The key probability estimate in Wichura's work follows from a computation of the Laplace transform of the first passage time of the radial part of the Bessel diffusion. The first passage time process in his setting has independent increments, and hence is relatively easy to study. Very recently, the following was proved in Kuelbs and Li (2000), based on the Gaussian nature of these processes.

THEOREM 7.9. The sequence {H_n(t)} is relatively compact, and the set of its subsequential limits in the weak topology is K_α given in (7.35).

One of the key steps in the proof is the small ball estimate

lim_{ε→0} ε^{2/α} log P( a_i ε ≤ sup_{0≤s≤t_i} |B_α(s)| ≤ b_i ε, 1 ≤ i ≤ m ) = -C_α Σ_{i=1}^m (t_i − t_{i-1})/b_i^{2/α}   (7.36)

for any fixed sequences {t_i}_{i=0}^m, {a_i}_{i=1}^m and {b_i}_{i=1}^m such that 0 = t_0 < t_1 < ⋯ < t_m and 0 < a_1 < b_1 ≤ a_2 < b_2 ≤ ⋯ ≤ a_m < b_m, where C_α is given in (6.10).

7.6. Fractal geometry for Gaussian random fields

There has been much effort devoted to the study of fractal properties of random sets generated by various stochastic processes. The development of the techniques used in early papers was surveyed in Taylor (1986), and more recent results


associated with Gaussian random fields were surveyed in Xiao (1997). The crucial ingredient in most of the early work is the strong Markov property. For the (N, d, α)-Gaussian random fields Y(t; N, d, α) from R^N to R^d with

E|Y(t; N, d, α) − Y(s; N, d, α)|² = d|t − s|^α,

also called Lévy's multiparameter fractional Brownian motion, the small ball probability

log P( sup_{|t|≤1} |Y(t; N, 1, α)| ≤ ε ) ≈ -ε^{-2N/α}   (7.37)

plays an important role, where 0 < α < 2 and |·| denotes the Euclidean norm. The upper bound is proved essentially in Pitt (1978) and the lower bound follows from Theorem 3.8. The exact Hausdorff measure of the image set of Y(t; N, d, α) in the transient case (2N < dα) was given in Talagrand (1995), with a significantly shorter proof of previously known special cases, by using the small ball estimates (7.37) and other related techniques. Later, in Talagrand (1998), the multiple points of the trajectories Y(t; N, d, α) were studied. The key to the success is the use of a direct "global" approach to obtain a lower bound for a certain sojourn time, which is a small ball type estimate. Furthermore, the detailed arguments rely heavily on the chaining argument and the Khatri-Šidák lemma, as in the proof of Theorem 3.8 mentioned in Section 3.4. Recently, Xiao (1996, 1998) has studied various other aspects of the fractal structure of fractional Brownian motions using various small ball type estimates.

7.7. Metric entropy estimates


The precise links between the small ball probability and the metric entropy given in Section 3.2 allow one to establish new results about the metric entropy of the various sets K_μ in instances when one knows the behaviour of φ(ε). Two examples from Kuelbs and Li (1993a) are given below. The first relates to the unit ball K_W, given in (7.27), which is generated by the Wiener measure. In this case, the small ball function φ(ε) is known very precisely for the L₂-norm ‖·‖₂ and the sup-norm ‖·‖_∞ on C[0,1]; see Theorem 6.3. Thus the key relations (3.8) and (3.9) yield the following correspondingly precise estimates of H(ε, K_W), which are much sharper than those known before.

PROPOSITION 7.1. If K_W is as in (7.27), then for each δ > 0, as ε → 0,

(1 − δ)(2 − √3)/4 ≤ ε·H(ε, K_W, ‖·‖₂) ≤ 1 + δ   (7.38)

and

(1 − δ)(2 − √3)π/4 ≤ ε·H(ε, K_W, ‖·‖_∞) ≤ π(1 + δ).   (7.39)


Note that (7.38) is more precise than what is given in Theorem XVI of Kolmogorov and Tihomirov (1961), and for (7.39) there are no constant bounds in Birman and Solomjak (1967). The second example is related to the small ball probability for the 2-dimensional Brownian sheet (see Section 5.2), and it was first solved in Talagrand (1994). The unit ball of the generating Hilbert space for the 2-dimensional Brownian sheet is

K = { f ∈ C([0,1]²) : f(s,t) = ∫_0^s ∫_0^t g(u,v) du dv, ∫_0^1 ∫_0^1 g²(u,v) du dv ≤ 1 }.

Thus combining Theorem 5.4 and Theorem 3.3 implies

H(ε, K, ‖·‖_∞) ≈ ε^{-1} (log 1/ε)^{3/2},

which solves an interesting problem left open in approximation theory. Later, a direct proof inspired by this line of argument was given in Temlyakov (1995). Further research in this direction can be found in Dunker (1998); for dimensions bigger than two the analogous problem remains open, as does the corresponding Brownian sheet problem discussed in detail in Section 5.2.

7.8. Capacity in Wiener space


Let {W_E(t) : t ≥ 0} be an E-valued μ-Brownian motion as given before Theorem 6.7. Define the E-valued μ-Ornstein-Uhlenbeck process O by

O(t) = e^{-t/2} W_E(e^t),  t ≥ 0.

Note that O is also an E-valued stationary diffusion whose stationary measure is μ. Let Q(·) be a continuous semi-norm on E and define the λ-capacity of the "ball" {x ∈ E : Q(x) ≤ ε} by

Cap_λ(ε) = ∫_0^∞ e^{-λT} P( inf_{0≤t≤T} Q(O(t)) ≤ ε ) dT,  ε > 0.

We refer the reader to Üstünel (1995) and Fukushima et al. (1994) for details. Then the following result is given in Khoshnevisan and Shi (1998b).

THEOREM 7.10. Suppose Q is a nondegenerate, transient semi-norm on E. Then, for all λ > 0 and all κ > 1, there exists a constant c ∈ (1, ∞) such that for all ε ∈ (0, 1/c),

μ(x : Q(x) ≤ ε)/(c Λ_Q²(ε;κ)) ≤ Cap_λ(ε) ≤ c μ(x : Q(x) ≤ ε)/Λ_Q²(ε;κ),


where

Λ_Q(ε;κ) = sup{ a > 0 : μ(x : Q(x) ≤ a) ≤ κ μ(x : Q(x) ≤ ε) },  ε > 0,

is the approximate inverse to μ(x : Q(x) ≤ ε).

It turns out that, under very general conditions, Λ_Q(ε;κ) − ε has a polynomial decay rate. Thus, Theorem 7.10 and its relatives given in Khoshnevisan and Shi (1998b) provide exact (and essentially equivalent) asymptotics between the λ-capacity of a small ball and the small ball probability μ(x : Q(x) ≤ ε). But since Λ_Q(ε;κ) is much harder to find directly, the results are in fact applications of small ball probabilities to the λ-capacity.

7.9. Natural rates of escape for infinite dimensional Brownian motions


For any stochastic process {X(t), t ≥ 0} taking values in a real Banach space with norm ‖·‖, a nondecreasing function γ : [0,∞) → (0,∞) is called a natural rate of escape for X if liminf_{t→∞} ‖X(t)‖/γ(t) = 1 with probability one. The following result and the related definitions of a vector valued Brownian motion and an admissible function are given in Erickson (1980).

THEOREM 7.11. Let X be a genuinely d-dimensional Brownian motion on a Banach space (E, ‖·‖) with 3 ≤ d ≤ ∞, and let h be an admissible function. Fix b > 1 and put γ(t) = t^{1/2} h(t). Then

liminf_{t→∞} ‖X(t)‖/γ(t) ≥ 1 or ≤ 1  a.s.   (7.40)

depending on whether

Σ_{k≥1} h(b^k)^{-2} P(‖X(1)‖ ≤ h(b^k))

converges or diverges.
Since h is nonincreasing and slowly varying at infinity, we see that (7.40) depends entirely on the small ball probability P(‖X(1)‖ ≤ ε) as ε → 0. A natural conjecture mentioned in Erickson (1980) is that if X is a genuinely infinite dimensional Brownian motion and if P(‖X(t)‖ ≤ ε) > 0 for all ε > 0, t > 0, then

0 < liminf_{t→∞} ‖X(t)‖/γ(t) < ∞   (7.41)

for some γ(t); see also Cox (1982) for some related work. Note that (7.41) does not hold in the finite dimensional setting, by the Dvoretzky-Erdős test given in Dvoretzky and Erdős (1951). The difference between the infinite and finite dimensional cases is that P(‖X(1)‖ ≤ ε) is o(ε^n) as ε → 0 for all n when X is genuinely infinite dimensional, whereas P(‖X(1)‖ ≤ ε) ≠ o(ε^n) for n ≥ d when X is d-dimensional with d < ∞.
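The finite dimensional side of this dichotomy can be made concrete. The sketch below (an illustration added here; the helper name radial_cdf is ours) computes P(|X(1)| ≤ ε) for d-dimensional Brownian motion by integrating the chi density with d degrees of freedom, and shows that P(|X(1)| ≤ ε)/ε^d stabilizes at a positive constant, so the probability is not o(ε^n) for n ≥ d.

```python
import math

def radial_cdf(eps, d, n=20000):
    """P(|X(1)| <= eps) for standard d-dimensional Brownian motion at time 1:
    trapezoidal quadrature of the chi density with d degrees of freedom."""
    c = 2.0 ** (1 - d / 2) / math.gamma(d / 2)
    h = eps / n
    s = 0.0
    for i in range(n + 1):
        r = i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * r ** (d - 1) * math.exp(-r * r / 2)
    return c * s * h

d = 3
# P(|X(1)| <= eps) ~ const * eps^d, hence it is not o(eps^n) for any n >= d
for eps in (0.2, 0.1, 0.05):
    print(eps, radial_cdf(eps, d) / eps ** d)
```

For small ε the ratio approaches 2^{1−d/2}/(d Γ(d/2)), the leading coefficient of the chi distribution near zero.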


7.10. Asymptotic evaluation of Laplace transform for large time


When proper scaling properties hold for certain functionals of a given random process, small ball estimates are equivalent to the asymptotic evaluation of Laplace transforms for large time, via the exponential Tauberian theorem given in Theorem 3.5. Here we present two more examples. The first is a consequence of (6.9), from which

lim_{λ→∞} λ^{-2/(2+p)} log E exp{ -λ ∫_0^1 |W(s)|^p s^{-β} ds } = -Λ(p, β)

by the exponential Tauberian theorem given in Theorem 3.5, where the constant Λ(p, β) is determined explicitly by the small ball constant in (6.9). Now by using the scaling property of Brownian motion and (6.2),

lim_{t→∞} t^{-(2+p−2β)/(2+p)} log E exp{ -λ ∫_0^t (|W(s)|^p / s^β) ds } = -Λ(p, β) λ^{2/(2+p)}

for β < (2 + p)/2 and λ > 0. The second is a consequence of (6.10), from which

log E exp{ -λ sup_{0≤t≤1} |B_α(t)| } ~ -(2 + α)(C_α/α)^{α/(2+α)} (λ/2)^{2/(2+α)}

as λ → ∞, by the exponential Tauberian theorem given in Theorem 3.5. Now by the scaling property of fractional Brownian motion B_α(t), i.e. {B_α(at), t ≥ 0} = {a^{α/2} B_α(t), t ≥ 0} in law,

lim_{t→∞} t^{-α/(2+α)} log E exp{ -λ sup_{0≤s≤t} |B_α(s)| } = -(2 + α)(C_α/α)^{α/(2+α)} (λ/2)^{2/(2+α)}

for 0 < α < 2 and any λ > 0. Note that C_α is the small ball constant given in (6.10).
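The scaling step behind the fractional Brownian motion Laplace asymptotic can be written out explicitly; the following sketch assumes only the self-similarity of B_α and the λ → ∞ asymptotic obtained from the Tauberian theorem.

```latex
% Scaling step for the fBm Laplace-transform asymptotic
\begin{align*}
\sup_{0\le s\le t}|B_\alpha(s)|
  &\stackrel{d}{=} t^{\alpha/2} M_\alpha ,
  \qquad M_\alpha := \sup_{0\le s\le 1}|B_\alpha(s)| ,\\
\log \mathbb{E}\exp\Bigl\{-\lambda \sup_{0\le s\le t}|B_\alpha(s)|\Bigr\}
  &= \log \mathbb{E}\, e^{-\mu M_\alpha}
  \sim -(2+\alpha)\Bigl(\tfrac{C_\alpha}{\alpha}\Bigr)^{\alpha/(2+\alpha)}
        \Bigl(\tfrac{\mu}{2}\Bigr)^{2/(2+\alpha)} ,
  \qquad \mu := \lambda t^{\alpha/2}\to\infty .
\end{align*}
```

Since $\mu^{2/(2+\alpha)} = \lambda^{2/(2+\alpha)}\, t^{\alpha/(2+\alpha)}$, dividing by $t^{\alpha/(2+\alpha)}$ gives the stated limit.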
7.11. Onsager-Machlup functionals
For any measure ν on a metric space E with metric d(·,·), the Onsager-Machlup function is defined as

F(a, b) = log [ lim_{ε→0} ν(x : d(x,a) ≤ ε) / ν(x : d(x,b) ≤ ε) ]   (7.42)

if the above limit exists.


For Gaussian measures, the existence of (7.42) and related conditional exponential moments are studied in Shepp and Zeitouni (1992), Bogachev (1995) and Ledoux (1996). Both correlation type inequalities and small ball probabilities play an important role in the study. In Capitaine (1995), a general result in the Cameron-Martin space for diffusions is proved for rotationally invariant norms with known small ball behavior, including in particular Hölder norms and Sobolev type norms. Other related work can be found in Carmona and Nualart (1992) and Chaleyat-Maurel and Nualart (1995).

7.12. Random fractal laws of the iterated logarithm


Let {W(t) : t ≥ 0} denote a standard Wiener process, and for any η ∈ [0, 1], set

E_η = { t ∈ [0, 1) : limsup_{h↓0} (2h log(1/h))^{-1/2} (W(t + h) − W(t)) ≥ η }.   (7.43)

Orey and Taylor (1974) proved that E_η is a random fractal and established that with probability one the Hausdorff dimension of this set is given by dim(E_η) = 1 − η². Recently Deheuvels and Mason (1998) showed that one can derive the following functional refinement of (7.43). We use the notation given in Sections 7.3 and 7.4; in particular, the Strassen set K_W and the inner product norm |·|_W are given in (7.27) and (7.28).

THEOREM 7.12. For each f ∈ K_W and c > 1, let E(f, c) denote the set of all t ∈ [0, 1] such that

liminf_{h↓0} |log h| ‖(2h|log h|)^{-1/2} ξ(h, t, ·) − f‖_∞ ≤ c (π/4)(1 − |f|_W²)^{-1/2},

where ξ(h, t, s) = W(t + hs) − W(t). Then for |f|_W < 1,

dim E(f, c) = (1 − |f|_W²)(1 − c^{-2})

with probability one.

A key step in the proof is the small ball estimate given in de Acosta (1983) discussed in Section 7.4. The case |f|_W = 1 is also related to Section 7.4. Very recently, Khoshnevisan et al. (1999) presented a general approach to many random fractals defined by limsup operations. In particular, their result yields extensions of Theorem 7.12 by applying appropriate small ball estimates.

Acknowledgements
The authors would like to thank Miklós Csörgő, Davar Khoshnevisan, Jim Kuelbs, Michael Marcus and Ken Ross for their helpful comments and suggestions about this paper. Wenbo V. Li's research was supported in part by NSF Grant DMS-9972012. Qi-Man Shao's research was supported in part by NSF Grant DMS-9802451.


References
Anderson, T. W. (1955). The integral of a symmetric convex set and some probability inequalities. Proc. Amer. Math. Soc. 6, 170-176.
Baldi, P. and B. Roynette (1992). Some exact equivalents for the Brownian motion in Hölder norm. Prob. Theory Related Fields 93, 457-484.
Bass, R. F. (1988). Probability estimates for multiparameter Brownian processes. Ann. Prob. 16, 251-264.
Belinsky, E. (1998). Estimates of entropy numbers and Gaussian measures for classes of functions with bounded mixed derivative. J. Approx. Theory 93, 114-127.
Berman, S. M. (1985). An asymptotic bound for the tail of the distribution of the maximum of a Gaussian process with stationary increments. Ann. Inst. H. Poincaré 21, 47-57.
Berthet, P. and Z. Shi (1998). Small ball estimates for Brownian motion under a weighted sup-norm. (preprint).
Bingham, N. H. (1986). Variants on the law of the iterated logarithm. Bull. London Math. Soc. 18, 433-467.
Bingham, N. H., C. M. Goldie and J. L. Teugels (1987). Regular Variation. Cambridge Univ. Press, Cambridge.
Birman, M. S. and M. Z. Solomjak (1967). Piecewise polynomial approximation of the functions of the classes W_p^α. Math. USSR-Sbornik 2, 295-317.
Bogachev, V. I. (1995). Onsager-Machlup functions for Gaussian measures. Dokl. Akad. Nauk 344, 439-441.
Bogachev, V. I. (1998). Gaussian Measures. Amer. Math. Soc., Mathematical Surveys and Monographs, Vol. 62.
Bolthausen, E. (1978). On the speed of convergence in Strassen's law of the iterated logarithm. Ann. Prob. 6, 668-672.
Borell, C. (1975). The Brunn-Minkowski inequality in Gauss space. Invent. Math. 30, 207-216.
Borell, C. (1976). Gaussian Radon measures on locally convex spaces. Math. Scand. 38, 265-285.
Borovkov, A. and A. Mogulskii (1991). On probabilities of small deviations for stochastic processes. Siberian Advances in Mathematics 1, 39-63.
Capitaine, M. (1995). Onsager-Machlup functional for some smooth norms on Wiener space. Prob. Theory Related Fields 102, 189-201.
Carmona, R. and D. Nualart (1992). Traces of random variables on Wiener space and the Onsager-Machlup functional. J. Funct. Anal. 107, 402-438.
Chaleyat-Maurel, M. and D. Nualart (1995). Onsager-Machlup functionals for solutions of stochastic boundary value problems. Lecture Notes in Math. 1613, 44-55, Springer, Berlin.
Chen, X. and W. V. Li (1999). Small ball probabilities for primitives of Brownian motion. (preprint).
Chung, K. L. (1948). On the maximum partial sums of sequences of independent random variables. Trans. Amer. Math. Soc. 64, 205-233.
Cox, D. D. (1982). On the existence of natural rate of escape functions for infinite dimensional Brownian motions. Ann. Prob. 10, 623-638.
Csáki, E. (1980). A relation between Chung's and Strassen's laws of the iterated logarithm. Z. Wahrsch. verw. Gebiete 54, 287-301.
Csáki, E. (1982). On small values of the square integral of a multiparameter Wiener process. Statistics and Probability: Proc. of the 3rd Pannonian Symp. on Math. Stat. D. Reidel Publishing Company, pp. 19-26.
Csáki, E. (1989). A lim inf result in Strassen's law of the iterated logarithm. Colloquia Math. Soc. J. Bolyai 57, Limit theorems in probability and statistics, pp. 83-93.
Csáki, E. (1994). Some limit theorems for empirical processes. In Recent Advances in Statistics and Probability (Proc. 4th IMSIBAC, Eds., J. P. Vilaplana and M. L. Puri), pp. 247-254. VSP, Utrecht.
Csáki, E. and M. Csörgő (1992). Inequalities for increments of stochastic processes and moduli of continuity. Ann. Prob. 20, 1031-1052.


Csörgő, M. and L. Horváth (1993). Weighted Approximations in Probability and Statistics. John Wiley & Sons, New York.
Csörgő, M. and P. Révész (1978). How small are the increments of a Wiener process? Stoch. Process. Appl. 8, 119-129.
Csörgő, M. and P. Révész (1979). Strong Approximations in Probability and Statistics. Academic Press, New York.
Csörgő, M. and Q. M. Shao (1993). Strong limit theorems for large and small increments of l^p-valued Gaussian processes. Ann. Prob. 21, 1958-1990.
Csörgő, M. and Q. M. Shao (1994). On almost sure limit inferior for B-valued stochastic processes and applications. Prob. Theory Related Fields 99, 29-54.
de Acosta, A. (1983). Small deviations in the functional central limit theorem with applications to functional laws of the iterated logarithm. Ann. Prob. 11, 78-101.
Deheuvels, P. and D. Mason (1998). Random fractal functional laws of the iterated logarithm. Studia Sci. Math. Hungar. 34, 89-106.
Dembo, A., E. Mayer-Wolf and O. Zeitouni (1995). Exact behavior of Gaussian seminorms. Stat. Prob. Lett. 23, 275-280.
Dembo, A. and O. Zeitouni (1998). Large Deviations Techniques and Applications. 2nd ed. Springer, New York.
Deuschel, J. and D. Stroock (1989). Large Deviations. Academic Press, Boston.
Donsker, M. D. and S. R. S. Varadhan (1975a). Asymptotic evaluation of certain Wiener integrals for large time. Functional Integration and its Applications (Proc. Internat. Conf.), pp. 15-33.
Donsker, M. D. and S. R. S. Varadhan (1975b). Asymptotics for the Wiener sausage. Comm. Pure Appl. Math. 28, 525-565.
Doob, J. L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Stat. 20, 393-403.
Dudley, R. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1, 290-330.
Dudley, R. M., J. Hoffmann-Jørgensen and L. A. Shepp (1979). On the lower tail of Gaussian seminorms. Ann. Prob. 7, 319-342.
Dunker, T. (1998). Small ball estimates for the fractional Brownian sheet. PhD Thesis, F. Schiller Universität, Jena.
Dunker, T. (1999). Estimates for the small ball probabilities of the fractional Brownian sheet. J. Theoret. Prob. (to appear).
Dunker, T., T. Kühn, M. A. Lifshits and W. Linde (1998). Metric entropy of integration operators and small ball probabilities of the Brownian sheet. C. R. Acad. Sci. Paris 326, 347-352.
Dunker, T., T. Kühn, M. A. Lifshits and W. Linde (1999). Approximation numbers and l-norm. (preprint).
Dunker, T., W. V. Li and W. Linde (1999). Small ball probabilities of integrals of weighted Brownian motion. Stat. Prob. Lett. (to appear).
Dunker, T., M. A. Lifshits and W. Linde (1998). Small deviations of sums of independent variables. Progress in Probability 43, 59-74, Birkhäuser.
Dvoretzky, A. and P. Erdős (1951). Some problems on random walk in space. Proc. Second Berkeley Symp. Math. Stat. Prob. 353-367.
Ehrhard, A. (1983). Symétrisation dans l'espace de Gauss. Math. Scand. 53, 281-301.
Ehrhard, A. (1984). Inégalités isopérimétriques et intégrales de Dirichlet gaussiennes. Ann. Scient. Éc. Norm. Sup. 17, 317-332.
Ehrhard, A. (1986). Eléments extrémaux pour les inégalités de Brunn-Minkowski gaussiennes. Ann. H. Poincaré 22, 149-168.
Erickson, K. (1980). Rates of escape of infinite dimensional Brownian motion. Ann. Prob. 8, 325-338.
Fernique, X. (1975). Régularité des trajectoires des fonctions aléatoires gaussiennes. École d'Été de Probabilités de St.-Flour. Lecture Notes in Math. 480, 1-96.
Fukushima, M., Y. Oshima and M. Takeda (1994). Dirichlet Forms and Symmetric Markov Processes. Walter de Gruyter, New York.


Goodman, V. (1990). Some probability and entropy estimates for Gaussian measures. Progress in Probability 20, 150-156, Birkhäuser.
Goodman, V. and J. Kuelbs (1989). Rates of convergence for the functional LIL. Ann. Prob. 17, 301-316.
Goodman, V. and J. Kuelbs (1991). Rates of clustering in Strassen's LIL for Brownian motion. J. Theoret. Prob. 4, 285-309.
Gordon, Y. (1985). Some inequalities for Gaussian processes and applications. Israel J. Math. 50, 265-289.
Gorn, N. L. and M. A. Lifshits (1999). Chung's law and Csáki function. J. Theoret. Prob. 12, 399-420.
Grill, K. (1987). On the rate of convergence in Strassen's law of the iterated logarithm. Prob. Theory Related Fields 74, 583-589.
Grill, K. (1991). A lim inf result in Strassen's law of the iterated logarithm. Prob. Theory Related Fields 89, 149-157.
Grill, K. (1992). Exact rate of convergence in Strassen's law of the iterated logarithm. J. Theoret. Prob. 5, 197-204.
Gross, L. (1970). Lectures in Modern Analysis and Applications II. Lecture Notes in Math. Vol. 140, Springer, Berlin.
Das Gupta, S., M. Eaton, I. Olkin, M. Perlman, L. Savage and M. Sobel (1972). Inequalities on the probability content of convex regions for elliptically contoured distributions. Proc. Sixth Berkeley Symp. Math. Stat. Prob. 3, 241-264.
Hargé, G. (1998). Une inégalité de décorrélation pour la mesure gaussienne. C. R. Acad. Sci. Paris Sér. 326, 1325-1328.
Hitczenko, P., S. Kwapień, W. V. Li, G. Schechtman, T. Schlumprecht and J. Zinn (1998). Hypercontractivity and comparison of moments of iterated maxima and minima of independent random variables. Electronic Journal of Probability, Vol. 3, Paper no. 2.
Horváth, L. and Q. M. Shao (1999). Liminf for weighted empirical processes. (in preparation).
Hu, Y. (1997). Itô-Wiener chaos expansion with exact residual and correlation, variance inequalities. J. Theoret. Prob. 10, 835-848.
Ibragimov, I. A. (1982). On the probability that a Gaussian vector with values in a Hilbert space hits a sphere of small radius. J. Sov. Math. 20, 2164-2174.
Jogdeo, K. (1970). A simple proof of an inequality for multivariate normal probabilities of rectangles. Ann. Math. Stat. 41, 1357-1359.
Kac, M. (1946). On the average of a certain Wiener functional and a related limit theorem in calculus of probability. Trans. Amer. Math. Soc. 59, 401-414.
Kac, M. (1951). On some connections between probability theory and differential and integral equations. Proc. Second Berkeley Symposium on Math. Stat. and Probability 189-215, University of California Press, Berkeley.
Khatri, C. G. (1967). On certain inequalities for normal distributions and their applications to simultaneous confidence bounds. Ann. Math. Stat. 38, 1853-1867.
Khoshnevisan, D., Y. Peres and Y. Xiao (1999). Limsup random fractals. (preprint).
Khoshnevisan, D. and Z. Shi (1998a). Chung's law for integrated Brownian motion. Trans. Amer. Math. Soc. 350, 4253-4264.
Khoshnevisan, D. and Z. Shi (1998b). Gaussian measure of a small ball and capacity in Wiener space. Asymptotic Methods in Probability and Statistics 453-465, North-Holland, Amsterdam.
Kolmogorov, A. N. and V. M. Tihomirov (1961). ε-entropy and ε-capacity of sets in function spaces. Amer. Math. Soc. Transl. 17, 277-364.
Komlós, J., P. Major and G. Tusnády (1975). An approximation of partial sums of independent rv's and the sample distribution function. I. Z. Wahrsch. verw. Gebiete 32, 111-131.
Kuelbs, J. (1976). A strong convergence theorem for Banach space valued random variables. Ann. Prob. 4, 744-771.
Kuelbs, J. and W. V. Li (1993a). Metric entropy and the small ball problem for Gaussian measures. J. Funct. Anal. 116, 133-157.
Kuelbs, J. and W. V. Li (1993b). Small ball estimates for Brownian motion and the Brownian sheet. J. Theoret. Prob. 6, 547-577.


Kuelbs, J. and W. V. Li (1995). An extension of Ehrhard's theorem. Interaction Between Functional Analysis, Harmonic Analysis, and Probability. Lecture Notes in Pure and Applied Mathematics, Vol. 175, 291-300.
Kuelbs, J. and W. V. Li (1998). Some shift inequalities for Gaussian measures. Progress in Probability 43, 233-243, Birkhäuser.
Kuelbs, J. and W. V. Li (2000). Wichura type functional law of the iterated logarithm for fractional Brownian motion. (in preparation).
Kuelbs, J., W. V. Li and W. Linde (1994). The Gaussian measure of shifted balls. Prob. Theory Related Fields 98, 143-162.
Kuelbs, J., W. V. Li and Q. Shao (1995). Small ball estimates for fractional Brownian motion under Hölder norm and Chung's functional LIL. J. Theoret. Prob. 8, 361-386.
Kuelbs, J., W. V. Li and M. Talagrand (1994). Lim inf results for Gaussian samples and Chung's functional LIL. Ann. Prob. 22, 1879-1903.
Kwapień, S. (1994). A remark on the median and the expectation of convex functions of Gaussian vectors. Progress in Probability 35, 271-272, Birkhäuser, Boston.
Kwapień, S. and J. Sawa (1993). On some conjecture concerning Gaussian measures of dilatations of convex symmetric sets. Studia Math. 105, 173-187.
Lai, T. L. (1973). Gaussian processes, moving averages and quick detection problems. Ann. Prob. 1, 825-837.
Landau, H. J. and L. A. Shepp (1970). On the supremum of a Gaussian process. Sankhyā 32, 369-378.
Latała, R. (1996). A note on Ehrhard's inequality. Studia Math. 118, 169-174.
Latała, R. and K. Oleszkiewicz (1999). Gaussian measures of dilatations of convex symmetric sets. (preprint).
Ledoux, M. (1996). Isoperimetry and Gaussian analysis. Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1648, 165-294, Springer-Verlag.
Ledoux, M. and M. Talagrand (1991). Probability in Banach Spaces. Springer, Berlin.
Lévy, P. (1937). Théorie de l'Addition des Variables Aléatoires. Gauthier-Villars, Paris.
Lewis, T. and G. Pritchard (1999). Correlation measures. Elect. Comm. Prob. 4, 77-85.
Li, W. V. (1992a). Comparison results for the lower tail of Gaussian seminorms. J. Theoret. Prob. 5, 1-31.
Li, W. V. (1992b). Lim inf results for the Wiener process and its increments under the L2-norm. Prob. Theory Related Fields 92, 69-90.
Li, W. V. (1992c). On the lower tail of Gaussian measures on lp. Progress in Probability 30, 106-117, Birkhäuser.
Li, W. V. (1998a). A lim inf result for the Brownian motion. Asymptotic Methods in Probability and Statistics 281-292 (ICAMPS'97, Ed., B. Szyszkowicz), North-Holland, Amsterdam.
Li, W. V. (1999a). A Gaussian correlation inequality and its applications to small ball probabilities. Elect. Comm. Prob. 4, 111-118.
Li, W. V. (1999b). Small deviations for Gaussian Markov processes under the sup-norm. J. Theoret. Prob. 12, 971-984.
Li, W. V. (1999c). Small ball estimates for Gaussian Markov processes under the lp-norm. (preprint).
Li, W. V. (2000). The existence of small ball constants for Gaussian processes under various norms. (in preparation).
Li, W. V. and W. Linde (1993). Small ball problems for non-centered Gaussian measures. Prob. Math. Stat. 14, 231-251.
Li, W. V. and W. Linde (1998). Existence of small ball constants for fractional Brownian motions. C. R. Acad. Sci. Paris 326, 1329-1334.
Li, W. V. and W. Linde (1999). Approximation, metric entropy and small ball estimates for Gaussian measures. Ann. Prob. 27, 1556-1578.
Li, W. V. and Q. M. Shao (1999a). Small ball estimates for Gaussian processes under the Sobolev norm. J. Theoret. Prob. 12, 699-720.
Li, W. V. and Q. M. Shao (1999b). A note on the Gaussian correlation conjecture. (to appear).
Lifshits, M. A. (1995). Gaussian Random Functions. Kluwer Academic Publishers, Boston.


Lifshits, M. (1999). Asymptotic behavior of small ball probabilities. In Probability Theory and Mathematical Statistics: Proceedings of the Seventh Vilnius Conference, pp. 453-468. TEV and VSP editions.
Lifshits, M. A. and W. Linde (1999). Entropy numbers of Volterra operators with application to Brownian motion. (preprint).
Lifshits, M. A. and B. S. Tsyrelson (1986). Small ball deviations of Gaussian fields. Theoret. Prob. Appl. 31, 557-558.
Mandelbrot, B. and J. Van Ness (1968). Fractional Brownian motions, fractional noises and applications. SIAM Review 10, 422-437.
Marcus, M. (1968). Hölder conditions for continuous Gaussian processes. Trans. Amer. Math. Soc. 134, 29-52.
Marcus, M. (1970). A bound for the distribution of the maximum of continuous Gaussian processes. Ann. Math. Stat. 41, 305-309.
Marcus, M. and L. Shepp (1972). Sample behavior of Gaussian processes. Proc. Sixth Berkeley Symp. Math. Stat. Prob. 2, 423-441.
Mogulskii, A. A. (1974). Small deviations in space of trajectories. Th. Prob. Appl. 19, 726-736.
Mogulskii, A. A. (1980). On the law of the iterated logarithm in Chung's form for functional spaces. Theoret. Prob. Appl. 24, 405-413.
Monrad, D. and H. Rootzén (1995). Small values of Gaussian processes and functional laws of the iterated logarithm. Prob. Theory Related Fields 101, 173-192.
Nisio, M. (1967). On the extreme values of Gaussian processes. Osaka J. Math. 4, 313-326.
Orey, S. and S. Taylor (1974). How often on a Brownian motion path does the law of iterated logarithm fail? Proc. London Math. Soc. 28, 174-192.
Pisier, G. (1989). The Volume of Convex Bodies and Banach Space Geometry. Cambridge Univ. Press, Cambridge.
Piterbarg, V. I. (1982). Gaussian random processes. [Progress in Science and Technology] Teor. Veroyatnost. Mat. Stat. Teor. Kibernet. 9, 155-198.
Pitt, L. (1977). A Gaussian correlation inequality for symmetric convex sets. Ann. Prob. 5, 470-474.
Pitt, L. (1978). Local times for Gaussian vector fields. Indiana Univ. Math. J. 27, 309-330.
Pitt, L. and L. T. Tran (1979). Local sample path properties of Gaussian fields. Ann. Prob. 7, 477-493.
Schechtman, G., T. Schlumprecht and J. Zinn (1998). On the Gaussian measure of the intersection. Ann. Prob. 26, 346-357.
Shao, Q. M. (1993). A note on small ball probability of Gaussian processes with stationary increments. J. Theoret. Prob. 6, 595-602.
Shao, Q. M. (1996). Bounds and estimators of a basic constant in extreme value theory of Gaussian processes. Stat. Sinica 6, 245-257.
Shao, Q. M. (1999). A Gaussian correlation inequality and its applications to the existence of small ball constant. (preprint).
Shao, Q. M. and D. Wang (1995). Small ball probabilities of Gaussian fields. Prob. Theory Related Fields 102, 511-517.
Shepp, L. and O. Zeitouni (1992). A note on conditional exponential moments and Onsager-Machlup functionals. Ann. Prob. 20, 652-654.
Shi, Z. (1991). A generalization of the Chung-Mogulskii law of the iterated logarithm. J. Multi. Anal. 37, 269-278.
Shorack, G. R. and J. A. Wellner (1986). Empirical Processes with Applications to Statistics. John Wiley & Sons, New York.
Šidák, Z. (1967). Rectangular confidence regions for the means of multivariate normal distributions. J. Amer. Stat. Assoc. 62, 626-633.
Šidák, Z. (1968). On multivariate normal probabilities of rectangles: their dependence on correlations. Ann. Math. Stat. 39, 1425-1434.
Slepian, D. (1962). The one-sided barrier problem for Gaussian noise. Bell System Tech. J. 41, 463-501.
Smirnov, N. V. (1937). On the distribution of the ω²-criterion of Mises. Mat. Sbornik 2, 973-993.

Gaussian processes: Inequalities, small ball probabilities and applications

597

Stolz, W. (1993). Une m6thods 616mentaire pour l'6valuation de petites boules Browniennes. C.R. Acad. Sci. Paris 315, 1217 1220. Stolz, W. (1996). Some small ball probabilities for Gaussian processes under nonuniform norms. J. Theoret. Prob. 9, 613-630. Strassen, V. (1964). An invariance principle for the law of the iterated logarithm. Z. Wahrs. verw. Gebiete 3, 211-226. Sytaya, G. N. (1974). On some asymptotic representation of the Gaussian measure in a Hilbert space. Theory of Stochastic Processes. Ukrainian Academy of Sciences 2, 93-104. Sudakov, V. N. (1969). Gaussian measures, Cauchy measures and e-entropy. Soviet Math. Dokl. 10, 310-313. Sudakov, V. N. and B. S. Tsirelson (1974). Extremal properties of half-spaces for spherically invariant measures. J. Sov. Math. 9, 9-18, (1978); translated from Zap. Nauch. Sem. L.O.M.I. 41, 14-24. Szarek, S. and E. Werner (1999). A nonsymmetric correlation inequality for Gaussian measure. J. Multi. Anal. 68, 193-211. Talagrand, M. (1992). On the rate of convergence in Strassen's LIL. Progress in Probability, Birkhauser, Boston, pp. 339-351. Talagrand, M. (1993). New Gaussian estimates for enlarged balls. Geometric and Funct. Anal. 3, 502-526. Talagrand, M. (t994). The small ball problem for the Brownian sheet. Ann. Prob. 22, 1331-1354. Talagrand, M. (1995). Hansdorff measure of trajectories of multiparameter fractional Brownian motion. Ann. Prob. 23, 767 775. Talagrand, M. (1996). Lower classes for fractional Brownian motion. J. Thoeret. Prob. 9, 191-213. Talagrand, M. (1998). Multiple points of trajectories of multiparameter fractional Brownian motion. Prob. Theory Related Fields 112, 545-563. Taylor, S. J. (1986). The measure theory of random fractals. Math. Proc. Cambridge Philos. Soc. 100, 383-406. Temlyakov, V. (1995). An inequality for trigonometric polynomials and its application for estimating the entropy numbers. Journal of Complexity 11, 293 307. Tong, Y. L. (1980). 
Probability Inequalities in Multi-variate Distributions. Academic Press, New York. Ustfinel, A. S. (1995). An Introduction to Analysis on Wiener Space. Springer, Berlin. Varadhan, S. R. S. (1984). Large Deviations and Applications. SIAM, Philadelphia. Vitale, R. (1996). The Wills Functional and Gaussian Processes. Ann. Prob. 24, 2172-2178. Vitale, R. (1999). Majorization and Gaussian Correlation. Stat. Prob. Lett. 45, 247 251. Vitale, R. (1999). A log-concavity proof for a Gaussian exponential bound. Contemporary Math.: Advances in Stochastic Inequalities 234, 209-212. Vitale, R. (1999). Some Comparisons for Gaussian Processes. Proc. Amer. Math. Soc. (to appear). Wichura, M. (1973). A functional form of Chung's Law of the iterated logarithm for maximun absolute partial sums. (unpublished). Xiao, Y. (1996). Hausdorff measure of the sample paths of Gaussian random fields. Osaka J. Math. 33, 895-913. Xiao, Y. (1997). Fractal measures of the sets associated to Gaussian random fields. (preprint). Xiao, Y. (1998). Hausdorff-type measures of the saml: paths of fractional Brownian motion. Stoch. Process. Appl. 74, 251-272. Yaglom, A. M. (1957). Some c l a s ~ v~ random fields in n-dimensional space, related to stationary random processes. Theory Prob. Appl. 2, 273-320. Zhang, L. X. (1996a). Two different kinds of liminfs on the LIL for two-parameter Wiener processes. Stoch. Proc. Their Appl. 63, 175-188. Zhang, L. X. (1996b). Some liminf results on increments of fractional Brownian motions. Acta Math. Hungar. 71, 215-240. Zolotarev, V. M. (1986). Asymptotic behavior of the Gaussian measure in 12. J. Soy. Math. 24, 2330-2334.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.


Point Processes and Some Related Processes

Robin K. Milne

1. Introduction

Investigations in diverse areas of science lead to data in the form of a countable collection of points distributed in an apparently random manner over some set. Often the data may comprise the times of occurrence of some phenomenon, or the locations at which some objects are observed or some phenomenon occurs. Simple examples of such data are: the times of emission of pulses in a nerve fibre, the times of arrival of patients at an intensive care unit, the locations of trees in a forest, of ants' nests in a region, or of a specified type of particle or cell in a section (or volume) of tissue. More complex examples are: the times of occurrence, locations and magnitudes of earthquakes in a given region; the locations and instantaneous directions of movement of insects, spermatozoa etc. in a region at a specified time.

The particular type of stochastic process which is used to model such data is known as a point process. Thus a point process is a mathematical model for describing a countable set of points randomly distributed over some space. Poisson processes provide the simplest and best known type of point process. As Stoyan et al. (1995, p. xiv) have noted, since point processes on the real line were predominantly the earliest considered, use of the term 'process' often reflects just the historical origins of the subject rather than implying any time dependence. The theory of point processes as presently developed is rich in mathematical structure, having many important connections with other areas of probability and stochastic process theory. Its early origins [cf. Daley and Vere-Jones (1988)] lie in the areas of life tables and renewal theory, counting problems beginning with work of S.-D. Poisson and leading to applications in particle physics and population processes, queueing theory and communication engineering.
In a significant review paper which attempted to chart future directions in queueing theory, Kendall (1964) said: 'It is clear that progress with the problems mentioned here would be greatly facilitated by a richer theory of point processes. ... Only so can queueing theory repay the debt it owes to the world of technology which gave it birth.' After explosive growth in the literature of point processes in subsequent years the debt would appear to have been amply repaid, with bonuses to many other areas of application.

In fact there are now even a number of books dealing with various aspects of point processes. Applications to queueing theory are considered especially in Brémaud (1981), Franken et al. (1982), Disney and Kiessler (1987), Baccelli and Brémaud (1987), and Brandt et al. (1990). Khinchin (1969) and its earlier editions had introduced many readers to basic results about stationary point processes on the real line, as well as applications to queueing systems and loss systems. Other related ideas were presented in Gnedenko and Kovalenko (1968, 1989) and König et al. (1967). Until the appearance of Kerstan et al. (1974), revised and translated as Matthes et al. (1978), there was no detailed systematic presentation of the theory of point processes. These books provided rigorous foundations for the theory of point processes on complete separable metric spaces, a setting which was seen to encompass Euclidean spaces and to be sufficiently general to fulfil demands from within the theory. They dealt also with the notion of a marked point process, that is a point process in which a supplementary variable or mark is associated with each point in the underlying process, and focused their account on theory for infinitely divisible processes. The generality and careful attention to detail resulted in a valuable reference, but one which makes heavy demands on the mathematical equipment of readers, especially in the areas of measure theory and integration, but also in topology and functional analysis. Similar comments could be made about Kallenberg (1983) and earlier editions (1st edn, 1975), although the terseness of this elegant work makes added demands on the reader. Daley and Vere-Jones (1988), attempting to be accessible to a wide range of readers, provides an excellent and comprehensive introduction to the theory. A still useful summary is available in the precursor, Daley and Vere-Jones (1972); some complementary aspects are discussed in Brillinger (1975).
Srinivasan (1974) gives a brief heuristic account of some aspects of point process theory, emphasizing product densities (drawing on Moyal (1962)) and considering some applications. Another informal and practically motivated approach to the key ideas is provided in the short monograph by Cox and Isham (1980). Earlier, Cox and Lewis (1966) gave an account of statistical analysis of point process data together with some related theory. Stoyan et al. (1995) surveys a variety of aspects of point process theory with a view to applications in stochastic geometry. The various chapters in Barndorff-Nielsen et al. (1999) provide a rich vista of theory and applications of point processes, also emphasizing connections with stochastic geometry. The present article is a review of the theory of point processes and some related stochastic processes, with a choice of topics that is undoubtedly somewhat personal. The early sections attempt to provide a relatively non-technical summary of foundational ideas, with any technical details able to be passed over at first reading. Later sections indicate some of the rich variety of point process and other stochastic process models that can be built from more basic point processes, and consider aspects of statistical inference for point process data. A unifying tool throughout much of the presentation is provided by the probability generating functional, which stands in relation to a point process as does the probability


generating function to a nonnegative integer-valued random variable. The overall intention of this review is to whet the appetite: to introduce the reader to key ideas, preparing the way to the more specialized accounts in the many available books, and offering a window through which to view the wider literature. Mathematical generality has not been pursued for its own sake, and the early sections attempt an overview of the foundations that acknowledges but does not depend heavily on detailed understanding of measure theory. For this reason Poisson processes are not initially introduced in the most general way (with a specified mean measure), and product densities are discussed before the introduction of moment measures. Nevertheless, the reader who wishes to pursue some of the ideas of measure and integration theory will find a compact but useful summary in Section 1.8 of Stoyan et al. (1995), and a more detailed survey in Appendix 1 of Daley and Vere-Jones (1988). No attempt has been made to provide detailed references to all ideas discussed, but it is hoped that sufficient references have been given for it to be clear where the reader might look initially for further information.

2. Foundations
2.1. Representation of realizations

The simplest view of a point process realization is, as suggested above, a (finite or) countable set x = {x_i} of (distinct) points randomly located in some state space S. Often this space would be one-dimensional, commonly representing time. More generally, in many applications the state space S will be a Euclidean space. For example, to deal with earthquakes would require, in principle, a five-dimensional space: a time dimension, three spatial dimensions for the locations of the epicentres, and a further dimension for the magnitudes. However, attention might be focused, at least initially, on just the epicentres. Spaces for which less structure is assumed can be used [cf. Kingman (1993, Chapter 2)], and are in fact required to deal with particular aspects of point process theory and some applications, for example in stochastic geometry and stereology [cf. Stoyan et al. (1995)]; some such applications are mentioned in Section 4.5. For many purposes the assumption that the state space is a complete separable metric space is adequate, and this is assumed throughout the present review. In such a space there is a notion of distance between points and furthermore a countable dense subset, ideas which generalize two key properties of Euclidean spaces. [For some discussion about the choice of type of state space see Daley and Vere-Jones (1988, pp. 152-153).] The more specific assumption of a Euclidean state space is made in some places, for example in dealing with stationarity and in defining a homogeneous Poisson process, though the essence here is really just the presence of a group structure and an associated invariant measure.

There are two other possible approaches to the representation of point process realizations. The first, limited to situations where the state space is one-dimensional and ordered, involves representing the realization as the sequence of 'intervals' between successive points and also specifying the location of this configuration of points in relation to the origin. Renewal processes are usually considered in this way. In the second approach, the point process realization is viewed as a counting measure N(·), that is as a measure defined on a suitable class of sets and taking values which are nonnegative integers or +∞. (A measure, in the sense of the theory of measure and integration, is a set function which is nonnegative, zero for the empty set, and countably additive for sequences of pairwise disjoint sets. A probability measure is a more familiar special case where the measure is constrained to take the value one for the entire space on which it is defined; for a general measure the value taken on the whole space need not be one or even a finite value.) The connection between this and the initial view is that N(B) = #{x_i ∈ B}, the number of points from the set {x_i} which lie in B, considered for suitable subsets B of the space. For technical reasons, the collection of such subsets would usually be assumed to be a σ-field, i.e. to include S and be closed under complementation and countable unions and intersections. Commonly it would include at least all possible bounded sets which are intervals (on a line), rectangles or discs (in a plane), or boxes or balls (in space); the resultant collection of sets is called the (σ-field of) Borel sets of S, denoted by ℬ_S. In addition, it is usual to assume local finiteness: that N(B) is finite whenever the set B is bounded. This assumption is likely to be satisfied in most applications; for an exception, where the origin is a limit point of every realization, see Section 9.4 of Kingman (1993).
The counting measure representation of realizations, although for many not the most intuitive, does have practical relevance in that an appropriate summary description of the data may often be provided by counts in some selected sets. For developing a nice mathematical theory of point processes in general spaces the counting measure view is compelling, in particular because it facilitates the treatment of possible 'multiple' points. (In the set view of realizations the points x_i must, strictly speaking, be distinct, although an extension is possible where with each point x_i is associated a mark k_i giving the multiplicity of that point; see Section 4.2.) A disadvantage of adopting the counting measure viewpoint is that the resultant theory depends on the theory of measure and integration, which is often regarded as difficult, and is not common knowledge among many who are interested in applications of point processes. In spite of this, many of the ideas of point process theory can be explained with minimal dependence on the technical details of measure and integration theory, sometimes by resorting to the more heuristic 'set' view of realizations. As far as possible, this is the approach adopted in the present article.
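To make the counting-measure view concrete, here is a minimal sketch (my own illustration with invented names, not from the text) in which a realization is stored as a list of points and the induced counting measure N(·) is obtained by counting membership:

```python
def counting_measure(points):
    """N(.) induced by a realization {x_i}: N(B) counts the points lying in B,
    where B is given as a membership predicate."""
    def N(B):
        return sum(1 for x in points if B(x))
    return N

# A toy realization on the real line; 1.2 occurs with multiplicity two.
realization = [0.3, 1.2, 1.2, 2.7, 4.1]
N = counting_measure(realization)

count_02 = N(lambda x: 0.0 <= x < 2.0)   # points in [0, 2)
count_25 = N(lambda x: 2.0 <= x < 5.0)   # points in [2, 5)
count_05 = N(lambda x: 0.0 <= x < 5.0)   # points in [0, 5)
```

Note how the multiple point at 1.2 is handled automatically, and how counts are additive over disjoint sets: N([0, 2)) + N([2, 5)) = N([0, 5)).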

2.2. Probabilities and finite-dimensional distributions


In principle, to describe a point process there needs to be a specification of the joint distributions of counts in all possible finite sequences of Borel sets. For a fixed positive integer k, and (Borel) subsets B_1, ..., B_k of the state space we need

{Pr(N(B_1) = n_1, ..., N(B_k) = n_k) : n_1, ..., n_k ∈ {0, 1, ...}} .
This is simply the joint distribution of N(B_1), ..., N(B_k), the numbers of points in the sets B_1, ..., B_k. These joint distributions, for all possible k and subsets B_1, ..., B_k, are called the finite-dimensional (fidi) distributions of the point process. For a particular point process, they must be mutually consistent; for instance, the one-dimensional marginals of any fidi distribution for specified sets B_1, ..., B_k must coincide with the separate one-dimensional distributions for those sets, as must the two-dimensional marginals etc. Moreover, for example, since for disjoint B_1, B_2 the counts must be additive in the sense that N(B_1 ∪ B_2) = N(B_1) + N(B_2) (with probability one), the family of one-dimensional distributions of a point process must be an 'additive' family of probability distributions. Although they are simple to describe in this way, such additivity and consistency conditions put severe limitations on the structure of the possible fidi distributions for a point process; in general it is a nontrivial task to specify a (new) point process model.

Formally, a (random) point process can be defined as a measurable mapping from some probability space (Ω, ℱ, P) into the measurable space (M, ℳ), where M is the space of all counting measures defined on the state space S, and ℳ is the smallest σ-field containing all cylinder sets, that is subsets of M of the form {N(·) ∈ M : N(B_1) = n_1, ..., N(B_k) = n_k} for all possible positive integers k and Borel subsets B_1, ..., B_k of S. Alternatively, a point process can be defined directly as a probability measure on (M, ℳ). The former point of view is to be preferred mathematically if it is desired, for example, to consider N(B) formally as a random variable. In any case, a point process can be described as a random counting measure, in the same sense that a random variable can be described as a random real number or a random m-vector as a random point in ℝ^m.
In general, the existence of a suitable probability measure on ℳ is guaranteed by a refinement of the stochastic process existence theorem usually attributed to Kolmogorov, provided a suitably consistent, additive family of potential fidi distributions can be specified. Readable early versions of the existence theorem for point processes were given by Nawrotzki (1962) and by Harris (1963) in his book on branching processes. Both versions restrict to S = ℝ; in the case of Harris (1963) there is a further restriction to processes for which N(ℝ) < ∞ almost surely. For complete separable metric state spaces the relevant existence result is stated and proved as Theorem 1.3.5 in Matthes et al. (1978); in Daley and Vere-Jones (1988) the corresponding result is Theorem 7.1.XI. Although, in the manner indicated above, a point process can be determined by specifying its fidi distributions for all possible positive integers k and Borel subsets B_1, ..., B_k of S, it is enough to use a smaller class of sets than the Borel sets. For example, when S = ℝ it is adequate to work with fidi distributions for sets B_1, B_2, ... that are pairwise disjoint half-open intervals with rational end-points. The essential point, which can be extended to the case where S is a complete separable metric space, is that finite unions of such sets form a ring (i.e. a collection closed under finite unions and differences) whose generated
σ-field (i.e. smallest σ-field containing the initial class) is ℬ_S, the σ-field of Borel subsets of S, where ℬ_S is defined as the σ-field generated by the open sets in S. As Neveu (1977) has commented, the existence theorem 'does not seem to have a great importance, most interesting point processes being constructed by special methods'. In studying the crossings of a fixed level by a stationary stochastic process [cf. Cramér and Leadbetter (1967)], the required probability measure is determined from that of a previously specified stochastic process. For Poisson processes there is a relatively simple construction [cf. Kingman (1993, Section 2.5)] which is related to the 'conditional' property (see Sections 2.4 and 2.8) and which can be used also as a means of simulating such a process. Many other processes can be obtained by a further random procedure, such as compounding or clustering (see Section 3.1), starting from some more basic point process, often a Poisson process. Provided that attention is restricted to point processes on some bounded subset W of a state space S, a wide class of point processes can be defined by specifying their (Radon-Nikodym) densities, that is likelihood ratios, with respect to a homogeneous Poisson process of unit intensity defined on S. The specification through likelihood ratios with respect to a Poisson process facilitates simulation of such processes and also statistical inference for them. Section 4.6 discusses processes of this type.

A point process is called simple if, with probability one, its realizations have no multiple points, i.e. if Pr(N({x}) = 0 or 1 for every x ∈ S) = 1. A theorem due to Mönch (1971) [cf. Matthes et al. (1978, Theorem 1.4.9)] shows that the distribution of any simple point process is determined by its one-dimensional distributions for a suitably large subclass of Borel sets, specifically for the sets of a semi-ring generating the Borel σ-field (e.g.
on the real line, all sets which are finite unions of half-open intervals). Mönch's theorem is a generalization of a result which in the Poisson case is due to Rényi (1967); see Section 2.8. In fact, the distribution of any simple point process is determined by the probabilities of zero counts for the sets of a semi-ring generating the Borel σ-field; see, for example, Matthes et al. (1978, Proposition 1.4.7). In the latter form the result is related to the characterization of a random set by its avoidance function [cf. Daley and Vere-Jones (1988, Section 7.3), Matheron (1975, Section 2.2), and Matthes et al. (1978, Theorem 1.4.10)].

2.3. Stationarity and related properties

In this section it is assumed that the state space is Euclidean, i.e. S = ℝ^d for some positive integer d. A point process is called (strictly) stationary if its probabilistic properties are invariant under translation: specifically, if for every u in the state space S, every positive integer k, every collection of Borel subsets B_1, ..., B_k of the state space, and every collection n_1, ..., n_k of nonnegative integers, we have

Pr(N(B_1) = n_1, ..., N(B_k) = n_k) = Pr(N(B_1 + u) = n_1, ..., N(B_k + u) = n_k) ,


where B + u = {x + u : x ∈ B} denotes the translate of B by u. For a stationary point process it follows that the distribution, and hence expected value, of the number of points falling in any translate of a given Borel set B is the same as that for the set B. A further consequence is that EN(B) = μ|B|, where |B| denotes the Lebesgue measure (length, area, volume etc.) of B and μ, called the intensity, denotes the expected number of points falling in any set U having |U| = 1. The intensity of a stationary point process is thus always a nonnegative real number, or possibly +∞. Also, for any stationary point process it can be shown that the limit

ρ = lim_{h→0+} Pr(N(B_h) > 0) / h^d ,    (1)

where B_h = (0, h]^d, exists and 0 ≤ ρ < ∞. The quantity ρ is often called the rate of the point process, though this terminology is not standardized in the literature. Furthermore, since Pr(N(B_h) > 0) ≤ EN(B_h) for all h, it follows that ρ ≤ μ. From any point process it is possible to derive an associated simple point process by 'forgetting' all multiplicities in the original process. This derived process is stationary whenever the original process is stationary, and its intensity μ* can be shown to satisfy μ* = ρ, where ρ is the rate of the original process and the common value +∞ is permissible. This latter result is widely known in the literature as Korolyuk's theorem. Extending these ideas it is possible to prove [cf. Milne (1971)] the existence for any stationary point process of a batch-size distribution {π_k : k = 0, 1, ...}, and show that μ = ρ Σ_{k=1}^∞ k π_k. When the point process is simple, π_1 = 1 and so μ = ρ. The distribution {π_k : k = 0, 1, ...} can be viewed as the distribution of the number of occurrences at an arbitrary point, x, given that there is at least one occurrence there. A generalization of this idea is to look at the joint distribution of numbers of occurrences in some configuration of sets considered relative to an arbitrary point, x, given that there is an occurrence at x, and leads ultimately to the Palm distribution of a stationary point process. This can be defined as the conditional distribution of the process given that, for example, there is a point of the process at x. (Because of stationarity, this conditioning can be reduced to the demand that there be a point at the origin.) Such conditioning requires care because the conditioning event is clearly one of probability zero.
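The relation μ = ρ Σ_k k π_k between intensity, rate and batch sizes can be checked numerically. The sketch below is my own construction (all names and the particular batch distribution are invented): the support points form a homogeneous Poisson process of rate ρ = 2 on [0, 1000), simulated by cumulating exponential gaps, and each support point independently carries a batch of size 1 or 2 with π_1 = 0.6, π_2 = 0.4, so that μ = ρ(π_1 + 2π_2) = 2.8:

```python
import random

random.seed(1)
rho, T = 2.0, 1000.0

# Support points of a homogeneous Poisson process of rate rho on [0, T),
# generated by cumulating independent exponential gaps.
support, t = [], 0.0
while True:
    t += random.expovariate(rho)
    if t >= T:
        break
    support.append(t)

# Batch sizes: pi_1 = 0.6, pi_2 = 0.4, so the mean batch size is 1.4.
batches = [1 if random.random() < 0.6 else 2 for _ in support]

rate_hat = len(support) / T        # estimates the rate rho = 2.0
mu_hat = sum(batches) / T          # estimates mu = rho * 1.4 = 2.8
```

The empirical rate counts only the distinct support points (the simple process obtained by forgetting multiplicities), while the empirical intensity counts every occurrence; their ratio estimates the mean batch size.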
It is by means of the Palm distribution that we must approach, for example, the distribution of the times between successive points ('intervals') of a stationary point process on the real line when starting from a description of the process in terms of its fidi distributions. An important inversion formula allows the latter distributions to be expressed in terms of the Palm distribution. For further details concerning Palm distributions see Thorisson (1995), Sigman (1995, Chapter 4) or Stoyan et al. (1995, Section 4.4). For spatial point processes (d ≥ 2) one may also be interested in isotropy, a concept analogous to stationarity, defined by all the fidi distributions being invariant under rotations. For a stationary isotropic point process the Palm distribution is needed at least in order to define formally the nearest-neighbour distribution function (the extension of the distribution function of a typical
interval between successive points for a stationary point process on the real line) and also the K-function. These concepts will be introduced in the discussion of statistical inference for such processes in Section 5.3, where it will be seen that the nearest-neighbour distribution function and the K-function each play an important role. A point process is termed mixing essentially if events defined in terms of the numbers of points in widely separated sets are close to being independent. [For a formal definition see, for example, Daley and Vere-Jones (1988, Section 10.3).] Some such property is needed to ensure consistent estimation of the intensity of a stationary point process from a single realization; many use the related notion of ergodicity [cf. Stoyan and Stoyan (1994, p. 194) or Karr (1991, Section 1.8 and Chapter 9)].

2.4. Poisson, Bernoulli and renewal processes

A homogeneous Poisson process in a Euclidean space S = ℝ^d can be defined by the following two requirements: (P1) for every bounded Borel subset B of the state space, N(B) ~ Poi(λ|B|) (i.e. N(B) is Poisson distributed with parameter λ|B|), where |·| denotes Lebesgue measure; and (P2) for every positive integer k and all sequences B_1, ..., B_k of pairwise disjoint bounded Borel sets, N(B_1), ..., N(B_k) are mutually independent random variables. Any point process satisfying property (P2) is commonly called completely random. Together, (P1) and (P2) specify the form of all fidi distributions and ensure the resulting process is stationary. Moreover, (P1) clearly ensures that λ is the intensity of the process. It is straightforward to show that such a process is simple, and hence that λ is also the rate of the process. A fundamental property of any homogeneous Poisson process is that, given the number of points in a bounded subset of the state space, these points are distributed independently and uniformly over the subset.
This 'conditional' property is an important tool in proving other results about homogeneous Poisson processes and about processes that are defined using them. It is also the key to simulation of such processes within a bounded set. The point process that results from such conditioning of a homogeneous Poisson process is a particular type of Bernoulli process [cf. Kingman (1993, Section 2.4)]. In general, such a process is defined on a compact (i.e. closed and bounded) set W by the demand that all its realizations have the same fixed (total) number of points and that these points be distributed independently and identically over W according to some specified probability distribution. As a consequence, these processes are easy to simulate (see Sections 4.1 and 4.6). Bernoulli processes are also called sample processes [cf. Kallenberg (1983, p. 15)] and binomial (point) processes [cf. Stoyan et al. (1995, Section 2.2)]. Such processes are, in a sense, more basic than Poisson processes.
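The 'conditional' property yields a simple simulation recipe for a bounded window W: draw the total count N ~ Poi(λ|W|), then scatter N independent uniform points over W. The following sketch assumes only this recipe; the function names, and the use of Knuth's product method for the Poisson draw, are my choices rather than anything prescribed in the text:

```python
import math
import random

def poisson_sample(mean):
    # Knuth's product method for a Poisson variate; adequate for moderate means.
    L, k, p = math.exp(-mean), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def homogeneous_poisson(lam, window):
    """Simulate a homogeneous Poisson process of intensity lam on the
    rectangle window = (a, b, c, d) via the conditional property."""
    a, b, c, d = window
    n = poisson_sample(lam * (b - a) * (d - c))
    return [(random.uniform(a, b), random.uniform(c, d)) for _ in range(n)]

random.seed(0)
pts = homogeneous_poisson(5.0, (0, 2, 0, 3))        # one realization
counts = [len(homogeneous_poisson(5.0, (0, 2, 0, 3))) for _ in range(400)]
mean_count = sum(counts) / len(counts)               # close to 5 * 6 = 30
```

Replacing the Poisson draw of N by a fixed total count turns this into a simulator for the Bernoulli (binomial) processes described above.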


Suppose now that, in the above definition of a Poisson process, it is agreed to relax the implicit stationarity requirement and allow the intensity to vary over the state space, with its value at any u ∈ S being given by λ(u), which is assumed nonnegative. In addition, it would usually now be assumed that ∫_B λ(u) du is finite for all bounded Borel sets B. If (P1) in the definition of a homogeneous Poisson process is replaced by

(P1)′ for every bounded Borel subset B of the state space, N(B) ~ Poi(∫_B λ(u) du),

then the resultant process is called an inhomogeneous or nonhomogeneous Poisson process (a terminology not meant to exclude homogeneous Poisson processes as the special cases for which the intensity function is in fact constant), or simply a Poisson process with intensity function λ(·). The class of inhomogeneous Poisson processes includes, for example, processes for which λ(u), or what may be preferable ln λ(u), exhibits some trend with u, is a periodic function, or is dependent on the values of some associated covariates at specified points u; see, for example, Cox (1972), Cressie (1991, pp. 654-657), or Lewis (1972b, Section 5). The classes of Poisson processes discussed here will be further extended in Section 2.8.

An (ordinary) renewal process on S = [0, ∞) can be defined by the set {L_1, L_1 + L_2, L_1 + L_2 + L_3, ...} of random points, where L_1, L_2, L_3, ... are independent and identically distributed 'lifetime' (i.e. nonnegative) random variables. When the first lifetime, L_1, is allowed a distribution different from that of the other random variables the process is commonly called a modified renewal process. If the common distribution of L_1, L_2, L_3, ... is an exponential distribution, then the process reduces to a homogeneous Poisson process on [0, ∞). Aside from this special case, counting properties of a renewal process are not simple to describe, though some results can be derived; see, for example, Cox (1962, Chapter 3). Various generalizations of renewal processes have been attempted in a planar setting, i.e. with S = ℝ²; for example, Hunter (1974a, b). Whilst such generalizations may have value for particular purposes, they were not intended to provide an approach to specification of general spatial point processes through 'interval' properties.
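The renewal construction {L_1, L_1 + L_2, L_1 + L_2 + L_3, ...} translates directly into code: cumulate iid lifetimes and keep the partial sums falling in an observation window. In this sketch (names are my own) exponential lifetimes reproduce a homogeneous Poisson process on [0, ∞), as noted above, while deterministic lifetimes give a regular lattice of points:

```python
import random

def renewal_points(lifetime, T):
    """Points {L1, L1+L2, ...} of an ordinary renewal process on [0, T),
    where lifetime() draws one nonnegative lifetime."""
    pts, t = [], 0.0
    while True:
        t += lifetime()
        if t >= T:
            break
        pts.append(t)
    return pts

random.seed(2)
# Exponential lifetimes of rate 1: the count over [0, 1000) should be near 1000.
n_exp = len(renewal_points(lambda: random.expovariate(1.0), 1000.0))
# Deterministic lifetimes 0.5 on [0, 10): points at 0.5, 1.0, ..., 9.5.
n_det = len(renewal_points(lambda: 0.5, 10.0))
```

Supplying a separate first-lifetime sampler before entering the loop would give a modified renewal process in the sense described above.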

2.5. Some processes derived from Poisson processes


Many types of point process can be defined in terms of simpler point processes, for example Poisson or renewal processes. Whilst fidi distributions could then be derived, in practice such derivations may be difficult or tedious, and full details may not be needed. Probability generating functions or moment generating functions of relevant fidi distributions, or selected summary measures or moments, may provide useful tools. Often it is possible to determine a usable expression for a generating functional (Section 2.10) of the derived process.

From Poisson processes, three broad classes of point processes can readily be constructed by introducing further randomness. A compound Poisson process is


R. K. Milne

obtained from a Poisson process (homogeneous or inhomogeneous) if, independently of the other points, each point of the Poisson process is replaced by a random number of new points, with these numbers of new points identically distributed and every new point placed at its associated Poisson location. In general, this derived process may have points with multiplicity greater than one. A mixed Poisson process [cf. Grandell (1997)] is defined by allowing the parameter λ of a homogeneous Poisson process to have a specified distribution. Such a process provides one of the simplest types of process which is not mixing, the lack of mixing being essentially a consequence of the dependence of the counts of points in even widely separated sets on the common value of λ. By allowing the intensity function of an inhomogeneous Poisson process to be a realization of some other stochastic process, a third class, that of doubly stochastic Poisson processes or Cox processes, is obtained; see, for example, Grandell (1976, 1997). The 'driving' stochastic process, a random field if the state space is ℝ², may represent an underlying (often unobservable) environmental heterogeneity. When the state space is the real line, taking the stochastic process governing the intensity function to be a continuous-time Markov chain with finitely many states yields a special type of Cox process known as a Markov modulated Poisson process; see, for example, Rydén (1995). The simplest case then arises from a Markov chain having just two states, with these corresponding to a high level and a low (or even zero) level for the intensity function. As is intuitively clear from the presence of further randomness beyond that of a Poisson process, the compound Poisson, mixed Poisson and Cox process models have the property of being overdispersed relative to a Poisson process, in that the variance of any count N(B) will usually be greater than the corresponding mean.
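The closing claim about overdispersion is easy to illustrate by simulation. A minimal sketch in Python/NumPy (the gamma mixing distribution and all parameters are illustrative choices, not from the text): for a mixed Poisson count N(B) with random rate Λ, var N(B) = EΛ + var Λ, which exceeds EN(B) = EΛ whenever Λ is nondegenerate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixed Poisson count on a window B with |B| = 1: the rate Lambda is
# itself random (here Gamma-distributed), and given Lambda the count
# N(B) is Poisson(Lambda).
shape, scale, reps = 4.0, 0.5, 100000
Lam = rng.gamma(shape, scale, size=reps)   # random intensity, mean 2, var 1
N = rng.poisson(Lam)                       # counts given the intensity

mean, var = N.mean(), N.var()
# Overdispersion: var N(B) = E[Lambda] + Var[Lambda] = 3 > 2 = E N(B)
print(mean, var)
```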

2.6. Product densities


An approach to point process properties that is both intuitively appealing and useful is through what are called the product densities of the process. These are defined for any simple point process in a Euclidean state space, subject to some further conditions that will be discussed in the next section. Product densities can be described in terms of differentials as follows: the first-order product density is given, for any u in the state space S, by m_1(u) du = Pr(N(du) = 1). The above probabilities are expressions for the probability of the event 'a point at u'. A Poisson process with intensity function λ(·) has m_1(·) = λ(·). For a general point process, m_1(·) itself is called the intensity function of the process. When the state space is one-dimensional and interpreted as time, the term instantaneous intensity function is often used. Higher-order product densities, the 'coincidence densities' of Macchi (1975), are given analogously: for any positive integer k and any distinct u_1, ..., u_k in the state space, the kth order product density is the function m_k(·) given by

m_k(u_1, ..., u_k) du_1 ⋯ du_k = Pr(N(du_1) = 1, ..., N(du_k) = 1) ;    (2)

Point processes and some related processes


see also Daley and Vere-Jones (1988, Sections 5.4 and 5.7) and Srinivasan (1974, Chapter 2). For an inhomogeneous Poisson process with intensity function λ(·) the product densities are given by m_k(u_1, ..., u_k) = λ(u_1) ⋯ λ(u_k) for all k and distinct u_1, ..., u_k. For a stationary point process m_1(u) = μ, the intensity of the point process, and m_2(u_1, u_2) is a function just of u_2 − u_1. If the process is also isotropic m_2(u_1, u_2) is a function just of d(u_1, u_2), the Euclidean distance between u_1 and u_2. In the case of a homogeneous Poisson process with intensity λ the product densities reduce to m_k(u_1, ..., u_k) = λ^k for all k and distinct u_1, ..., u_k. Moreover, for a mixed Poisson process the product densities are the respective moments (about the origin) of the (now random) intensity. The product densities of a renewal process on S = [0, ∞) are

m_k(u_1, ..., u_k) = h_1(u_1) h(u_2 − u_1) ⋯ h(u_k − u_{k−1})

for all k and distinct u_1, ..., u_k, where h is the (ordinary) renewal density and h_1 the modified renewal density (which takes account of the fact that the origin need not be a renewal point). For a general point process described by product densities, the simplest connection with moments of counts is that m_1(u) du = Pr(N(du) = 1) = EN(du), or equivalently EN(B) = ∫_B m_1(u) du, and

E{N(B)[N(B) − 1]} = ∫_B ∫_B m_2(u_1, u_2) du_1 du_2 ,

the latter quantity being the second factorial moment of N(B). These ideas lead inevitably to the consideration of moment measures.
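For a homogeneous Poisson process the connection between the second product density and the second factorial moment can be checked directly: integrating m_2 ≡ λ² over B × B gives E{N(B)[N(B) − 1]} = λ²|B|². A short Monte Carlo sketch (Python/NumPy assumed; the intensity and replication count are illustrative), using only the Poisson law of N(B):

```python
import numpy as np

rng = np.random.default_rng(2)

# Homogeneous Poisson process with intensity lam on B = [0, 1]^2, so
# N(B) ~ Poisson(lam * |B|) with |B| = 1.  The second factorial moment
# E{N(B)[N(B) - 1]} should equal lam^2 |B|^2 = lam^2.
lam, reps = 3.0, 200000
N = rng.poisson(lam, size=reps)
fact2 = np.mean(N * (N - 1.0))
print(fact2)  # close to lam**2 = 9
```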

2.7. Moment measures

Moment measures are, for a point process, the quantities which are the analogues of ordinary moments of a random variable. Such point process quantities are measures in the sense of the theory of measure and integration. Rather more knowledge of this theory than has been demanded in many earlier sections is needed to pursue a detailed study of moment measures, though even here some intuitive understanding can be developed. For any given point process, the set function M_1(·) defined for B ∈ ℬ_S, the σ-field of Borel subsets of S, by M_1(B) = EN(B), inherits the nonnegativity and additivity properties of counts associated with any realization and so is itself a measure, variously called the mean measure, the intensity measure, or the first moment measure of the process. Higher-order moment measures can be defined as follows. For k = 2, 3, ... define the kth moment measure, M_k(·), on 'rectangles' B_1 × ⋯ × B_k in S^k, the Cartesian product of the state space with itself, by

M_k(B_1 × ⋯ × B_k) = E{N(B_1) ⋯ N(B_k)}    (3)

Each object, M_k(·), so defined, can be extended in a standard (measure-theoretic) way to a measure M_k(·) on the Borel sets of S^k, and this extension is always a


symmetric measure since the order in which N(B_1), ..., N(B_k) are multiplied does not matter. The simplest aspects of the dependence structure of a point process are embodied in its second moment measure M_2(·). The covariance measure can be defined in a similar manner, starting from C(B_1 × B_2) = M_2(B_1 × B_2) − M_1(B_1)M_1(B_2) for 'rectangles' B_1 × B_2 in S². Although it is additive for disjoint sets, observe that C(·) may take negative values and so is a signed measure. Substituting B_1 = B_2 = B gives M_2(B × B) = E{N(B)²}, and also the variance function, a set function, defined by var N(B) = C(B × B). When the state space is the real line and B = (0, t], it is the (point) function V(t) = var N((0, t]) which is usually termed the variance function. More generally, for any point process cumulant measures C_1(·), C_2(·), ... can be defined in terms of the moment measures by the usual moment-cumulant formulae; see, for example, Prohorov and Rozanov (1969, p. 347). In particular, C_1(·) = M_1(·) and C_2(·) is the covariance measure C(·) as defined above. Moment (and cumulant) measures suffer the disadvantage that the second and all higher-order measures of any point process have 'diagonal concentrations' [Daley and Vere-Jones (1988, Section 5.4)], as a consequence of their being able to be viewed as first moment measures of 'product point processes' constructed from the original point process. This disadvantage can be side-stepped, at least for simple point processes, by using instead the factorial moment measures, which are the point process analogues of factorial moments for a random variable. The first factorial moment measure coincides with the mean measure, while the second factorial moment measure M_[2](·) can be defined by

M_[2](B_1 × B_2) = M_2(B_1 × B_2) − M_1(B_1 ∩ B_2)    (4)

for Borel subsets B_1 and B_2 of S. Observe that when B_1 = B_2 = B (say) the above right-hand side reduces to the second factorial moment of N(B), and that when B_1 and B_2 are disjoint it reduces to M_2(B_1 × B_2). Factorial cumulant measures can also be considered; for details see Daley and Vere-Jones (1988, Section 5.5). Each set of moment/cumulant measures can be linked in a natural way with a generating functional, as will be indicated for moment and factorial moment measures in Section 2.10. When S = ℝ^d for some d, the factorial moment measures may be absolutely continuous with respect to Lebesgue measure (length, area, volume etc.) in the appropriate dimensional Euclidean space (e.g. ℝ^d for the mean measure and ℝ^{2d} for the second factorial moment measure), and so able to be defined by densities. These are then the densities which we introduced earlier in a heuristic manner under the name of product densities. An important and useful result about moment measures is Campbell's theorem [cf. Daley and Vere-Jones (1988, Section 6.4), and Kingman (1993, Section 3.2) in the Poisson case]. The simplest version of this result is that for any nonnegative measurable function, or any M_1-integrable function g, and any Borel subset B of the state space

E{ ∫_B g(u) N(du) } = ∫_B g(u) M_1(du) .    (5)

An alternative, possibly more intuitive expression of this is

E{ Σ_{i: X_i ∈ B} g(X_i) } = ∫_B g(u) m_1(u) du ,

where {X_i} is the set of random points corresponding to the point process and m_1(u) its intensity function (assumed to exist). When the point process is stationary with intensity μ, the latter integral simplifies to μ ∫_B g(u) du. Such results are automatically true whenever g is the indicator function of some Borel subset of S, because they are then a consequence of the definition of the first moment measure. The extension to finite linear combinations of indicator functions with nonnegative coefficients (simple functions) is immediate, whilst the extension to the stated classes of functions can be achieved by the usual extension methods of measure and integration theory (i.e. approximation by increasing sequences of simple functions etc.). The above simple version of Campbell's theorem can be extended to the following result about higher-order moment measures: for any nonnegative measurable function, or any M_k-integrable function g, and any Borel subset A of S^k,

E{ ∫_A g(u_1, ..., u_k) N(du_1) ⋯ N(du_k) } = ∫_A g(u_1, ..., u_k) M_k(du_1 ⋯ du_k) .    (6)

There are also useful refinements involving the Palm distribution; see, for example, Stoyan et al. (1995, Section 4.4).
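Campbell's theorem in the form E{Σ_i g(X_i)} = ∫_B g(u) m_1(u) du lends itself to a quick numerical check. A sketch for a homogeneous Poisson process on B = [0, 1] (Python/NumPy assumed; the choice g(u) = u² and all parameters are illustrative), using the conditional property of Section 2.4 to place points given the count:

```python
import numpy as np

rng = np.random.default_rng(3)

# Campbell's theorem for a homogeneous Poisson process of intensity lam
# on B = [0, 1]: E{sum_i g(X_i)} = lam * integral_B g(u) du.
lam, reps = 5.0, 50000
g = lambda u: u ** 2                     # any nonnegative test function

totals = []
for _ in range(reps):
    n = rng.poisson(lam)                 # N(B), with |B| = 1
    x = rng.uniform(0.0, 1.0, size=n)    # points i.i.d. uniform given N(B)
    totals.append(g(x).sum())

lhs = np.mean(totals)
rhs = lam * (1.0 / 3.0)                  # integral of u^2 over [0, 1] is 1/3
print(lhs, rhs)
```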

2.8. General Poisson processes


Suppose that μ(·) is a measure on (S, ℬ_S), where S is not necessarily a Euclidean space and μ(B) is finite for bounded sets B. Replace (P1) in the definition of a homogeneous Poisson process (given in Section 2.4) by (P1)′: for every bounded Borel subset B of the state space,

N(B) ∼ Poi(μ(B)) .

The existence of a point process satisfying (P1)′ and the complete randomness property (P2) is a consequence of the refined Kolmogorov theorem, since these two requirements specify a suitable form for all fidi distributions. It is immediate that this point process has mean measure μ(·). Such a process is usually referred to as a Poisson process with mean measure μ(·). Reflecting the property (P2), cov(N(B_1), N(B_2)) = var(N(B_1 ∩ B_2)) = μ(B_1 ∩ B_2), and hence the covariance measure C(·) of such a process is determined by C(B_1 × B_2) = μ(B_1 ∩ B_2) for all Borel sets B_1 and B_2 of S. This shows that the covariance measure of a Poisson


process is concentrated on the leading diagonal in S × S, and that the second factorial cumulant measure is identically zero. One aspect of the generality of this definition is indicated by the observation that with the choice of a single point state space, for example S = {1}, the resultant point process corresponds to a single Poisson random variable, whilst when the state space has m points, for example S = {1, 2, ..., m}, the point process corresponds to a random m-vector with independent Poisson distributed components. Another aspect is that when S = ℝ^d, homogeneous and inhomogeneous Poisson processes as introduced earlier are special cases with respectively μ(B) = λ|B| and μ(B) = ∫_B λ(u) du for Borel sets B. In principle, with the above definition of a Poisson process the mean measure could have a discrete component, with part, or even all, of its mass concentrated on a set of points (atoms), of which there can be at most countably many and only a finite number in any bounded set B. For any x with μ_x = μ({x}) > 0, it follows that Pr(N({x}) > 0) > 0; thus such an x, often termed a fixed atom, is a possible multiple point of the process. (In fact, N({x}) ∼ Poi(μ_x).) A Poisson process with mean measure μ(·) on S = ℝ is simple if and only if its mean measure has no atoms. In general the mean measure could have part of its mass concentrated on some lower-dimensional subspace, for example a line, and so in general a Poisson process could have a positive probability of points lying in such a subspace. None of these possibilities can occur for a homogeneous Poisson process, or even an inhomogeneous Poisson process as defined in Section 2.4.
Some, though not all, authors restrict attention to Poisson processes whose mean measure has no discrete component, and thereby implicitly to Poisson processes which are simple; see, for example, Kingman (1993) where this approach is motivated by a view of a Poisson process as a random set (which by definition is not allowed repeated points). Any Poisson process with mean measure μ(·) has the following property: for any bounded set B, any pairwise disjoint Borel sets B_1, ..., B_k partitioning B, and any nonnegative integers n_1, ..., n_k,

Pr(N(B_1) = n_1, ..., N(B_k) = n_k | N(B) = n) = (n! / (n_1! ⋯ n_k!)) ∏_{i=1}^k [μ(B_i)/μ(B)]^{n_i}    (7)

whenever n = n_1 + ⋯ + n_k. Thus, given N(B) = n, these n points can be considered to be independently distributed over B according to the probability measure μ_B(·)/μ(B), where μ_B(·) = μ(· ∩ B) denotes the restriction of μ(·) to B. For a stationary Poisson process on ℝ^d this measure reduces to the uniform distribution over B, and the property is exactly the 'conditional' property discussed earlier (in Section 2.4) for such processes. Kingman (1993) is an extended and erudite essay which presents the many beautiful properties and applications of Poisson and related processes, and shows how such processes can be defined in spaces having minimal structure. In particular, Section 3.4 of Kingman's book has an interesting discussion of a


fundamental result due to Rényi (1967). In one form this result shows that, although the defining assumptions (P1)′ and (P2) are not incompatible (and are conveniently taken as the common definition), in fact (P2) is redundant in the presence of (P1)′. Moreover, the latter assumption can be considerably weakened: provided the mean measure is finite on bounded sets and has no discrete component, it is enough to demand the Poisson form for the probabilities of zero counts for a suitably large subclass of Borel sets, specifically for the sets of a semiring generating the Borel σ-field (e.g. on the real line, all sets which are finite unions of half-open intervals). (See also the earlier discussion, in Section 2.2, of Mönch's theorem.)
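The conditional property (7), read in reverse, gives a standard recipe for simulating a Poisson process with a general mean measure on a bounded set: draw N(B) ∼ Poi(μ(B)), then scatter that many i.i.d. points with distribution μ(·)/μ(B). A minimal sketch (Python/NumPy assumed; the mean measure μ(du) = 2u du on [0, 1] is an illustrative choice, with inverse-cdf sampling for the normalized density):

```python
import numpy as np

rng = np.random.default_rng(4)

# Conditional construction: mu(du) = 2u du on [0, 1], so mu([0, 1]) = 1
# and the normalized density 2u has cdf u^2, hence inverse cdf sqrt.
reps = 60000
counts_half = []
for _ in range(reps):
    n = rng.poisson(1.0)                 # N([0, 1]) ~ Poi(mu([0, 1]))
    x = np.sqrt(rng.uniform(size=n))     # i.i.d. points with density 2u
    counts_half.append(np.sum(x <= 0.5))

# N([0, 0.5]) should then be Poisson with mean mu([0, 0.5]) = 0.25
mean, var = np.mean(counts_half), np.var(counts_half)
print(mean, var)
```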
2.9. Random measures

In studying some aspects of point processes one is soon led, for one reason or another, to consider random measures. From a mathematical point of view, as has been discussed in Section 2.1, it is natural to consider a point process realization as a counting measure, and it then seems equally natural to consider stochastic processes whose realizations are measures. Random measures arise naturally also from a modelling view-point: as models for the distribution of mass, for example of some mineral, over an area or in space; in dealing with a generalization of Cox processes as discussed in Section 2.5 here, in order to take account of possible environmental heterogeneity, we allow the mean measure of a Poisson process to be random [cf. Stoyan et al. (1995, Section 5.2)]; and as mark-sum processes built from an underlying marked point process (see Section 4.2) by means of ξ(B) = Σ_{i: x_i ∈ B} k_i, where the realization of the marked point process is denoted by {(x_i, k_i)}. Some further reasons are based on connections with other stochastic processes: for a line segment process, consider ξ(B) = total length of line segments intersecting B; for a stochastic process, consider ξ(B) = length/area etc. of the intersection with B of exceedance regions for a given level; and for a Boolean model [cf. Stoyan et al. (1995, Chapter 3)] built from discs centred at the points of a planar Poisson process, consider ξ(B) = area of the intersection with B of the random set defined by the union of the discs, or ξ(B) = perimeter of the intersection with B of the boundary of that random set. For some other random measures that arise naturally in the study of stochastic geometry see Section 7.3.4 of Stoyan et al. (1995). In thinking about a realization of a random measure as compared with that of a point process, there arise considerations similar to those which arise when contemplating a general measure (or probability distribution) as compared with


one that is discrete; a realization of a random measure may 'smear' the mass over a region and/or it may place mass at a discrete set of points (atoms). The formal definition of a random measure can proceed in a manner similar to the point process case. In the present context, take the space of realizations to be M = the set of all measures ξ(·) on (S, ℬ_S) that are finite on bounded sets, and let ℳ = the smallest σ-field of subsets of M containing all sets of the form {ξ(·) ∈ M : ξ(B_1) ≤ u_1, ..., ξ(B_k) ≤ u_k} for all possible positive integers k, all Borel sets B_1, ..., B_k of S and all nonnegative real numbers u_1, ..., u_k. Then, corresponding to the two points of view indicated for point processes in Section 2.2, a random measure can be defined either as a probability measure on the space (M, ℳ), or as a measurable mapping from some probability space (Ω, ℱ, P) into (M, ℳ). The finite-dimensional (fidi) distributions of a random measure ξ are the (joint) distributions, on ℝ^k, of ξ(B_1), ..., ξ(B_k) where B_1, ..., B_k are bounded Borel sets of S and k is a positive integer. Since they will not, in general, be discrete distributions (as in the point process case) or distributions absolutely continuous with respect to the appropriate Lebesgue measure (and so able to be described by density functions), such fidi distributions need to be described by their distribution functions
F_k(B_1, ..., B_k; u_1, ..., u_k) = Pr(ξ(B_1) ≤ u_1, ..., ξ(B_k) ≤ u_k),    u_1, ..., u_k ≥ 0 .
There is a generalization to random measures of the refined Kolmogorov theorem described in the point process case. For the fidi distributions we require a consistent family of probability measures on the nonnegative orthant of ℝ^k for positive integers k (by contrast with {0, 1, ...}^k for positive integers k in the point process case). In addition, as in the point process case, conditions are required in order to ensure 'additivity' and the 'measure character' of realizations. Such results are provided, for example, in Daley and Vere-Jones (1988, Theorem 6.2.VII); see also Kallenberg (1983, Section 5.2). Moment measures can be introduced for random measures in much the same way as they were for point processes. In particular, the first moment measure, mean measure or intensity measure is given by M_1(B) = Eξ(B), B ∈ ℬ_S. A simple, though rather trivial, example of a random measure can be specified as follows. Suppose S = ℝ^d and that |·| denotes Lebesgue measure. Let X be a nonnegative random variable (e.g. gamma distributed) and set ξ(B) = X|B| for any Borel set B of S. A less trivial example is built, as is the Poisson process, by using complete randomness after specifying a suitable family of one-dimensional distributions. Suppose that β is a positive real number and μ(·) a measure on (S, ℬ_S) which is finite for bounded sets B. A gamma random measure can be defined by the requirements: (G1) for every bounded Borel subset B of the state space, ξ(B) ∼ Gamma(μ(B), 1/β), where the latter is determined by the distribution function


F(u) = (1 / (Γ(μ(B)) β^{μ(B)})) ∫_0^u t^{μ(B)−1} e^{−t/β} dt,    u > 0 ,

or equivalently by the Laplace transform E{e^{−tξ(B)}} = [1 + βt]^{−μ(B)}; and (G2) for every positive integer k and all sequences B_1, ..., B_k of pairwise disjoint bounded Borel sets, ξ(B_1), ..., ξ(B_k) are mutually independent random variables. The complete randomness property (G2) ensures consistency, while it and the usual properties of gamma distributions with fixed scale parameter ensure the required additivity. Strictly, there is a need to check also a simple continuity condition [cf. Condition 6.2.4 of Daley and Vere-Jones (1988)], namely that for any non-increasing sequence {B_n} of bounded sets from ℬ_S that converges to the empty set ∅ as n tends to infinity, the corresponding Gamma(μ(B_n), 1/β) distributions converge to the degenerate distribution concentrated at the origin. Then, by appeal to a random measure version of the refined Kolmogorov theorem, the existence of a random measure ξ which is completely random and has the above gamma distributions as its one-dimensional distributions can be assured. The mean measure of this gamma random measure is the measure βμ(·). Such random measures were considered by Sewastjanov (1975, Chapter XII) in applications to branching processes. Because we are here dealing with a random measure, it may come as some surprise that any realization of such a gamma random measure can be shown to be a purely atomic measure. This result is well known from the theory of stochastic processes with independent increments [see, for example, Gikhman and Skorokhod (1969, Chapter VI) in the case S = ℝ]. Gamma random measures with μ(S) < ∞ were used by Ferguson (1973) in a study of prior distributions on spaces of probability measures, and by Shorrock (1975) in a paper about discrete time extremal processes. Whilst there are few simple examples of random measures, essentially because there is a dearth of additive families of distributions, many general results are known.
For example, it is possible to characterize those random measures which are completely random, i.e. random measures satisfying just (G2); for an elegant treatment see Kingman (1993, Chapter 8). Kallenberg (1983) provides a systematic study of random measures, including completely random measures and the wider class of infinitely divisible random measures. Extensions of the theory of random measures to random signed measures are not routine. There is some discussion and a relevant example in Section 6.1 of Daley and Vere-Jones (1988); see also Section 7.1.1 of Stoyan et al. (1995), where it is pointed out that some theory exists and that this is important for studying curvatures of random closed sets.
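The defining properties (G1)-(G2) of the gamma random measure can be exercised numerically: for disjoint B_1 and B_2 the values ξ(B_1) and ξ(B_2) are independent gammas with a common scale, so their sum ξ(B_1 ∪ B_2) is again gamma with shape μ(B_1) + μ(B_2). A sketch (Python/NumPy assumed; taking μ to be Lebesgue measure on [0, 1] and β = 2 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)

# Gamma random measure with mu = Lebesgue measure on [0, 1] and scale
# beta: for disjoint B1, B2, xi(B1) and xi(B2) are independent
# Gamma(mu(Bi), scale = beta), and additivity gives
# xi(B1 ∪ B2) = xi(B1) + xi(B2) ~ Gamma(mu(B1) + mu(B2), scale = beta).
beta, reps = 2.0, 200000
xi1 = rng.gamma(0.3, beta, size=reps)    # xi(B1), mu(B1) = 0.3
xi2 = rng.gamma(0.7, beta, size=reps)    # xi(B2), mu(B2) = 0.7
union = xi1 + xi2                        # xi(B1 ∪ B2)

# Gamma(1.0, scale = 2.0) has mean 2.0 and variance 4.0
print(union.mean(), union.var())
```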
2.10. Generating functionals

In the study of random variables and their distributions, various types of generating function (probability generating functions, moment generating functions or Laplace transforms, and characteristic functions) have proved to be useful


tools. Moment generating functions or Laplace transforms are well-suited to dealing with nonnegative random variables, while probability generating functions play a special role with nonnegative integer-valued random variables. All such generating functions have 'functional' analogues that are useful in the study of point processes. (Here the argument of a functional is a real-valued function, rather than a real number or a vector of real numbers.) These functionals provide a means of compactly summarizing information about point processes and enabling that information to be easily manipulated. The probability generating functional (pgfl) G of a point process N can be defined, for suitable functions h with domain S (and suitable conventions about the logarithm function when its argument is zero), by G[h] = E(exp{∫_S ln h(u) N(du)}). The simplest class of such functions h consists of those (measurable functions) taking values in the interval [0, 1] and equal to one outside some bounded subset of S. Heuristically

G[h] = E( ∏_{u∈S} h(u)^{N(du)} ) = E( ∏_{u: N({u})>0} h(u)^{N({u})} ) .    (8)

In the middle expression the integrand is a product integral, and shows that G can be viewed as the joint probability generating function of all the random variables N(du), u ∈ S; notice that at most countably many of these will be non-zero. Observe that for pairwise disjoint B_1, ..., B_k and h given by h(x) = z_i for x ∈ B_i, i = 1, ..., k, and h(x) = 1 for x ∈ S \ (B_1 ∪ ⋯ ∪ B_k), G[h] reduces to a function of z_1, ..., z_k which is the joint probability generating function of N(B_1), ..., N(B_k). Working heuristically using (8), the defining properties of an inhomogeneous Poisson process with intensity function λ(·) (which, in particular, ensure that for u ≠ v, N(du) and N(dv) are independent), and the form of the probability generating function of a Poisson distributed random variable, it follows that the pgfl of an inhomogeneous Poisson process with intensity function λ(·) is

G[h] = ∏_{u∈S} exp{[h(u) − 1]λ(u) du} = exp{ ∫_S [h(u) − 1]λ(u) du }    (9)

where the middle expression is again a product integral. Observe that (9) generalizes to G[h] = exp{∫_S [h(u) − 1] μ(du)} for a Poisson process with mean measure μ(·), and simplifies to G[h] = exp{λ ∫_S [h(u) − 1] du} in the case of a homogeneous Poisson process with intensity λ. The Laplace functional (Lfl) of a random measure is defined by

L[h] = E( ∏_{u∈S} e^{−h(u)ξ(du)} ) = E( exp{ −∫_S h(u) ξ(du) } ) ,    (10)


where again the integrand in the middle expression is a product integral, for nonnegative (measurable) functions h defined on S that vanish outside some bounded subset of S. The requirements on the functions h ensure that ∫_S h(u) ξ(du) is well-defined and finite almost surely. Observe that for pairwise disjoint B_1, ..., B_k and h given by h(u) = Σ_{i=1}^k t_i 1_{B_i}(u) the Lfl of a random measure reduces to

L[h] = E( exp{ −Σ_{i=1}^k t_i ξ(B_i) } ) ,    (11)

that is, to the Laplace transform of the joint distribution of ξ(B_1), ..., ξ(B_k). Since any point process is a random measure, the Lfl can be used also for point processes. Working heuristically, as above for the pgfl of an inhomogeneous Poisson process, using (10), the Laplace transform of a gamma density and the defining properties of the gamma random measure with parameters μ(·) and β, it follows that the Lfl of this random measure is

L[h] = ∏_{u∈S} E( exp{−h(u) ξ(du)} ) = ∏_{u∈S} exp{ −ln[1 + βh(u)] μ(du) } ,

where the latter is a product integral expression which yields

L[h] = exp{ −∫_S ln[1 + βh(u)] μ(du) } .

(12)

For the reader desirous of a more formal approach to the derivation of either (9) or (12) it is straightforward to write down the relevant joint transform corresponding to any pairwise disjoint bounded Borel sets B_1, ..., B_k and from this obtain the associated functional by taking a limit for a suitable sequence {h_n} of arguments. That such a method for deriving a generating functional does work is a consequence of an appropriate characterization result; see, for example, Daley and Vere-Jones (1988, Theorem 7.4.II) for the pgfl case and their Exercise 6.4.2 for the Lfl version. Also of importance for each type of functional is a uniqueness result, which allows the distribution of a point process or random measure to be deduced from the form of its generating functional; see Jiřina (1964) for Lfls and Westcott (1972) for pgfls. Suppose that ξ is a random measure on (S, ℬ_S) with mean measure μ(·). Consider a Cox process driven by this random measure, i.e. consider a point process which, given ξ, is a Poisson process on S with mean measure ξ(·). Then it can be shown that the pgfl, G, of the Cox process is related to the Lfl L of the driving random measure by G[h] = L[1 − h], for any (measurable) function h taking values in the interval [0, 1] and equal to one outside some bounded subset of S. This generating functional relationship can be used to obtain various properties. It can be deduced that the mean measure of such a Cox process coincides with the mean measure of the driving random measure and also that the Cox process will be completely random iff its driving random measure is completely random. If ξ is the gamma random measure discussed in Section 2.9, then the pgfl of the Cox process is

G[h] = exp{ −∫_S ln[1 + β(1 − h(u))] μ(du) } .

(13)

From this form it follows easily that the one-dimensional distributions of this Cox process are of negative binomial form with probability generating function

E(z^{N(B)}) = [1 + β(1 − z)]^{−μ(B)},    B ∈ ℬ_S .    (14)
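The negative binomial form (14) can be checked by simulating the gamma-driven Cox process as a two-stage experiment. A sketch (Python/NumPy assumed; parameters illustrative): differentiating (14) at z = 1 gives mean βμ(B) and variance βμ(B)(1 + β) for N(B), the overdispersion factor 1 + β reflecting the extra randomness in the driving measure.

```python
import numpy as np

rng = np.random.default_rng(6)

# Cox process driven by a gamma random measure: given xi, N(B) is
# Poisson(xi(B)) with xi(B) ~ Gamma(mu(B), scale = beta).  Mixing gives
# negative binomial counts with pgf [1 + beta(1 - z)]^(-mu(B)), hence
# mean mu(B)*beta and variance mu(B)*beta*(1 + beta).
mu_B, beta, reps = 2.0, 1.5, 200000
xi = rng.gamma(mu_B, beta, size=reps)    # xi(B) for each replication
N = rng.poisson(xi)                      # count given the driving measure

mean, var = N.mean(), N.var()
print(mean, var)  # ~ 3.0 and ~ 7.5
```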

Generating functionals can be linked to the various types of moment/cumulant measures, for example by suitable differentiation or expansion, in ways which generalize the connections between moments/cumulants and the classical generating functions [cf. Westcott (1972), Stoyan et al. (1995, Sections 4.4 and 7.2) and Daley and Vere-Jones (1988, Section 7.4)]. For example, under conditions which ensure the convergence of the right-hand side, the pgfl can be expressed in terms of the factorial moment measures by

G[1 − h] = 1 + Σ_{k=1}^∞ ((−1)^k / k!) ∫_S ⋯ ∫_S h(u_1) ⋯ h(u_k) M_[k](du_1 ⋯ du_k)    (15)

and a similar expansion links ln G[1 − h] with the factorial cumulant measures. It is instructive to consider such expansions for a Poisson process. Similarly, for the Lfl there is an expansion in terms of the moment measures:
L[h] = 1 + Σ_{k=1}^∞ ((−1)^k / k!) ∫_S ⋯ ∫_S h(u_1) ⋯ h(u_k) M_k(du_1 ⋯ du_k)    (16)

A further type of functional, the characteristic functional [cf. Daley and Vere-Jones (1988, Section 6.4)] can be used for random measures, including point processes, in a manner similar to the Lfl. As the name suggests the characteristic functional is for random measures an analogue of the ordinary characteristic function.
3. Operations on point processes and associated limit results

3.1. Operations
New point processes can be generated in several ways from a process or processes already defined. Generating functionals provide a useful unifying device and operational tool.

Superposition. For two not necessarily independent point processes N_1 and N_2, a new process N, their superposition, can be defined by N = N_1 + N_2. This is to be interpreted as N(B) = N_1(B) + N_2(B) for Borel subsets B of the state space, or in intuitive terms as the pooling of the sets of points for the two processes (though


the problem with the latter view is that multiple points may be introduced by the pooling). For processes which are independent, the distribution of their superposition can be expressed as the convolution of the distributions of the summand processes. In particular, for example from (9), the superposition of two independent Poisson processes with respective mean measures μ_1(·) and μ_2(·) is again a Poisson process, with mean measure μ_1(·) + μ_2(·). The superposition of two independent homogeneous Poisson processes with respective intensities λ_1 and λ_2 is another homogeneous Poisson process, with intensity λ = λ_1 + λ_2. The superposition N of two independent point processes N_1 and N_2 having respective pgfls G_1 and G_2 has probability generating functional G[h] = G_1[h]G_2[h]. This general result can be employed to verify the above assertions about the superposition of two independent Poisson processes. In a similar way, the superposition of finitely many point processes can be considered and, under appropriate conditions, the superposition of a countable number of point processes; the latter extension is needed, for example, in order to deal with cluster processes as discussed below.
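At the level of counts in a single window, the superposition result for independent homogeneous Poisson processes amounts to the additivity of independent Poisson random variables. A one-window sketch (Python/NumPy assumed; the intensities are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Superposition of independent homogeneous Poisson processes on [0, 1]
# with intensities lam1 and lam2: pooled counts are Poisson(lam1 + lam2),
# so mean and variance should both equal lam1 + lam2.
lam1, lam2, reps = 1.0, 2.5, 200000
N = rng.poisson(lam1, size=reps) + rng.poisson(lam2, size=reps)
print(N.mean(), N.var())  # both ~ lam1 + lam2 = 3.5
```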

Random deletion. Given a realization of a point process N, suppose that each point in the realization is deleted with probability 1 - p and retained with probability p, independently of all other points in the realization. This operation is variously referred to as (random) thinning, (random) deletion or sometimes Bernoulli deletion.
When m1(u) is the intensity function for the original process N, the intensity function for the process of retained points is clearly p m1(u). (This is, for example, obvious from the product density interpretation.) Thus, if N is stationary with intensity μ then the process of retained points is stationary, with intensity pμ; further, if N is a homogeneous Poisson process then so is the new process. This result for a Poisson process can be seen, for example, from (9), as will be indicated shortly. A generalization is to allow position dependent thinning. For example, a point at x in a realization of the original point process could be deleted with probability 1 − p(x) and retained with probability p(x), independently of all other points in the realization. By first conditioning on the original process, the pgfl G_O[h] of the output (or thinned) process can be expressed in terms of the pgfl G_I[h] of the input (original) process as

G_O[h] = \mathrm{E}\Big(\prod_{x \in S} \{p(x)h(x) + [1 - p(x)]\}^{N(\{x\})}\Big) = G_I\big[1 + p(\cdot)[h(\cdot) - 1]\big] . \quad (17)

Now observe that such position dependent thinning of a homogeneous Poisson process with intensity λ results in an output process with pgfl

G_O[h] = \exp\Big\{\lambda \int_S [h(x) - 1]\,p(x)\,dx\Big\} ,

and so, using (9), in an inhomogeneous Poisson process with intensity function λp(x).
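Position dependent thinning is equally simple to realize in simulation. A Python sketch (the window, the intensity 5 and the retention function p(x) = x/10 are illustrative choices; by the result above, the retained points form an inhomogeneous Poisson process with intensity function λp(x)):

```python
import random

def position_dependent_thinning(points, p, rng):
    """Retain each point x independently with probability p(x)."""
    return [x for x in points if rng.random() < p(x)]

rng = random.Random(2)
# A homogeneous Poisson realization on [0, 10] with intensity 5.
original, t = [], 0.0
while t <= 10.0:
    t += rng.expovariate(5.0)
    if t <= 10.0:
        original.append(t)

# Retention probability rises linearly across the window, so the
# retained points thin out near 0 and survive near 10.
retained = position_dependent_thinning(original, lambda x: x / 10.0, rng)
assert set(retained) <= set(original)
```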

620

R. K. Milne

Random translation. This operation can be defined as follows for any point
process N. Given a realization of N, each point in that realization is shifted, independently of all other points in the realization, the shift having some specified distribution on S, where this distribution is the same for each point in the original realization. (The shifts are thus assumed independent and identically distributed over the points in the realization of N.) Such a randomly translated process is stationary whenever the original point process is stationary. Whenever the original point process is Poisson the randomly translated process is Poisson. This result can again be seen by a pgfl argument. Let the translation distribution be denoted by F. First condition on the original process, assumed to be Poisson with mean measure μ(·), to give

G[h] = \mathrm{E}\Big(\prod_{x \in S} \Big\{\int_S h(x+t)\,F(dt)\Big\}^{N(\{x\})}\Big) = \exp\Big\{\int_S \Big[\int_S h(x+t)\,F(dt) - 1\Big]\,\mu(dx)\Big\} . \quad (18)

The last expression can be rearranged to give

G[h] = \exp\Big\{\int_S [h(u) - 1]\,\mu_F(du)\Big\} , \quad (19)

where μ_F(·) is defined by μ_F(B) = ∫_S μ(B − t) F(dt) with B − t = {u − t : u ∈ B}. (This is also easily seen using product densities.) Moreover, if N is a homogeneous Poisson process then so is any random translation of it, and the intensities of the two processes are the same. A generalization, discussed for example in Daley and Vere-Jones (1988, Example 8.2(b)), allows the shift of each point x in the original point process to be governed by a Markov kernel H(·|x), where H(B|x) is the probability, given a point at x, of shifting that point into the set B. (Notice that now such a shift may depend on the position of the original point.) By allowing this kernel to be substochastic, i.e. H(S|x) ≤ 1, position dependent thinning can be brought within this framework.
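In simulation, random translation just adds an independent draw from F to each point. A Python sketch (a normal translation distribution is an arbitrary illustrative choice for F):

```python
import random

def random_translation(points, rng, sigma=1.0):
    """Shift each point independently by an i.i.d. draw from F;
    here F is taken to be Normal(0, sigma) purely for illustration."""
    return [x + rng.gauss(0.0, sigma) for x in points]

rng = random.Random(3)
original = [rng.uniform(0.0, 10.0) for _ in range(20)]
shifted = random_translation(original, rng)
# Translation moves points but never creates or destroys them.
assert len(shifted) == len(original)
```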

Cluster processes. Now suppose that each point of an 'input' point process is replaced by the points of some subsidiary point process or cluster, and that the superposition of all these clusters is then the 'output' process or cluster process. In the simplest type of cluster process the input is a homogeneous Poisson process of specified intensity and the clusters are independent and identically distributed point processes, each with its origin translated to the associated point, often called the cluster centre, in the input process. When S = ℝ two general types of cluster structure have been considered:
(i) a finite renewal process where the number of points is either fixed or follows a specified distribution, and the interval (lifetime) distribution is also specified; or


(ii) the number of points is either fixed or follows a specified distribution, and these points are placed independently and identically according to a specified distribution on S. The resultant cluster processes are respectively termed Bartlett-Lewis and Neyman-Scott (cluster) processes. Neyman-Scott processes can be considered also for Euclidean state spaces S = ℝ^d. In principle, a Bartlett-Lewis type process could also be considered in S = ℝ^d, though this extension seems less common. Random translation, as introduced earlier, is a particular case of each of these types of cluster structure in which each cluster has a single point. A cluster structure in which all the points of a given cluster are placed at the cluster centre is often referred to as a compounding. A compound Poisson process, as introduced in Section 2.5, is a special case. Bernoulli deletion, as introduced earlier, is a particular case of compounding. The type of generating functional approach considered for the operations of random deletion and random translation can be generalized to give an expression for the pgfl of a cluster process. Suppose that the pgfl of the cluster arising from an input point at x is G[h|x]. Then the pgfl of the output or cluster process given input {x_i} is ∏_i G[h|x_i], and therefore the unconditional pgfl of the output process is

G_O[h] = \mathrm{E}\Big(\prod_{u:\,N(\{u\})>0} \{G[h \mid u]\}^{N(\{u\})}\Big) = G_I\big[G[h \mid \cdot]\big] , \quad (20)

where G_I[·] denotes the pgfl of the input process. If the input process is a Poisson process with mean measure μ(·) then the pgfl of the resultant output process is

G_O[h] = \exp\Big\{\int_S (G[h \mid x] - 1)\,\mu(dx)\Big\} . \quad (21)

Such a process is called a Poisson cluster process. For a Neyman-Scott process, with the number of points in a typical cluster having probability generating function H_c and the individual points in a cluster distributed about the cluster centre according to the distribution F, the pgfl G[h|x] of the cluster arising from an input point at x is G[h|x] = H_c(∫_S h(x + y) F(dy)). A question not addressed above is the important one of whether the output process exists, in the sense of there being almost surely a finite number of points in any bounded set. This is directly linked with the question of convergence of the infinite product at (20). For further discussion of these aspects see Daley and Vere-Jones (1988, Section 8.2).
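A Neyman-Scott process of type (ii) is straightforward to simulate by following the definition: generate the cluster centres, then scatter each cluster about its centre. A Python sketch (the geometric cluster-size distribution and normal displacement distribution are illustrative choices):

```python
import random

def neyman_scott(centre_rate, mean_offspring, sigma, t_max, rng):
    """Neyman-Scott cluster process on [0, t_max]: homogeneous Poisson
    cluster centres; per centre, a geometrically distributed number of
    offspring (mean mean_offspring), each displaced by Normal(0, sigma)."""
    points, c = [], 0.0
    while True:
        c += rng.expovariate(centre_rate)     # next cluster centre
        if c > t_max:
            break
        # Geometric offspring count with the stated mean (an arbitrary
        # choice of cluster-size distribution for this sketch).
        k = 0
        while rng.random() < mean_offspring / (1.0 + mean_offspring):
            k += 1
        points.extend(c + rng.gauss(0.0, sigma) for _ in range(k))
    return sorted(points)

rng = random.Random(4)
realization = neyman_scott(1.0, 3.0, 0.2, 20.0, rng)
assert realization == sorted(realization)
```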

State space transformation. This operation is defined initially as a mapping which transforms points of the state space S into points of a new state space S*, and not directly on a process or processes. Such a mapping then transforms the set of points of any given point process realization in S into a corresponding realization in S*. Under state space transformation a Poisson process on S is transformed into another such process on S* [Kingman (1993, Section 2.3)]. If the initial


process were a homogeneous Poisson process then the transformed process would in general be an inhomogeneous Poisson process. The operations of superposition and state space transformation of point processes can be extended to random measures. The remaining operations do not in general have direct analogues.
3.2. Limit results

With each of the operations of superposition, random deletion and random translation can be associated certain limit results concerning point processes. Such results had their beginnings in a result, enunciated by Palm (1943) and with a proof completed by Khinchin (1955, 1969), that the superposition of a number of independent and suitably sparse point processes tends to a homogeneous Poisson process as the number of 'summand' processes tends to infinity. This result is considered important for its capacity to explain the widespread usefulness of Poisson processes in many applications. Generalization of this result leads to a limit theorem [cf. Daley and Vere-Jones (1988, Proposition 9.2.IV)] for superpositions of point processes in a triangular array and thereby to a characterization of infinitely divisible point processes (see Section 4.6). For point processes in Euclidean state spaces, the simplest limit theorem for random deletions can be stated loosely as follows: suppose that points of an initial point process are subject to Bernoulli deletions, with a retention probability of p for any individual point, and that to compensate for this loss of points the scale is contracted so as to balance the deletions and preserve the intensity. It is then possible to prove convergence as p → 0, starting from suitable initial point processes, to a homogeneous Poisson process. There are various limit results for random translations. One of the simplest assumes the points of a suitable initial point process move with independent and identically distributed random velocities, and establishes convergence to a homogeneous Poisson process as time tends to infinity. Substantial generalizations of these basic results can be considered. Generating functional methods can be used in constructing proofs. No details will be given here, since to do so would require more serious discussion of convergence of point processes.
For details of such convergence and various limit results see, for example, Chapter 9 of Daley and Vere-Jones (1988). The fundamental convergence concepts are developed also in Kallenberg (1983, Chapter 4) and in Matthes et al. (1978, Chapter 3).

4. Some other classes of point process


4.1. Mixed Bernoulli processes

Assume initially that the state space S is Euclidean and that W is a not necessarily bounded subset of S. Consider a general Bernoulli process defined on W, as in Section 2.4, with the distribution for any one of its points defined by some


probability measure Q on W. A mixed Bernoulli process is obtained by allowing the total number of points in each realization to have some prescribed probability distribution. If the mixture distribution is taken as the Poisson distribution with parameter λ then for any bounded Borel set B, N(B) ~ Poi(λQ(B)). Such a process is clearly a Poisson process with mean measure λQ(·). However, since λQ(W) < ∞ such a process is constrained to have N(W) almost surely finite, i.e. its realizations have a finite total number of points with probability one. The importance of the above construction is that it provides a genuinely constructive approach to Poisson processes with (totally) finite mean measure, and yields a means of simulating Poisson processes on bounded sets. Thus, to simulate on a bounded set B a Poisson process having mean measure μ(·): (i) choose a random integer n from a Poi(μ(B)) distribution, and (ii) choose n points independently according to the probability distribution μ_B(·)/μ(B), where μ_B(·) = μ(· ∩ B) denotes the restriction of μ(·) to B. In fact, this construction works in an arbitrary state space and, in particular, is not dependent on its dimension. A 'pasting' argument [cf. Williams (1979, pp. 46-47)] can be used to extend to the σ-finite case: construct independent Poisson processes on B1, B2, … where {B1, B2, …} is a partition of S and μ(B_i) < ∞, i = 1, 2, …, and then 'paste' these together. This is essentially the approach taken by, for example, Kingman (1993, Section 2.5) in his treatment of existence of Poisson processes with a specified (nonatomic) mean measure. To obtain a homogeneous Poisson process requires only that (ii) be specialized to choosing n points independently and uniformly on B.
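The two-step recipe translates directly into a simulator. A Python sketch for a homogeneous process on a rectangular window (for a general mean measure, step (ii) would instead sample from μ_B(·)/μ(B)); drawing the Poisson count by Knuth's elementary product-of-uniforms method is an arbitrary implementation choice:

```python
import math
import random

def poisson_count(mean, rng):
    """Poisson(mean) sample by Knuth's product-of-uniforms method."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def poisson_on_rectangle(mu, width, height, rng):
    """Two-step construction of a homogeneous Poisson process with
    intensity mu on the window B = [0, width] x [0, height]:
    (i)  draw n ~ Poi(mu * area of B);
    (ii) place n points i.i.d. uniformly on B."""
    n = poisson_count(mu * width * height, rng)
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height))
            for _ in range(n)]

rng = random.Random(5)
pts = poisson_on_rectangle(2.0, 3.0, 4.0, rng)   # mean count 2 * 12 = 24
assert all(0.0 <= x <= 3.0 and 0.0 <= y <= 4.0 for x, y in pts)
```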
If the mixture distribution is taken as a negative binomial distribution then the counts in individual Borel sets have related negative binomial distributions whose probability parameters depend on the Borel sets being considered and a simple expression can be given for the pgfl. Such processes are reasonably termed negative binomial processes. The processes so generated can also be obtained as a mixed Poisson process where the mixing distribution is a gamma distribution; see, for example, Daley and Vere-Jones (1988, Section 7.4). For other types of negative binomial process see Diggle and Milne (1983a).
4.2. Marked point processes

As was indicated earlier, for modelling the location of earthquakes in a region, or trees in a forest, it is often appropriate to introduce a further random quantity, often called a mark, associated with each point in an underlying point process (of locations). The resultant process is known as a marked point process. Such a process can be considered as a point process on a product space S × K consisting of all pairs (x, m) where x is from the state space S of locations and m from a space K consisting of all possible marks. For the two examples indicated above the mark space would be taken as K = (0, ∞), though other choices for K are possible. Any compound point process, in particular a compound Poisson process,


can be viewed as a marked point process; in the stationary case the mark distribution is what earlier we called the batch size distribution. More generally, any cluster point process can be viewed as a marked point process with the marks in this case being the subsidiary point processes associated with each cluster centre. The general theory of point processes (where the state space can be any complete separable metric space) does not, strictly speaking, require extension to cover marked point processes. Nevertheless, the product structure of the new state space leads to related special structure in, for example, moment measures and product densities, and it is often helpful to recognize this; see, for example, Stoyan et al. (1995, Chapter 4). In particular, there is a version of Campbell's theorem for marked point processes [see also Stoyan and Stoyan (1994, Section 14.2)]. For modelling situations where there are two types of point, for example two types of tree in a forest or two types of cell in a region, a point process with a two point mark space, e.g. K = {1, 2}, can be used. A particular class of two-type point process is based on independent marking, where the marks are assigned independently to the points of some underlying point process of locations. This is formally the same as considering jointly the point process of 'retained' points and the point process of 'deleted' points for the case of Bernoulli deletions. It is then easy to write down an expression for the joint pgfl of the two processes (the probability generating functional of the marked point process) in terms of the pgfl of the original process. In general, the joint pgfl G of two point processes N1 and N2 is given by

G[h_1,h_2] = \mathrm{E}\Big(\prod_{u \in S} h_1(u)^{N_1(\{u\})} \prod_{v \in S} h_2(v)^{N_2(\{v\})}\Big) , \quad (22)

for suitable functions h1 and h2 as in (8). By first conditioning on the input process, the pgfl G_O[h1, h2] of an independently marked point process can be expressed in terms of the pgfl G_I[h] of the input (original) process as

G_O[h_1,h_2] = \mathrm{E}\Big(\prod_{x \in S} \{p(x)h_1(x) + [1 - p(x)]h_2(x)\}^{N(\{x\})}\Big) = G_I\big[p(\cdot)h_1(\cdot) + [1 - p(\cdot)]h_2(\cdot)\big] . \quad (23)

When the input process is a homogeneous Poisson process with intensity λ the pgfl of the independently marked point process is

G_O[h_1,h_2] = \exp\Big\{\lambda \int_S [p(x)h_1(x) + [1 - p(x)]h_2(x) - 1]\,dx\Big\} , \quad (24)

and this can be simplified to

G_O[h_1,h_2] = \exp\Big\{\lambda \int_S [h_1(x) - 1]\,p(x)\,dx\Big\} \exp\Big\{\lambda \int_S [h_2(x) - 1]\,[1 - p(x)]\,dx\Big\} . \quad (25)


Thus, each of the processes N1 and N2 is an inhomogeneous Poisson process and these processes are independent, this independence being a perhaps unexpected consequence. The independent marking, or splitting, property of a Poisson process is termed the colouring property by Kingman (1993, Section 5.1); later in the same chapter he explores significant generalizations of the property. If the mark space is a finite set, taken without loss of generality to be K = {1, 2, …, s}, then the marked point process is often called a multitype or multivariate point process; see, for example, Cox and Isham (1980, Chapter 5) or Diggle (1983, Chapter 6). The next section considers two-type Poisson processes; Diggle and Milne (1983b) have explored some bivariate Cox processes as possible models for spatial patterns exhibiting dependence between the points of two types.
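The splitting operation itself is elementary to simulate; by the result above, when the input realization comes from a Poisson process the two marked output processes are independent Poisson processes. A Python sketch (the constant retention probability 0.4 and the uniform input points are illustrative choices):

```python
import random

def colour_split(points, p, rng):
    """Kingman's colouring: assign each point mark 1 with probability
    p(x) and mark 2 otherwise, independently over points."""
    n1, n2 = [], []
    for x in points:
        (n1 if rng.random() < p(x) else n2).append(x)
    return n1, n2

rng = random.Random(6)
original = sorted(rng.uniform(0.0, 10.0) for _ in range(30))
marked_1, marked_2 = colour_split(original, lambda x: 0.4, rng)
# The two marked processes partition the original realization.
assert sorted(marked_1 + marked_2) == original
```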

4.3. Two-type Poisson processes


Although the independent marking of a Poisson process, as considered above, leads to independent Poisson processes, it is not hard to conceive of situations giving rise to dependent Poisson processes. One of the simplest arises from applying the operation of random translation, according to a Markov kernel H(·|u) (as in Section 3.1), to an underlying Poisson process which is assumed to have mean measure μ(·). Consider the input (underlying) Poisson process as one process and the output (randomly translated) process as the other; for example, all input points could be labelled with mark 1 and all output points with mark 2. Clearly, given an input point at u, the joint pgfl of the input and output arising from this point is

G[h_1,h_2 \mid u] = h_1(u)\int_S h_2(v)\,H(dv \mid u) , \quad (26)

and the joint pgfl of the input and output given input {u_i} is

\prod_i G[h_1,h_2 \mid u_i] = \prod_{u:\,N(\{u\})>0} \Big( h_1(u)\int_S h_2(v)\,H(dv \mid u) \Big)^{N(\{u\})} . \quad (27)

Therefore, the unconditional joint pgfl of the input and output is

G_{IO}[h_1,h_2] = \mathrm{E}\Big( \prod_{u:\,N(\{u\})>0} \Big\{ h_1(u)\int_S h_2(v)\,H(dv \mid u) \Big\}^{N(\{u\})} \Big) = G_I\Big[ h_1(\cdot)\int_S h_2(v)\,H(dv \mid \cdot) \Big] . \quad (28)

If now it is supposed that the input process is Poisson with mean measure μ(·) then

G_{IO}[h_1,h_2] = \exp\Big\{ \int_S \Big[ h_1(u)\int_S h_2(v)\,H(dv \mid u) - 1 \Big] \mu(du) \Big\} , \quad (29)

and this can be written as

G_{IO}[h_1,h_2] = \exp\Big\{ \int_S \int_S [h_1(u)h_2(v) - 1]\,H(dv \mid u)\,\mu(du) \Big\} . \quad (30)

Observe that by setting h2 ≡ 1 we recover the pgfl of the input process, setting h1 ≡ 1 yields the pgfl of the output process (as derived in Section 3.1), and setting h1 = h2 = h gives the pgfl of the superposition, N1 + N2, of the input and output processes; these observations apply, in fact, to any joint pgfl. Since both the input and output processes are Poisson with respective mean measures μ(·) and μ_H(B) = ∫_S H(B|u) μ(du), the input and output processes corresponding to (30) provide one example of a two-type Poisson process. As can be seen from the following section, such a point process is infinitely divisible. The most general form of infinitely divisible two-type Poisson process [cf. Milne (1974)] has pgfl

G[h_1,h_2] = \exp\Big\{\int_S [h_1(u) - 1]\,\mu_1(du) + \int_S [h_2(v) - 1]\,\mu_2(dv) + \int_S \int_S [h_1(u)h_2(v) - 1]\,\nu(du\,dv)\Big\} . \quad (31)

This can be interpreted as being the superposition of three independent processes: a Poisson process with mean measure μ1(·) contributing points with mark 1, a Poisson process with mean measure μ2(·) contributing points with mark 2, and a further process which contributes pairs of points, one of each type, where this latter process is of the same type as (30). Other two-type Poisson processes are possible [cf. Griffiths and Milne (1978) and Brown et al. (1981)], though such processes cannot be infinitely divisible. Setting h1 = h2 = h in (31) gives the pgfl of the superposition, N1 + N2, of the two processes: this has pgfl

G[h] = \exp\Big\{\int_S [h(u) - 1]\,\mu(du) + \int_S \int_S [h(u)h(v) - 1]\,\nu(du\,dv)\Big\} , \quad (32)

where μ(·) = μ1(·) + μ2(·). The corresponding point process is the Gauss-Poisson process studied in particular by Milne and Westcott (1972).

4.4. Infinitely divisible point processes


One of the simplest definitions of infinite divisibility is in terms of the pgfl: a point process is said to be infinitely divisible if for each positive integer n its pgfl G[h] can be expressed as G[h] = (G_n[h])^n where G_n[h] is another pgfl. This is equivalent to being able to express the process, for each positive integer n, as an n-fold superposition of independent and identically distributed processes, each having pgfl G_n[h]. Observe that any Poisson cluster process is infinitely divisible. This is really a consequence of the infinite divisibility of any Poisson process: a Poisson cluster


process whose pgfl G_O[h] is given by (21) can be seen to be infinitely divisible by the choice

G_n[h] = \exp\Big\{\int_S (G[h \mid x] - 1)\,n^{-1}\mu(dx)\Big\} , \quad (33)

which is clearly the pgfl of a Poisson cluster process whose input Poisson process has mean measure n^{-1}μ(·). Poisson cluster processes constitute a large and important subclass of the class of infinitely divisible point processes. For a general infinitely divisible point process there is a representation theorem [see, for example, Theorem 8.4.V of Daley and Vere-Jones (1988)] giving a canonical form for its pgfl. This generalizes to point processes the classical result giving a compound Poisson representation for the probability generating function of an infinitely divisible nonnegative-valued random variable [cf. Section XII.2 of Feller (1968)]; the corresponding extension to random vectors was obtained by Dwass and Teicher (1957). In a sense, the representation for an infinitely divisible point process is a weaving together of the compound Poisson representation (as in Dwass and Teicher (1957)) for the (joint) probability generating function of each of the (necessarily infinitely divisible) finite-dimensional distributions of the process.

4.5. Some other processes related to point processes

For providing language and a unifying framework within which other processes can be considered, the theory of point processes, and especially that of marked point processes, is often helpful. Semi-Markov processes with finitely many states can be viewed as multitype point processes [Cox and Isham (1980, Section 3.2)], although most properties of such processes can be conveniently derived without using this connection. Alternating renewal processes are a special case of semi-Markov processes in which there are two types of mark and two types of lifetime, each of which alternate over time. Shot noise processes [cf. Cox and Isham (1980, Section 5.6) or Snyder and Miller (1991, Chapter 4)] can also be conveniently viewed as marked point processes. In this case the mark attached to each point of an underlying point process is a possibly random multiple (usually independent and identically distributed for each point) of a fixed function; for example, the fixed function may be a negative exponential function with fixed decay parameter, representing a 'blip' of electric current associated with a typical point. The superposition (sum) of all the 'blips' of current, which is clearly a stochastic process and not a point process, is what is called a shot noise process. There is a connection between the joint characteristic function (Laplace transform) of the shot noise process at finitely many time points and the characteristic functional (Laplace functional) of the underlying point process [cf. Snyder and Miller (1991, Section 5.2.1)]. Shot noise processes and this connection are explored by Kingman (1993, Chapter 3) in the simplest setting, that is when the underlying point process is a Poisson process.
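A shot noise process with exponential 'blips' can be evaluated directly from a marked realization. A Python sketch (the decay parameter and the exponential mark distribution are illustrative choices):

```python
import math
import random

def shot_noise(points, amplitudes, beta, t):
    """Shot noise at time t: each point s contributes an exponentially
    decaying 'blip' a * exp(-beta * (t - s)) once it has occurred."""
    return sum(a * math.exp(-beta * (t - s))
               for s, a in zip(points, amplitudes) if s <= t)

rng = random.Random(7)
points = sorted(rng.uniform(0.0, 10.0) for _ in range(15))
amplitudes = [rng.expovariate(1.0) for _ in points]   # i.i.d. marks
x = shot_noise(points, amplitudes, beta=2.0, t=5.0)
assert x >= 0.0
# Before any point has occurred the process is zero.
assert shot_noise(points, amplitudes, 2.0, -1.0) == 0.0
```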


Marked point processes provide a framework for treating many other probability models of interest, especially in stochastic geometry and stereology. For example, consider a stochastic process whose realizations are a (finite or) countable number of line segments in the plane, with each segment specified by the random location (of its midpoint), together with its orientation and length. Such a process can be viewed as a marked point process in which the points of the underlying point process give the locations of the midpoints of the line segments, while the mark attached to each such point is a vector recording the orientation and length of the line segment to be associated with that (mid-)point. [Stoyan and Stoyan (1994, p. 265) give an example involving positions and orientations of flies on a leaf.] The simplest such model for a process of line segments assumes that the underlying point process is a homogeneous Poisson process in the plane. A related type of model can be built from an underlying homogeneous Poisson process in the plane by supposing that each point is independently marked with a positive number drawn from some specified distribution, the same for each point. Suppose now that each point of the underlying process is replaced by a disc centred at that point and with radius given by the associated mark. The union of all such discs then constitutes a realization of a stochastic process which is not itself a point process. Such a process is an example of a Poisson grain model or Boolean model and a particular example of a random set process. Boolean models are studied in Molchanov (1997), Stoyan et al. (1995, Chapter 3) and Stoyan and Stoyan (1994, Appendix F); for more general discussion of random sets see Molchanov (1999) or Stoyan et al. (1995, Chapter 6). One application of Boolean models is to modelling the distribution of cells of differing size over a region; a variety of other applications are given in Stoyan et al. (1995, p. 62).
It is sometimes of interest to consider a process of lines (infinite in length) rather than a process of line segments; see Kingman (1993, Chapter 7), Daley and Vere-Jones (1988, Section 10.6), or Stoyan et al. (1995, Chapter 8). Stoyan et al. (1995, Chapter 9) consider other generalizations, for example to processes of fibres.
4.6. Gibbs processes, Markov point processes and related processes

In earlier sections, it has been seen that Poisson processes are a fundamental class of point processes, both as processes in their own right and as a basic building block for constructing other classes of processes. This section briefly surveys a way of constructing new point processes that once again builds on a Poisson process, but provides an approach that is very different from any of the other approaches discussed in earlier sections. Broadly, the aim here is to define classes of point processes by means of their densities (Radon-Nikodym derivatives) with respect to a Poisson process. However, in a Euclidean state space this approach is inappropriate for even homogeneous Poisson processes, since such processes are mutually singular if they have different intensities; see Stoyan et al. (1995, Example 5.6). This problem can be circumvented if attention is restricted to a


bounded window W in the state space. In this setting a Gibbs (point) process is defined by its density, or likelihood ratio, relative to a homogeneous Poisson process of unit intensity; see Stoyan et al. (1995, Section 5.5). Such processes have their origins, and are widely used, in statistical mechanics. (The choice of intensity of the 'base' Poisson process is, in a sense, immaterial since on a bounded window W homogeneous Poisson processes of different intensity form a class of mutually absolutely continuous processes.) Gibbs processes form a large class of point processes, and it is usually necessary to make further assumptions about the structure of the densities in order to obtain more specific and useful classes of processes. One of the classes is that of Markov point processes [Ripley and Kelly (1977)]. These allow the introduction of a form of spatial dependence that is local or Markov [Diggle (1983, Section 4.9); Cressie (1991, Section 8.5.5); Stoyan et al. (1995, Section 5.5)] once a neighbourhood structure has been specified for the points in any realization through some reflexive and symmetric 'neighbourhood' relation defined for pairs of points. An example of such a relation is that for a specified finite r, two points x_i and x_j are r-neighbours if and only if the Euclidean distance d(x_i, x_j) between them is at most r. For these processes a Hammersley-Clifford type theorem [Ripley and Kelly (1977)] shows that the densities (with respect to the unit Poisson process) are precisely of the form

f(x) = \prod_{x' \subseteq x} \phi(x') , \quad (34)

where x and x' here denote particular realizations (i.e. sets of finitely many points from W), φ(·) is nonnegative, and φ(x') = 1 unless all pairs of points in x' are neighbours. Such Markov point processes are surveyed in Baddeley and Møller (1989). These authors introduce also multitype generalizations and a more general class of Markov point processes based on a different type of neighbourhood relation (where fixed-range interactions are replaced by interactions between points that are neighbours according to a relation that may depend on the realization); see Section 4 of Baddeley and Møller (1989). A more manageable subclass of Markov point processes is the class of pairwise interaction processes, whose densities have the form

f(x) = \alpha\,\beta^{n(x)} \prod_{\{x_i, x_j\} \subseteq x} \varphi(x_i, x_j) , \quad (35)

where α is a normalizing constant, β > 0 is an 'intensity' parameter, n(x) denotes the number of points in the realization x, and φ(·, ·) is a symmetric nonnegative function, often termed an interaction function, satisfying φ(x_i, x_i) = 1 and such that normalization is possible. Normalization is possible if range(φ) ⊆ [0, 1], corresponding to purely inhibitory processes, or if φ(x_i, x_j) = h(d(x_i, x_j)) where h is a function which is nonnegative, bounded and zero for values of its argument less than or equal to some specified value ε. Processes of the latter type are called hard core processes.


A particular parametric subclass of the class of pairwise interaction processes is the family of Strauss processes [Strauss (1975)], which are those whose densities with respect to the unit Poisson process are of the form

f(x) = \alpha\,\beta^{n(x)}\gamma^{s(x)} , \quad (36)

where α and β are as above, γ is an 'interaction' parameter satisfying 0 ≤ γ ≤ 1, and s(x) counts the number of pairs of points in the realization x which are r-neighbours (in the sense defined above). The density is not integrable, and normalization is not possible, if γ > 1. The case γ = 1 gives a Poisson process with intensity β, while the case γ = 0 gives a hard core process in which Poisson realizations are conditioned to have no pairs of points that are r-neighbours. Cases with 0 < γ < 1 yield processes exhibiting less strict inhibition, sometimes termed soft core processes. Strauss processes are arguably the simplest non-trivial Markov point processes. The densities of the Strauss family can be represented as a canonical exponential family [see, for example, Barndorff-Nielsen (1978, Section 8.1) and references cited there]: the dominating measure for the representation is the Poisson process of unit intensity, the canonical parameter vector is [ln β, ln γ] and the canonical statistic is [n(x), s(x)]. A similar representation is possible for the densities of a number of other families of Markov point processes. Because maximum likelihood estimation is in principle straightforward for canonical exponential families, inference for such Markov point processes should also be so. The difficulty is that the likelihood function for such a Markov point process family would usually involve the (parameter dependent) normalizing constant, for which an explicit closed form is generally impossible, in contrast to what happens for the common exponential families. When the normalizing constant cannot be obtained, the conventional approach to maximum likelihood estimation, based on solving the likelihood equation(s), is not possible.
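The unnormalized part of the Strauss density, β^{n(x)} γ^{s(x)}, can nevertheless be evaluated directly for any given realization, and that is all that simulation-based methods require. A Python sketch (the three-point configuration and parameter values are an illustrative check; the normalizing constant α is deliberately omitted):

```python
import math

def strauss_unnormalized(x, beta, gamma, r):
    """Unnormalized Strauss density beta**n(x) * gamma**s(x), where
    s(x) counts the pairs of points at distance at most r."""
    n = len(x)
    s = sum(1 for i in range(n) for j in range(i + 1, n)
            if math.dist(x[i], x[j]) <= r)
    return beta ** n * gamma ** s

# Three points with exactly one pair of r-neighbours: n = 3, s = 1.
config = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
value = strauss_unnormalized(config, beta=2.0, gamma=0.5, r=1.0)
assert value == 2.0 ** 3 * 0.5
```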
However, recent advances in computing power and statistical technology have made it feasible to avoid calculation of the normalizing constant and conventional maximum likelihood estimation, using instead the approach known as Markov chain Monte Carlo, which involves large scale simulation. For a good coverage of these ideas see Geyer (1999). Another family of Markov point processes which has a representation as a canonical exponential family is that consisting of so-called triplets processes; see Geyer (1999). Whereas Strauss processes can be obtained by a standard procedure for generating a canonical exponential family with specified canonical statistics, triplets processes can be generated in a similar way by using a further statistic w(x), counting the number of triples of points that are mutual r-neighbours in the realization x. The resultant exponential family has a three-dimensional canonical statistic [n(x), s(x), w(x)]. Such processes can, like the Strauss process, be fitted by Markov chain Monte Carlo methods [cf. Geyer (1999)]. Similar comments apply to the area interaction processes introduced in Baddeley and van Lieshout (1995) and the saturation processes described in Geyer (1999, Section 3.9.2), although it should be noted that processes of the latter type are not Markov point processes.

Point processes and some related processes
R. K. Milne

Important in the simulation of the processes discussed in this section, and in Markov chain Monte Carlo methods of inference for such processes, is a class of spatio-temporal stochastic processes known as spatial birth and death processes [Stoyan et al. (1995, Section 5.5.5), Baddeley and Moller (1989)]. Such a process is a continuous-time pure jump Markov process whose state space is the set of all possible realizations of point processes on W (that is, all finite subsets of W, which is assumed, as above, to be a bounded subset of a Euclidean space), and whose only possible transitions are either the 'birth' of a new point, or the 'death' of a point in the preceding point process realization. (Note that a spatial birth and death process is Markov as regards time.) The essence of the connection with simulation is that, under certain conditions, the limiting distribution of a spatial birth and death process is a Markov point process as introduced above. A consequence of this is that realizations of such a Markov point process can be generated as observations on the relevant spatial birth and death process after it has been running for a long time [Geyer (1999), Moller (1999)].
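The same birth-and-death idea has a discrete-time Metropolis-Hastings analogue [cf. Geyer (1999)], in which births and deaths are proposed and accepted with probabilities determined by the ratio of unnormalized densities, so that the normalizing constant cancels. The following sketch, for a Strauss process on the unit square (area 1), is a minimal illustration under those assumptions; the function names are hypothetical:

```python
import math
import random

def s_pairs(pts, r):
    """Number of pairs of r-neighbours in the pattern pts."""
    return sum(1 for i in range(len(pts)) for j in range(i + 1, len(pts))
               if math.dist(pts[i], pts[j]) <= r)

def strauss_bd_mh(beta, gamma, r, n_steps=20000, seed=0):
    """Birth-death Metropolis-Hastings sampler for a Strauss process
    with unnormalised density beta**n * gamma**s on [0,1]^2.
    The normalising constant cancels in the acceptance ratios."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n_steps):
        n = len(pts)
        if rng.random() < 0.5:
            # propose a birth at a uniform location u
            u = (rng.random(), rng.random())
            t = sum(1 for p in pts if math.dist(p, u) <= r)
            ratio = beta * gamma**t / (n + 1)   # density ratio x proposal correction
            if rng.random() < min(1.0, ratio):
                pts.append(u)
        elif n > 0:
            # propose the death of a uniformly chosen existing point
            i = rng.randrange(n)
            t = sum(1 for j, p in enumerate(pts)
                    if j != i and math.dist(p, pts[i]) <= r)
            ratio = n / (beta * gamma**t)
            if rng.random() < min(1.0, ratio):
                pts.pop(i)
    return pts
```

With γ = 0 the chain never accepts a birth that would create an r-neighbour pair, so the returned pattern is hard core; with γ = 1 it targets a Poisson process with intensity β.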

5. Statistical inference

5.1. Introductory remarks The area of statistical inference for point processes is a difficult one which has been the subject of much recent growth. The remarks that follow will serve as some introduction. Any statistical analysis of point process data should be backed by suitable graphical displays. A plot of the point process realization should be included where this is feasible. In itself, this plot may suggest some form of interaction between points. For example, there may be a tendency to clustering (in a biological application, perhaps as a result of local reproduction), or to inhibition (possibly arising from competition for space or nutrients). When there is inhibition with a minimum permissible distance between points and a sufficiently high intensity, a tendency to regularity may be observed. Other plots of data summaries are possible. Some of these may play a purely descriptive or summary role; others may be relevant in fitting particular point process models, or assessing the goodness-of-fit of such models. Any approach should be driven primarily by the needs of the person who collected the data. Brillinger (1994) gives an interesting review of techniques for statistical analysis of time series and point processes; connections are drawn between the two areas and the techniques illustrated by real data.

5.2. Estimation of the intensity function Suppose that the data is a partial realization of a stationary point process and that, for example, only those points within a bounded window W can be observed. The prime interest may then lie in estimation of the intensity. In a forestry
application, this may give valuable information, for example on the overall quantity of wood in the forest. However, in such an application it may be necessary to use a more complex model: one that introduces supplementary information on the sizes of individual trees, represented as a mark (Section 4.2) attached to each point in the point process, may lead to better information on the overall quantity of wood. If stationarity does not seem a reasonable assumption, it may be of interest to estimate the intensity function of the process. Here non-parametric kernel density estimation techniques could be used. Alternatively, based on an inhomogeneous Poisson process model a specific parametric form could be fitted for the intensity function. These approaches are discussed, for example, in Cressie (1991, Sections 8.2.4 and 8.5.1) and in Stoyan and Stoyan (1994, Section 13.3). Graphical displays in the form of a plot or contour plot of the estimated intensity function could be provided, respectively, for real line or planar data.
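The non-parametric kernel approach mentioned above can be sketched as follows: a Gaussian kernel of some bandwidth is centred at each observed point and the contributions summed over a grid. This is a minimal sketch with no edge correction, and the function and parameter names are hypothetical:

```python
import numpy as np

def kernel_intensity(points, grid_x, grid_y, bandwidth):
    """Gaussian kernel estimate of the intensity function, evaluated on
    the grid given by grid_x and grid_y.  No edge correction is applied,
    so values near the window boundary will be biased downwards."""
    pts = np.asarray(points, dtype=float)
    gx, gy = np.meshgrid(grid_x, grid_y)
    lam = np.zeros_like(gx)
    h2 = bandwidth**2
    norm = 1.0 / (2.0 * np.pi * h2)   # bivariate Gaussian normalisation
    for (px, py) in pts:
        lam += norm * np.exp(-((gx - px)**2 + (gy - py)**2) / (2.0 * h2))
    return lam
```

Integrated over a window large relative to the bandwidth, the estimate recovers the total number of observed points, as an intensity estimate should; the resulting surface is what would be shown in the contour plot suggested above.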

5.3. Nearest neighbour distributions and the K function


Consider now data which is a partial realization of a stationary isotropic (planar) point process. In this setting three other functions are often considered [see Diggle (1983, Chapter 2) or Cressie (1991, Sections 8.2.6 and 8.4)]. One of these functions is the so-called empty space function, defined by F(r) = Pr(d(u, x) ≤ r), r > 0. This is the distribution function of the distance d(u, x) from an arbitrary point u in S to the nearest point of the process. Another function is G(r) = Pr(d(x, x\{x}) ≤ r), r > 0, the distribution function of the distance d(x, x\{x}) from an arbitrary point x of x to the nearest other point of the process x. This is the nearest-neighbour distribution function of the process. Finally, there is the reduced second moment function, or (Ripley) K-function [cf. Ripley (1981)], which can be defined for r > 0 by

K(r) = μ⁻¹ E(number of further points of x within distance r of an arbitrary point of x) ,

where μ is the intensity of the process. To formally define the latter two functions requires consideration of the Palm distribution of the process; the expectation defining the K-function is in fact an expectation with respect to the (reduced) Palm distribution of the process [cf. Stoyan et al. (1995, Sections 2.4.1 and 2.4.3)]. For a homogeneous planar Poisson process with intensity λ it can be shown [cf. Stoyan et al. (1995, Sections 4.4 and 4.5)] that F(r) = G(r) = 1 − exp{−λπr²} and K(r) = πr², r > 0. For a clustered point process F(r), for small r, will be less than the corresponding value for a homogeneous Poisson process, and for values of r close to the range of clustering G(r) and K(r) will each be greater than the corresponding Poisson value. For a point process showing inhibition, at values of r larger than the range of inhibition F(r) will exceed the homogeneous Poisson
equivalent, and for values of r close to the range of inhibition G(r) and K(r) will each be less than the corresponding Poisson value. Since F = G for a homogeneous Poisson process, various proposals for assessing Poissonness of a given point process are based on comparing F with G. Reflecting a basic property of a homogeneous Poisson process, some authors would speak here of assessing complete spatial randomness (often abbreviated to CSR); see, for example, Diggle (1983) or Cressie (1991). In this spirit, Diggle (1979) considered the statistic sup_r |F(r) − G(r)|. Moreover, van Lieshout and Baddeley (1996) considered the function J(r) = [1 − G(r)]/[1 − F(r)], defined for r such that F(r) < 1, suggesting it as a useful summary measure to indicate the strength and range of interpoint interactions in a point process. For a homogeneous planar Poisson process with intensity λ it is clear that J(r) ≡ 1. Furthermore, J(r) < 1 indicates clustering and J(r) > 1 inhibition or regularity, while for many point processes J(r) is constant for r beyond the range of spatial interaction.

The remarks of the preceding paragraphs refer to the 'true' functions being considered, whereas in practice they would usually be estimated from some point process realization. Estimation of the functions F, G and K on the basis of the points in a bounded window raises special problems of edge effects, which have been discussed in detail by Baddeley (1999b). There are two main types of edge effects: sampling bias that is size dependent and related to the well-known problem of length-biased sampling (for example, widely separated nearest neighbour pairs are less likely to be represented in a fixed bounded sampling window), and censoring effects (which arise, for example, because the nearest point to a given point inside the window may be outside the window and therefore unobserved). Ripley (1988) and Baddeley (1999b) have discussed ways of dealing with these effects.
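Ignoring edge effects entirely, the K-function definition above translates directly into a naive estimator: replace the expectation by an average over the observed points and μ by its natural estimate n/area. This sketch (hypothetical function names) is for illustration only; serious use requires one of the edge corrections discussed by Ripley (1988) and Baddeley (1999b):

```python
import numpy as np

def k_hat_naive(points, r_values, area):
    """Naive estimate of Ripley's K-function: mean number of further
    points within distance r of a typical point, divided by the
    estimated intensity n/area.  No edge correction is applied."""
    x = np.asarray(points, dtype=float)
    n = len(x)
    lam_hat = n / area                       # estimated intensity
    d = np.sqrt(((x[:, None, :] - x[None, :, :])**2).sum(-1))
    pair_d = d[np.triu_indices(n, k=1)]      # distinct pair distances
    # each unordered pair contributes twice to the ordered-pair count
    return np.array([2.0 * np.sum(pair_d <= r) / (n * lam_hat)
                     for r in r_values])
```

For a homogeneous Poisson pattern the resulting curve should lie close to πr², which is the comparison used in the plots described below.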
For example, extensions of Campbell's theorem play a key role in assessing bias in the estimation of F, G and K. It is possible to plot separately the estimates F̂, Ĝ and K̂, together with their respective Poisson equivalents based on the estimated intensity. These plots can be used to make an assessment of fit of a homogeneous Poisson process model. Such assessment may be assisted by use of Monte Carlo tests (cf. Diggle, 1983). Here, for example based on F, one would simulate 99 independent realizations from the homogeneous Poisson model with the estimated intensity, and then construct the upper and lower envelopes for F

U_F(r) = max_i F̂_i(r) ,    L_F(r) = min_i F̂_i(r) ,

where the maximum and minimum are taken over the estimates F̂_i of F from each of the 99 simulations. The functions U_F and L_F are then plotted with F̂ and its Poisson equivalent. To the extent that F̂ lies between U_F and L_F, the fit of the Poisson model is regarded as acceptable. (Notice that, whilst (L_F(r), U_F(r)) gives a 98% confidence interval for F(r) for any specified value of r, it cannot be asserted that the same confidence coefficient applies when all values of r in some interval are considered; this is a problem of simultaneous confidence intervals.)
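The envelope construction just described can be sketched in a few lines. For simplicity this sketch uses the nearest-neighbour function G rather than F, and simulates patterns conditioned on the number of points (the conditional, binomial form of the homogeneous Poisson process on the unit square); function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

def g_hat(points, r_values):
    """Empirical nearest-neighbour distance distribution function."""
    x = np.asarray(points, dtype=float)
    d = np.sqrt(((x[:, None, :] - x[None, :, :])**2).sum(-1))
    np.fill_diagonal(d, np.inf)          # exclude each point itself
    nn = d.min(axis=1)                   # nearest-neighbour distances
    return np.array([(nn <= r).mean() for r in r_values])

def poisson_envelopes(n_points, r_values, n_sim=99):
    """Upper and lower envelopes of G-hat over n_sim simulated
    binomial (conditioned-on-n Poisson) patterns on [0,1]^2."""
    sims = np.array([g_hat(rng.random((n_points, 2)), r_values)
                     for _ in range(n_sim)])
    return sims.max(axis=0), sims.min(axis=0)
```

An observed Ĝ falling outside the band (U_G, L_G) at some r would, as in the text, be taken as evidence against the homogeneous Poisson model at that scale.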


A similar approach based on G, K or J could be used; the different functions each embody somewhat different information from the others, and so the plots should complement one another. It is recommended that attention should not be restricted to just one of the functions. Variations are possible on the plots discussed above. For example, one could use a probability plot of P-P type where F̂(r) is plotted against the corresponding Poisson equivalent F_Poi(r) for each r and, on the same plot, add the pairs (L_F(r), F_Poi(r)) and (U_F(r), F_Poi(r)) to give corresponding envelope functions. In the case of the K-function it has been found useful to plot either K̂(r) − πr² against r, or √(K̂(r)/π) − r against r; see, for example, Diggle (1983) or Cressie (1991). Any deviations from Poissonness that may be shown in the plots considered above provide clues as to what type of non-Poisson model may be appropriate. A more detailed description of any observed clustering or inhibition could then be attempted by the formulation and fitting of a more complex model [cf. Cressie (1991, Section 8.5)]: for example, the choice might initially be some simple Poisson cluster process or a Strauss process. Choosing and fitting a suitable model may not be entirely simple matters and it is likely that a statistician or applied probabilist knowledgeable about point processes would need to be involved, at least at this stage. The parameters of a reasonably fitting model provide a summary of the original data: for a homogeneous Poisson process this summary would involve just the intensity; for a Strauss process, as described in the previous section, the parameter β is related to the intensity of the process, while γ describes interactions between neighbouring points. Since the case γ = 1 yields a Poisson process, it is in principle possible to assess Poissonness parametrically within the family of Strauss processes by testing the hypothesis that γ = 1.

5.4. Likelihood based inference


One of the problems that has impeded development of inference for point process models is the difficulty, and in most cases impossibility, of writing down an expression for the likelihood function. A notable exception is that the likelihood can be written down explicitly for a realization over a fixed time interval (0, T) of any (inhomogeneous) Poisson process on R; see Snyder and Miller (1991) or Kutoyants (1998). As indicated in Section 4.6, there is much current interest in the use of Markov chain Monte Carlo methods; see, for example, Geyer (1999) and Moller (1999). These are sophisticated simulation intensive methods which enable likelihood based inference to be implemented for parametric point process models even when a likelihood function cannot be written down explicitly. There is also the possibility of using pseudo-likelihood methods, in which the likelihood function is replaced by another closely related function which is then used as if it were the likelihood. Such methods grew from work of Besag (1974, 1978); the monograph by Särkkä (1993) provides a good review of these ideas
and some new applications; see also Jensen and Moller (1991), Särkkä (1995), Goulard et al. (1996) and Baddeley and Turner (2000).
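The explicit Poisson likelihood mentioned above can be written down directly. For a realization t_1 < ... < t_n of an inhomogeneous Poisson process with intensity function λ(·) observed over (0, T), the log-likelihood is

```latex
\log L(\lambda) \;=\; \sum_{i=1}^{n} \log \lambda(t_i) \;-\; \int_{0}^{T} \lambda(t)\,\mathrm{d}t .
```

In a parametric model this expression can be maximized directly; it is precisely the analogue of the integral term (the normalizing constant) that is intractable for the Markov point processes of Section 4.6, which is why the Markov chain Monte Carlo and pseudo-likelihood approaches are needed there.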
5.5. Inference from multitype point process data

Statistical inference from multitype (multivariate) point process data is less well developed than inference for a single point process. Brillinger (1976), Lotwick and Silverman (1982), Chapters 6 and 7 of Diggle (1983) and Section 8.6 of Cressie (1991) deal with some basic ideas. These include estimation of cross-type versions of the functions G, K and J; see especially van Lieshout and Baddeley (1996) where attention is focused on multitype extensions of the J function. Goulard et al. (1996) have considered maximum pseudo-likelihood estimation for marked Gibbs processes, and so in particular for multitype Gibbs processes.

6. Simulation

It is straightforward to simulate a homogeneous Poisson process using its representation as a mixed Bernoulli process, as described in Section 4.1. Such a simulation can be effected in a window W of odd shape by initially simulating on a larger set, for example a rectangle, of more regular shape and rejecting those points which do not fall within W. Whilst the mixed Bernoulli approach can be used for any state space, on the real line there is also the possibility of simulating a homogeneous Poisson process by generating a sequence of 'intervals' from a suitable exponential distribution. An extension of the latter technique allowing non-exponential distributions facilitates simulation of renewal processes. For simulating an inhomogeneous Poisson process Lewis and Shedler (1979) gave a simple technique based on thinning a realization of a suitable homogeneous Poisson process. The only requirement, which would surely be met in most practical circumstances, is that the process to be simulated have an intensity function which can be bounded above by some fixed constant over the window W on which the realization will be obtained. Suppose then that it is desired to generate a realization on W of an inhomogeneous Poisson process with intensity function λ(·) where, for some constant λ*, λ(u) ≤ λ* for all u ∈ W. First generate a realization of a homogeneous Poisson process on W with intensity λ*. Then delete points in this realization independently according to the following procedure: for a point at u, delete this point with probability 1 − p(u) and retain it with probability p(u), where p(u) = λ(u)/λ*. That the points of the original (homogeneous Poisson) realization which remain after this deletion process form a realization of an inhomogeneous Poisson process with intensity λ(·) is clear from the discussion near (17).
The technique is relatively more efficient if the proportion of points retained is high; for a given λ(·), maximum possible efficiency is achieved when sup λ(u) = λ*, where the supremum is taken over all relevant u ∈ W. Variations on this 'thinning' (deletion) method, and refinements which can be used to improve its efficiency, are discussed by Lewis and Shedler (1979). The thinning
method can be used in particular to simulate a Poisson process with cyclic intensity function. In principle a Gibbs process having a specified density (with respect to the Poisson process of unit intensity) which is a bounded function can be simulated by a rejection technique as described, for example, in Stoyan et al. (1995, Section 5.5.2); however, this may not provide an efficient approach. For efficient simulation of a Gibbs process, and in particular a Markov point process, the Markov chain Monte Carlo methods mentioned in Section 4.6 are often preferable. A realization of such a process in a bounded window can be obtained by generating an observation on a suitable spatial birth and death process after it has been running for a long time. Stoyan et al. (1995, Section 5.5.5) has an introduction to these ideas; for more detailed discussion see Geyer (1999) and Moller (1999).
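The Lewis-Shedler thinning procedure described above can be sketched for a rectangular window as follows: first generate a homogeneous Poisson realization with the dominating intensity λ*, then retain each point u independently with probability λ(u)/λ*. This is a minimal sketch with hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def thin_poisson(lam, lam_star, window):
    """Simulate an inhomogeneous Poisson process on a rectangular window
    by Lewis-Shedler thinning.  lam is a vectorised intensity function
    with lam(x, y) <= lam_star everywhere on the window."""
    (x0, x1), (y0, y1) = window
    area = (x1 - x0) * (y1 - y0)
    # dominating homogeneous Poisson realization with intensity lam_star
    n = rng.poisson(lam_star * area)
    xs = rng.uniform(x0, x1, n)
    ys = rng.uniform(y0, y1, n)
    # retain each point independently with probability lam(u) / lam_star
    keep = rng.uniform(0.0, 1.0, n) < lam(xs, ys) / lam_star
    return np.column_stack([xs[keep], ys[keep]])
```

As noted above, the method is most efficient when λ* is as close as possible to sup λ(u); a loose bound simply wastes simulated points.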

7. Concluding remarks

7.1. Martingale theory of point processes


A different approach to point processes is needed for dealing with processes, such as arise in the study of queueing or communication systems or in survival analysis, which evolve dynamically over time and in a manner that may depend on the past history of the process. For a point process N_t, where t represents time and N_t = N((0, t]) in our previous notation, the stochastic intensity function λ(t | H_{t−}) might be defined heuristically by

λ(t | H_{t−}) dt = Pr(N(dt) = 1 | H_{t−}) ,

where H_{t−} is a history of N_t up to but not including time t. Using the mathematically well-developed theory of martingales, the martingale theory of point processes provides an approach to formalizing the notion of a stochastic intensity and to solving a wide variety of problems by means of the stochastic calculus that results. It is beyond the scope of the present article to enter into details of this extensive and technically rather difficult theory. Key references are Brémaud (1981), Karr (1991), Snyder and Miller (1991) and Andersen et al. (1993), the latter comprehensive work being focused on applications to survival analysis. Aalen (1997) is interesting for an overview and some historical comments on the development of the field.

7.2. Omissions and further bibliographic comments


Aside from not discussing any detail of the martingale theory of point processes, this article has at least two other major omissions: for reasons of time and space it has not attempted any serious presentation of either Palm theory or convergence results for point processes and random measures. Both these areas are well covered in Daley and Vere-Jones (1988), Matthes et al. (1978) and Kallenberg (1983). There has also been no discussion of spectral theory for point processes
[cf. Daley and Vere-Jones (1988, Chapter 11)] or of applications of point process theory to the study of statistics of extremes [cf. Resnick (1987)]. Daley and Milne (1973) provided a comprehensive annotated bibliography which may still be useful despite the later explosive growth in the area. The extensive references in, for example, Daley and Vere-Jones (1988) and Karr (1991) offer good updates. Several other books which have appeared in the last twenty years (see Section 1 and the further comments in the present section) have more specialized bibliographies. Those of the various chapters in the collection Barndorff-Nielsen et al. (1999) are worthy of particular mention. The encyclopedia articles Karr (1986, 1988) and Milne (1999) provide a compact summary of many key point process ideas. For those interested primarily in spatial data, especially in biological applications, Diggle (1983) is highly recommended and Matérn (1960, 1986) may prove useful. The examples in parts of Cressie (1991), Ripley (1981, 1988) and Stoyan and Stoyan (1994) are also good, and in all these books there is some discussion of theory. There is a wide-ranging review of spatial processes, including point processes, in the paper by Hjort and Omre (1994). Guttorp (1995, Chapter 5) offers a succinct overview of the theory of point processes and is well motivated by examples. The monograph Kingman (1993) is a masterly survey of the many beautiful properties and applications of Poisson processes and a good introduction to many central aspects of general point process theory. More detailed exposition of various aspects of the theory can be found in Daley and Vere-Jones (1972), Srinivasan (1974), Cox and Isham (1980) and Reiss (1993). Daley and Vere-Jones (1988) give a much more comprehensive, yet readable, presentation of the mathematical theory.
A systematic and careful development of the mathematical foundations of point process theory can be found in Matthes, Kerstan and Mecke (1978), though this work is usually considered difficult, even by probabilists. Point process theory, with a view to applications in stochastic geometry, is dealt with in Ambartzumian (1990), Stoyan et al. (1995), Stoyan and Stoyan (1994) and many of the chapters in Barndorff-Nielsen et al. (1999). A good introduction to stochastic geometry, including its connections with point process theory, is provided in Baddeley (1999a). For those interested especially in statistical inference from point process data Diggle (1983), Cressie (1991), Ripley (1981, 1988) and Stoyan and Stoyan (1994) contain examples as well as an introduction to relevant theory; in particular Chapters 6 and 7 of Diggle (1983) and Section 8.6 of Cressie (1991) deal with aspects of inference for multitype point process data.

Acknowledgements
I am very grateful to Yih Chong Chin and a referee for reading an earlier version of the paper and offering a number of suggestions which have led to improvements.


References
Aalen, O. O. (1997). Counting processes and dynamic modelling. In Festschrift for Lucien Le Cam: Research Papers in Probability and Statistics (Eds., D. Pollard, E. Torgersen and G. L. Yang), pp. 1-12. Springer-Verlag, New York.
Ambartzumian, R. V. (1990). Factorization Calculus and Geometric Probability. Encyclopedia of Mathematics and its Applications, Vol. 33. Cambridge University Press, Cambridge UK.
Andersen, P. K., Ø. Borgan, R. D. Gill and N. Keiding (1993). Statistical Models Based on Counting Processes. Springer-Verlag, New York.
Baccelli, F. and P. Brémaud (1987). Palm Probabilities and Stationary Queues. Lecture Notes in Statistics, Vol. 41. Springer-Verlag, Berlin.
Baddeley, A. J. (1999a). A crash course in stochastic geometry. In Barndorff-Nielsen, Kendall and van Lieshout (1999), pp. 1-35.
Baddeley, A. J. (1999b). Spatial sampling and censoring. In Barndorff-Nielsen, Kendall and van Lieshout (1999), pp. 37-78.
Baddeley, A. J. and J. Moller (1989). Nearest-neighbour Markov point processes and random sets. Int. Statist. Rev. 57, 89-121.
Baddeley, A. J. and R. Turner (2000). Practical maximum pseudolikelihood for spatial point patterns. Austral. & New Zealand J. Statist. 42, to appear.
Baddeley, A. J. and M. N. M. van Lieshout (1995). Area-interaction point processes. Ann. Inst. Statist. Math. 47, 601-619.
Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. John Wiley and Sons, Chichester UK.
Barndorff-Nielsen, O. E., W. S. Kendall and M. N. M. van Lieshout (eds.) (1999). Stochastic Geometry: Likelihood and Computation. Monographs on Statistics and Applied Probability, Vol. 80. Chapman and Hall/CRC, Boca Raton.
Besag, J. E. (1974). Spatial interaction and the statistical analysis of lattice systems. (With discussion.) J. Roy. Statist. Soc. Ser. B 36, 192-236.
Besag, J. E. (1978). Some methods for statistical analysis of spatial data. Bull. Int. Statist. Inst. 47(2), 77-92.
Brandt, A., P. Franken and B. Lisek (1990).
Stationary Stochastic Models. John Wiley and Sons, Chichester.
Brémaud, P. (1981). Point Processes and Queues: Martingale Dynamics. Springer-Verlag, New York.
Brillinger, D. R. (1975). Statistical inference for stationary point processes. In Stochastic Processes and Related Topics (Ed., M. L. Puri), pp. 55-99. Academic Press, New York.
Brillinger, D. R. (1976). Estimation of second-order intensities of a bivariate stationary point process. J. Roy. Statist. Soc. Ser. B 38, 60-66.
Brillinger, D. R. (1994). Time series, point processes and hybrids. Canad. J. Statist. 22, 177-206.
Brown, T. C., B. W. Silverman and R. K. Milne (1981). A class of two-type point processes. Zeit. für Wahrscheinlichkeitstheorie verw. Gebiete 58, 299-308.
Cox, D. R. (1962). Renewal Theory. Methuen, London.
Cox, D. R. (1972). The statistical analysis of dependencies in point processes. In Lewis (1972a), pp. 55-66.
Cox, D. R. and V. Isham (1980). Point Processes. Chapman and Hall, London.
Cox, D. R. and P. A. W. Lewis (1966). Statistical Analysis of Series of Events. Methuen (now Chapman and Hall), London.
Cramér, H. and M. R. Leadbetter (1967). Stationary and Related Processes: Sample Function Properties and Their Applications. John Wiley and Sons, New York.
Cressie, N. A. (1991). Statistics for Spatial Data. John Wiley and Sons, New York.
Daley, D. J. and R. K. Milne (1973). The theory of point processes: a bibliography. Int. Statist. Rev. 41, 183-201.
Daley, D. J. and D. Vere-Jones (1972). A summary of the theory of point processes. In Lewis (1972a), pp. 299-383.


Daley, D. J. and D. Vere-Jones (1988). An Introduction to the Theory of Point Processes. Springer-Verlag, New York.
Diggle, P. J. (1979). On parameter estimation and goodness-of-fit testing for spatial point patterns. Biometrics 35, 87-101.
Diggle, P. J. (1983). Statistical Analysis of Spatial Point Patterns. Academic Press, London.
Diggle, P. J. and R. K. Milne (1983a). Negative binomial quadrat counts and point processes. Scand. J. Statist. 10, 257-267.
Diggle, P. J. and R. K. Milne (1983b). Bivariate Cox processes: Some models for bivariate spatial point patterns. J. Roy. Statist. Soc. Ser. B 45, 11-21.
Disney, R. L. and P. C. Kiessler (1987). Traffic Processes in Queueing Networks: A Markov Renewal Approach. Johns Hopkins, Baltimore MD.
Dwass, M. and H. Teicher (1957). On infinitely divisible random vectors. Ann. Math. Statist. 28, 461-470.
Feller, W. (1968). An Introduction to Probability Theory and Its Applications. Vol. I. (3rd edn.) John Wiley and Sons, New York.
Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1, 209-230.
Franken, P., D. König, U. Arndt and V. Schmidt (1982). Queues and Point Processes. John Wiley and Sons, Chichester.
Geyer, C. (1999). Likelihood inference for spatial point processes. In Barndorff-Nielsen, Kendall and van Lieshout (1999), pp. 79-140.
Gikhman, I. I. and A. V. Skorokhod (1969). Introduction to the Theory of Random Processes. W. B. Saunders Company, Philadelphia PA. (Translated from the 1st Russian edition: Nauka Press, Moscow, 1965.)
Gnedenko, B. V. and I. N. Kovalenko (1989). Introduction to Queueing Theory. Birkhäuser, Boston MA. [1st Russian edition: Nauka Press, Moscow, 1966; translated 1968].
Goulard, M., A. Särkkä and P. Grabarnik (1996). Parameter estimation for marked Gibbs point processes through the maximum pseudo-likelihood method. Scand. J. Statist. 23, 365-379.
Grandell, J. (1976). Doubly Stochastic Poisson Processes. Lecture Notes in Mathematics, 529. Springer-Verlag, Berlin.
Grandell, J. (1997). Mixed Poisson Processes. (Monographs on Statistics and Applied Probability, 77.) Chapman and Hall, London.
Griffiths, R. C. and R. K. Milne (1978). A class of bivariate Poisson processes. J. Multivariate Anal. 8, 380-395.
Guttorp, P. (1995). Stochastic Modelling of Scientific Data. Chapman and Hall, London.
Harris, T. E. (1963). The Theory of Branching Processes. Springer-Verlag, Berlin.
Hjort, N. L. and H. Omre (1994). Topics in spatial statistics. Scand. J. Statist. 21, 289-357.
Hunter, J. J. (1974a). Renewal theory in two dimensions: basic results. Adv. Appl. Probab. 6, 376-391.
Hunter, J. J. (1974b). Renewal theory in two dimensions: asymptotic theory. Adv. Appl. Probab. 6, 546-562.
Jensen, J. L. and J. Moller (1991). Pseudo-likelihood estimation for exponential family models of spatial point processes. Ann. Appl. Probab. 3, 445-461.
Jiřina, M. (1964). Branching processes with measure-valued states. In Transactions of the Third Prague Conference on Information Theory, Statistical Decision Functions, Random Processes. Czechoslovak Academy of Sciences, pp. 333-357.
Kallenberg, O. (1983). Random Measures. (3rd edn.) Akademie-Verlag, Berlin.
Karr, A. F. (1986). Article 'Point process, stationary'. In Encyclopedia of Statistical Sciences (Eds., S. Kotz and N. L. Johnson), Vol. 7, pp. 15-19. John Wiley and Sons, New York.
Karr, A. F. (1988). Article 'Stochastic processes, point'. In Encyclopedia of Statistical Sciences (Eds., S. Kotz and N. L. Johnson), Vol. 8, pp. 852-859. John Wiley and Sons, New York.
Karr, A. F. (1991). Point Processes and Their Statistical Inference. (2nd edn.) Marcel Dekker, New York. (1st edn., 1986.)
Kendall, D. G. (1964). Some recent work and further problems in the theory of queues. Theory Probab. Appl. 9, 1-12.


Kerstan, J., K. Matthes and J. Mecke (1974). Unbegrenzt teilbare Punktprozesse. Akademie-Verlag, Berlin.
Khinchin, A. Ya. (1969). Mathematical Methods in the Theory of Queueing. (2nd edn.) Griffin, London. [1st Russian edn., 1955; translated, 1960].
Kingman, J. F. C. (1993). Poisson Processes. Clarendon Press, Oxford.
König, D., K. G. Matthes and K. Nawrotzki (1967). Verallgemeinerung der Erlangschen und Engsetschen Formeln. (Eine Methode in der Bedienungstheorie.) Akademie-Verlag, Berlin.
Kutoyants, Yu. A. (1998). Statistical Inference for Spatial Poisson Processes. Lecture Notes in Statistics, Vol. 134. Springer-Verlag, New York.
Lewis, P. A. W. (1972a). Stochastic Point Processes: Statistical Analysis, Theory and Applications. Wiley-Interscience, New York.
Lewis, P. A. W. (1972b). Recent results in the statistical analysis of univariate point processes. In Lewis (1972a), pp. 1-54.
Lewis, P. A. W. and G. S. Shedler (1979). Simulation of non-homogeneous Poisson processes by thinning. Naval Res. Logistics Quart. 26, 403-413.
Lotwick, H. W. and B. W. Silverman (1982). Methods for analysing spatial processes of several types of points. J. Roy. Statist. Soc. Ser. B 44, 406-413.
Macchi, O. (1975). The coincidence approach to stochastic point processes. Adv. Appl. Probab. 7, 83-122.
Matérn, B. (1960). Spatial Variation: Stochastic Models and Their Application to Some Problems in Forest Surveys and Other Sampling Investigations. Meddelanden från Statens Skogsforskningsinstitut 49, nr 5, 1-144.
Matérn, B. (1986). Spatial Variation. (2nd edn.) Lecture Notes in Statistics, 36. Springer-Verlag, Berlin.
Matthes, K., J. Kerstan and J. Mecke (1978). Infinitely Divisible Point Processes. John Wiley and Sons, Chichester.
Milne, R. K. (1971). Simple proofs of some theorems on point processes. Ann. Math. Statist. 42, 368-372.
Milne, R. K. (1974). Infinitely divisible bivariate Poisson processes. (Abstract.) Adv. Appl. Probab. 6, 226-227.
Milne, R. K. (1998).
Article on 'Point Processes'. In Encyclopedia of Biostatistics (Eds., P. Armitage and T. Colton), Vol. 4, pp. 3385-3398. John Wiley and Sons, Chichester.
Milne, R. K. and M. Westcott (1972). Further results for Gauss-Poisson processes. Adv. Appl. Probab. 4, 151-176.
Molchanov, I. S. (1997). Statistics of the Boolean Model for Practitioners and Mathematicians. John Wiley and Sons, Chichester.
Molchanov, I. S. (1999). Random closed sets: results and problems. In Barndorff-Nielsen, Kendall and van Lieshout (1999), pp. 285-331.
Moller, J. (1999). Markov chain Monte Carlo and spatial point processes. In Barndorff-Nielsen, Kendall and van Lieshout (1999), pp. 141-172.
Mönch, G. (1971). Verallgemeinerung eines Satzes von A. Rényi. Stud. Sci. Math. Hung. 6, 81-90.
Moyal, J. E. (1962). The general theory of stochastic population processes. Acta Math. 108, 1-31.
Nawrotzki, K. (1962). Ein Grenzwertsatz für homogene zufällige Punktfolgen. Math. Nachr. 24, 201-217.
Neveu, J. (1977). Processus Ponctuels. In Lecture Notes in Mathematics, 598, 249-447. Springer-Verlag, Berlin.
Palm, C. (1943). Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics 44, 1-189.
Prohorov, Yu. V. and Yu. A. Rozanov (1969). Probability Theory: Basic Concepts, Limit Theorems, Random Processes. Springer-Verlag, Berlin.
Reiss, R.-D. (1993). A Course on Point Processes. Springer-Verlag, New York.
Rényi, A. (1967). Remarks on the Poisson process. Stud. Sci. Math. Hung. 2, 119-123.
Resnick, S. (1987). Extreme Values, Regular Variation, and Point Processes. Springer-Verlag, New York.
Ripley, B. D. (1981). Spatial Statistics. John Wiley and Sons, New York.

Point processes and some related processes

641

Ripley, B. D. (1988). Statistical Inference for Spatial Processes. Cambridge University Press, Cambridge UK. Ripley, B. D. and F. P. Kelly (1977). Markov point processes. J. Lond. Math. Soc. 15, 188-192. Ryd6n, T. (1995). Consistent and asymptotically normal parameter estimates for Markov modulated Poisson processes. Scand. J. Statist. 22, 295-303. S~irkk~, A. (1993). Pseudo-likelihood Approach for Pair Potential Estimation of Gibbs Processes. University of Jyv~iskyl~ Studies in Computer Science, Economics and Statistics, Vol. 22. University of JyvfiskylS, Jyv~skyl~i. S~irkk~, A. (1995), Pseudo-likelihood approach for Gibbs point processes in connection with field obselwations. Statistics 26, 89-97. Sewastjanov, B. A. (1975). Verzweigungsprozesse. R. Oldenbourg-Verlag, M~nchen. (Original Russian edition: Nauka Press, Moscow, 1971.). Shorrock, R. W. (1975). Extremal processes and random measures. J. Appl. Probab. 12, 316-323. Sigman, K. (1995). Stationary Marked Point Processes. An Intuitive Approach. Chapman and Hall, New York. Snyder, D. L. and M. I. Miller (1991). Random Point Processes in Time and Space. 2nd edn. SpringerVerlag, New York. (1st ed., Snyder only: John Wiley and Sons, New York, 1975). Srinivasan, S. K. (1974). Stochastic Point Processes. Griffin, London. Stoyan, D., W. S. Kendall and J. Mecke (1995). Stochastic Geometry and Its Applications. 2nd ed. John Wiley and Sons, Chichester. (1st ed. 1987, joint with Akademie Verlag, Berlin). Stoyan, D. and H. Stoyan (1994). Fractals, Random Shapes and Point Fields." Methods of Geometrical Statistics. John Wiley and Sons, Chichester. Strauss, D. J. (1975). A model for clustering. Biometrika 63, 467 475. Thorisson, H. (1995). On time- and cycle-stationarity. Stoch. Proc. Appl. 55, 185-209. van Lieshout, M. N. M. and A. J. Baddeley (1996). A nonparametric measure of spatial interaction in point patterns. Statist. Neerland. 50, 344-361. van Lieshout, M. N. M. and A. J. Baddeley (1999). 
Indices of dependence between types in multivariate point patterns. Scand. J. Statist. (to appear). Westcott, M. (1972). The probability generating functional. Austral. J. Math. 14, 448466. Williams, D. (1979). Diffusions, Markov Processes and Martingales. Vol. 1. Foundations. John Wiley and Sons, Chichester.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.


Characterization and Identifiability for Stochastic Processes

B. L. S. Prakasa Rao

1. Introduction

The problem of identifiability is basic to all statistical methods and data analysis, and it occurs in diverse areas such as reliability theory, survival analysis and econometrics, where stochastic modelling is widely used. In many fields, the object of the investigator's interest is not just the probability distribution of an observable random variable, but the physical structure or model leading to the probability distribution or the observed data structure. Identification problems arise when the observations can be explained in terms of one or several stochastic models. Since stochastic processes are widely used for stochastic modelling purposes, it is of great importance to know the conditions under which different types of stochastic processes are identifiable. The mathematics dealing with the problem of identifiability per se is closely related to the branch of probability theory dealing with so-called characterization problems.

Summarization of statistical data without losing information is one of the fundamental objectives of statistical analysis. More precisely, the problem is to determine whether the knowledge of a possibly smaller set of functions of several random components is sufficient to determine the behaviour of a larger set of individual random components. Here the problem of identifiability consists in identifying the component distributions from the joint distributions of some functions of them.

Our aim is to give a review of various results characterizing or identifying different types of stochastic processes. No proofs of these results are given; the mathematics involved in proving them consists mostly of the theory of functional equations and the theory of differential equations. A survey of identifiability in stochastic models, in particular characterization of some stochastic processes, is given in Prakasa Rao (1992).
Section 2 gives a survey of characterizations of homogeneous processes with independent increments by the properties of certain stochastic integrals defined by them. Characterization results based on martingales involving the processes, or on conditional structures based on them, are discussed in Section 3. Section 4 contains characterization results for the Poisson process as a renewal process and as a point process. Additional results
characterizing different types of stochastic processes which possibly do not fit into the themes of the earlier sections are discussed in Section 5.

2. Characterizations by properties of stochastic integrals of processes with independent increments


We now give a review of the work in the area of identifiability or characterization of stochastic processes with independent increments by stochastic integrals. For earlier reviews, see Laha and Lukacs (1965), Lukacs (1970a), Prakasa Rao (1983) and, more recently, Ramachandran and Lau (1991) and Prakasa Rao (1992). Chapter 7 of Lukacs and Laha (1964) deals with such results. Rao and Shanbhag (1994) discuss some applications of Choquet-Deny type functional equations to characterizations of stochastic processes in Chapter 8 of their book.

2.1. Stochastic integrals


We first define a stochastic integral of a function with respect to a process with independent increments. Let $T = [A, B]$. A stochastic process $\{X(t), t \in T\}$ is said to be a homogeneous process with independent increments if the distribution of the increment $X(t+h) - X(t)$, $t, t+h \in T$, depends only on $h$ but not on $t$, and if the increments over nonoverlapping intervals are stochastically independent. The process is said to be continuous in probability if $X(t)$ converges in probability to $X(s)$ as $t$ tends to $s$ for every $s \in T$. We shall consider only such processes throughout this section. If $\phi(u; h)$ denotes the characteristic function of $X(t+h) - X(t)$, $t, t+h \in T$, it is well known that $\phi(u; h)$ is an infinitely divisible characteristic function [cf. Lukacs (1975)]. In fact $\phi(u; h) = [\phi(u; 1)]^h$ for all $h > 0$. The process $\{X(t)\}$ is uniquely determined by the function $\phi(u; 1)$ and the distribution of $X(0)$. Let $b$ and $w$ be functions defined on $[A, B] \subset T = [0, \infty)$ and suppose that $w$ is nonnegative, nondecreasing and left continuous on $[A, B]$. Let
$$D_n : A = t_{n,0} < t_{n,1} < \cdots < t_{n,n} = B, \quad n \ge 1$$

be a sequence of subdivisions of the interval [A, B] such that


$$\lim_{n \to \infty} \max_{1 \le k \le n} (t_{n,k} - t_{n,k-1}) = 0 .$$

Select $t^*_{n,k} \in [t_{n,k-1}, t_{n,k}]$ and construct the sum


$$S_n = \sum_{k=1}^{n} b(t^*_{n,k})\,[X(w(t_{n,k})) - X(w(t_{n,k-1}))] .$$

If the sequence $S_n$ converges in probability to a random variable $S$ and if this limit is independent of the choice of the subdivision and the points $t^*_{n,k}$, then we say that the stochastic integral $S$ exists in probability and it is denoted by


$$\int_A^B b(t)\,dX(w(t)) .$$
If the limit exists in quadratic mean, then the integral is said to exist in quadratic mean. These types of stochastic integrals were studied in Riedel (1980a). Ramachandran and Rao (1970) [cf. Kagan et al. (1973), Chapter 13] discussed similar integrals under slightly different conditions. Suppose $w(t) \equiv t$. The following results are known about the existence of such integrals.

THEOREM 2.1. Let $\{X(t), t \ge 0\}$ be a homogeneous process with independent increments continuous in probability. Suppose that the process has finite mean function and finite covariance function, both of bounded variation on a finite closed interval $[A, B]$. Further suppose that $g$ is a real-valued and continuous function defined on $[A, B]$. Then the stochastic integral

$$\int_A^B g(t)\,dX(t)$$

exists in quadratic mean.

THEOREM 2.2. Let $\{X(t), t \ge 0\}$ be a homogeneous process with independent increments continuous in probability and $g$ be a real-valued and continuous function defined on $[A, B]$. Then the stochastic integral

$$\int_A^B g(t)\,dX(t)$$
exists in probability.

For proofs of Theorems 2.1 and 2.2, see Lukacs (1975). Let us now consider the general case. Since the function $w$ is nonnegative, nondecreasing and left continuous on $[A, B]$ by assumption, there exists a finite Borel measure $V$ on the real line such that

$$V[(-\infty, t)] = \begin{cases} 0 & \text{if } t \le A, \\ w(t) - w(A) & \text{if } A < t \le B, \\ w(B) - w(A) & \text{if } t > B . \end{cases}$$

Suppose that g is continuous on [A, B]. Define

$$w_g(t) = V[\{s : g(s) \le t\}] .$$

Then $w_g(t)$ is nondecreasing, nonnegative and left continuous. The following theorem is due to Riedel (1980a).
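The approximating-sum construction above is easy to exercise numerically. The following is a minimal sketch, not from the text: it takes $X$ to be a standard Wiener process with $w(t) = t$ and the illustrative integrand $b(t) = e^t$ on $[0, 1]$, and checks that the sums $S_n$ reproduce the mean zero and variance $\int_A^B b^2(t)\,dt$ that Theorem 2.1 guarantees for the quadratic-mean limit.

```python
# Sketch of the approximating sums S_n for \int_A^B b(t) dX(w(t)), with the
# illustrative choices X = standard Wiener process, w(t) = t, b(t) = e^t.
# For this X, the limit S is Gaussian with mean 0 and variance \int b^2 dt.
import numpy as np

rng = np.random.default_rng(0)
A, B, n, reps = 0.0, 1.0, 500, 20000
t = np.linspace(A, B, n + 1)
b = np.exp(t[:-1])                     # b evaluated at the points t*_{n,k}

# increments X(t_k) - X(t_{k-1}) of a standard Wiener process, many replicates
dX = rng.normal(0.0, np.sqrt(np.diff(t)), size=(reps, n))
S = dX @ b                             # S_n = sum_k b(t*_{n,k}) [X(t_k) - X(t_{k-1})]

var_theory = (np.e**2 - 1.0) / 2.0     # \int_0^1 e^{2t} dt
print(S.mean(), S.var(), var_theory)   # sample mean ≈ 0, sample variance ≈ (e^2-1)/2
```

The sample variance of the approximating sums matches $\int_0^1 e^{2t}\,dt$ up to Monte Carlo and discretization error, which is the quadratic-mean convergence asserted by Theorem 2.1.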


THEOREM 2.3. Let $g$ be a continuous function on $[A, B]$ and $w$ be a nondecreasing, nonnegative and left continuous function on $[A, B]$. Define

$$C = \min_{A \le t \le B} g(t), \qquad D = \max_{A \le t \le B} g(t) .$$

Then the integrals

$$Y = \int_A^B g(t)\,dX(w(t)) \quad \text{and} \quad Z = \int_C^D t\,dX(w_g(t))$$

exist in the sense of convergence in probability and they are identically distributed. Furthermore, the characteristic function $\phi$ of the random variable $Y$ is given by

$$\log \phi(u) = \int_A^B \psi[u g(t)]\,dw(t) = \int_C^D \psi(u t)\,dw_g(t)$$

where $\psi$ is the logarithm of the characteristic function of $X(1) - X(0)$. Since the function $\phi$ is infinitely divisible, it is nonvanishing, and the logarithm taken here is the continuous branch of the logarithm with $\log \phi(0) = 0$.

The next result gives a representation for the joint characteristic function of two stochastic integrals. We consider the case when $w(t) \equiv t$.

THEOREM 2.4. Let $\{X(t), t \ge 0\}$ be a homogeneous process with independent increments continuous in probability. Suppose that the process has finite mean function and finite covariance function, both of bounded variation on a finite closed interval $[A, B]$. Further suppose that $g$ and $h$ are real-valued and continuous functions defined on $[A, B]$. Define

$$Y = \int_A^B g(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B h(t)\,dX(t)$$

and denote the characteristic functions of $X(t+h) - X(t)$ and $(Y, Z)$ by $\phi(\cdot\,; h)$ and $\Phi(\cdot, \cdot)$ respectively. Then $\Phi(u, v)$ is different from zero for all $u$ and $v$, and

$$\log \Phi(u, v) = \int_A^B \psi[u g(t) + v h(t)]\,dt$$

where $\psi(u) = \log \phi(u; 1)$ and the logarithm taken here is the continuous branch of the logarithm of $\phi(u; 1)$ with $\log \phi(0; 1) = 0$. For a proof of the above theorem, see Lukacs (1975). This theorem continues to hold if the integrals $Y$ and $Z$ exist in probability.

Wang (1975) studied sufficient conditions for the existence of double stochastic integrals of the form

$$\int_A^B \int_A^B g(s, t)\,dX(s)\,dX(t)$$


in the sense of convergence in quadratic mean. We discuss his results briefly. For $i = 1, 2$, let $D_i : A = t_{i,0} < t_{i,1} < \cdots < t_{i,n_i} = B$ be a subdivision of $[A, B]$ and define
$$S(n_1, n_2) = \sum_{j_1=1}^{n_1} \sum_{j_2=1}^{n_2} g(t^*_{1,j_1}, t^*_{2,j_2})\, X(\Delta t_{1,j_1}, \Delta t_{2,j_2})$$

where $t_{i,j_i-1} \le t^*_{i,j_i} \le t_{i,j_i}$ and

$$X(\Delta t_{1,j_1}, \Delta t_{2,j_2}) = \prod_{i=1}^{2} [X(t_{i,j_i}) - X(t_{i,j_i-1})] .$$

Suppose that $n_1 \to \infty$ and $n_2 \to \infty$ such that

$$\max_{1 \le j_i \le n_i} (t_{i,j_i} - t_{i,j_i-1}) \to 0, \quad i = 1, 2,$$

and $S(n_1, n_2)$ converges in probability (or in quadratic mean) to a limiting random variable $S$ independent of the sequence of subdivisions $D_i$, $i = 1, 2$, and the intermediate points $\{t^*_{i,j_i}\}$. Then the limit is called a double stochastic integral and it is denoted by

$$\int_A^B \int_A^B g(t_1, t_2)\,dX(t_1)\,dX(t_2) .$$

The integral is said to exist in probability or in quadratic mean depending on the type of convergence to $S$. The following theorem is due to Wang (1975).

THEOREM 2.5. Suppose $g$ is continuous on $[A, B] \times [A, B]$ and the function $\lambda(t_1, t_2; s_1, s_2) = E[X(t_1) X(t_2) X(s_1) X(s_2)]$ is of bounded variation on $[A, B] \times [A, B]$. Then the stochastic integral

$$\int_A^B \int_A^B g(t_1, t_2)\,dX(t_1)\,dX(t_2)$$
exists in quadratic mean. In fact, if $E|X(t)|^4 < \infty$, then the double stochastic integral defined above exists in quadratic mean if and only if the corresponding Riemann-Stieltjes integral with respect to $\lambda$ exists.

Suppose that $\{R(t), t \in T\}$ is a stochastic process with continuous sample paths and independent of the process $\{X(t), t \in T\}$. One can define stochastic integrals of the form


$$S = \int_A^B R(t)\,dX(t)$$

in the sense of convergence in probability through approximating sums as given above. It can be shown that the characteristic function of $S$, when it is well defined, is given by

$$E[e^{iuS}] = E\left[\exp\left\{\int_A^B \psi(u R(t))\,dt\right\}\right]$$

for real $u$, where $\psi$ is the logarithm of the characteristic function of $X(1) - X(0)$ [cf. Prakasa Rao and Ramachandran (1983)].
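As a sanity check on this representation, the following sketch (illustrative choices only, not from the text) takes $X$ to be a standard Wiener process, so that $\psi(v) = -v^2/2$, and a deterministic $R(t) = t$ on $[0, 1]$, which makes the outer expectation trivial; it compares a Monte Carlo estimate of $E[e^{iuS}]$ with $\exp(\int_0^1 \psi(ut)\,dt) = \exp(-u^2/6)$.

```python
# Sketch of the formula E[e^{iuS}] = E[exp(∫ ψ(uR(t)) dt)] for the illustrative
# case X = standard Wiener process (ψ(v) = -v²/2) and deterministic R(t) = t.
import numpy as np

rng = np.random.default_rng(1)
n, reps, u = 400, 40000, 1.7
t = np.linspace(0.0, 1.0, n + 1)
R = 0.5 * (t[:-1] + t[1:])                   # R(t) = t at midpoints of the grid
dX = rng.normal(0.0, np.sqrt(np.diff(t)), size=(reps, n))
S = dX @ R                                   # approximating sums for ∫ R dX

lhs = np.exp(1j * u * S).mean()              # Monte Carlo estimate of E[e^{iuS}]
rhs = np.exp(-u**2 / 6.0)                    # exp(∫_0^1 -(ut)²/2 dt)
print(lhs, rhs)
```

With a deterministic integrand the formula reduces to the ordinary Lévy-Khinchine expression for a Gaussian integral, which is what the comparison verifies.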

2.2. Characterization of stochastic processes determined up to shift


We say that a stochastic integral $S$ determines a homogeneous process with independent increments $\{X(t), t \ge 0\}$ if the characteristic function of $S$ determines the characteristic function of $X(1)$. We assume that $X(0) = 0$ throughout this section. We now present a couple of results.

THEOREM 2.6. (Prakasa Rao) Let $\{X(t), t \ge 0\}$ be a homogeneous process with independent increments continuous in probability. Suppose that the process has moments of all orders and its mean function and covariance function are of bounded variation in any finite closed interval $[A, B]$. Further suppose that $g$ and $h$ are real-valued and continuous functions defined on $[A, B]$ and $[C, D]$ respectively, where $A < C < B < D$. Further suppose that either

$$\int_A^B [g(t)]^k\,dt \ne 0, \quad k \ge 2,$$

or

$$\int_C^D [h(t)]^k\,dt \ne 0, \quad k \ge 2 .$$

Let

$$Y = \int_A^B g(t)\,dX(t) \quad \text{and} \quad Z = \int_C^D h(t)\,dX(t) .$$

Then the joint distribution of $(Y, Z)$ determines the process $\{X(t), t \ge 0\}$, except possibly for a change of location, provided the characteristic function of $X(1)$ is entire. In such a case, either

$$\int_A^B g(t)\,dt = \int_C^D h(t)\,dt = 0$$
or there is no change in location.


For the proof of this theorem, see Prakasa Rao (1975a) [cf. Prakasa Rao (1992), p. 95]. The conditions that the process has moments of all orders and that the characteristic function of $X(1)$ is entire are too strong. Riedel (1980b) has weakened these conditions. We now discuss his results. Let $g$ be a real-valued continuous function on the interval $[A, B]$ and $w$ be a nonnegative, nondecreasing right continuous function on $[A, B]$. For $\operatorname{Re}(z) \ge 0$, define
$$S(z) = \int_A^B |g(t)|^z\,dw(t)$$

and

$$\tilde{S}(z) = \int_A^B |g(t)|^{z-1} g(t)\,dw(t) .$$

THEOREM 2.7. (Riedel) Suppose $\{X(t), t \ge 0\}$ is a homogeneous process with independent increments continuous in probability with $E|X(1)|^\lambda < \infty$ for some $0 < \lambda < 2$. Then the stochastic integral

$$Y = \int_A^B g(t)\,dX(w(t))$$

exists in the sense of convergence in probability and it determines the process $\{X(t), t \ge 0\}$ if and only if the following conditions are satisfied: (i) $S(z) \ne 0$ for $\lambda \le \operatorname{Re}(z) \le 2$, (ii) $\tilde{S}(z) \ne 0$ for $\lambda \le \operatorname{Re}(z) \le 2$, and (iii) $\tilde{S}(1) \ne 0$.
Under the condition that $E|X(1)|^2 < \infty$, the following result is true.

THEOREM 2.8. (Riedel) Suppose the process $\{X(t), t \ge 0\}$ and the stochastic integral $Y$ are as defined above with $E|X(1)|^2 < \infty$. Then the stochastic integral $Y$ determines the process $\{X(t), t \ge 0\}$ if and only if

$$\tilde{S}(1) = \int_A^B g(t)\,dw(t) \ne 0 .$$

For proofs of the above results, see Riedel (1980b). These results make use of a representation of the characteristic function of a stochastic integral due to Riedel (1980a) [cf. Prakasa Rao (1992), p. 93].
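The role of the condition $\tilde{S}(1) = \int g\,dw \ne 0$ can be seen in a small simulation (a sketch under illustrative assumptions, not from the text): with $w(t) = t$ and $g(t) = t - 1/2$ on $[0, 1]$, so that $\int g\,dw = 0$, the integral $Y = \int g\,dX$ is blind to a drift (location) change in $X$, which is exactly the indeterminacy the theorem rules out.

```python
# Sketch of the shift ambiguity when ∫ g dw = 0: here g(t) = t - 1/2, w(t) = t,
# and X is a Wiener process with drift c (illustrative choices, not from the
# text).  The integral ∫ g dX is the same with or without the drift.
import numpy as np

rng = np.random.default_rng(2)
n, c = 1000, 3.0
t = np.linspace(0.0, 1.0, n + 1)
mid = 0.5 * (t[:-1] + t[1:])
g = mid - 0.5                             # ∫_0^1 g(t) dt = 0 (sum cancels exactly)
dW = rng.normal(0.0, np.sqrt(np.diff(t)), size=n)

Y_no_drift = g @ dW                       # ∫ g dX for X = W
Y_drift = g @ (dW + c * np.diff(t))       # ∫ g dX for X(t) = W(t) + c t
print(Y_no_drift, Y_drift)                # the two values coincide
```

Since the drift contributes $c \int_0^1 g(t)\,dt = 0$, the two integrals agree path by path, so observing $Y$ alone cannot distinguish the two processes.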

2.3. Characterizations through identical distributions of two stochastic integrals


Let $\{X(t), t \in T\}$ be a homogeneous process with independent increments continuous in probability with $X(0) = 0$. The process is called a Wiener process if the increments $X(t) - X(s)$ are normally distributed with variance proportional to $|t - s|$. The process is called a stable process if the increments of the process have a


stable distribution [cf. Lukacs (1970b)]. It is said to be symmetric stable if the increments have a symmetric stable distribution.

Characterizations of the Wiener process


THEOREM 2.9. (Laha and Lukacs) Let $T = [A, B]$. Suppose the process $\{X(t), t \in T\}$ has moments of all orders and let $a$ and $b$ be two continuous functions defined on $[A, B]$ such that

$$\max_{A \le t \le B} |a(t)| \ne \max_{A \le t \le B} |b(t)| .$$

Let

$$Y = \int_A^B a(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B b(t)\,dX(t)$$

be two stochastic integrals defined as limits in quadratic mean. Then $Y$ and $Z$ are identically distributed if and only if (i) the process $\{X(t), t \in T\}$ is a Wiener process with linear mean function, (ii) either $\int_A^B a(t)\,dt = \int_A^B b(t)\,dt$ or the mean function is zero, and (iii) $\int_A^B a^2(t)\,dt = \int_A^B b^2(t)\,dt$.

The following theorem can be proved by relaxing the assumption on the existence of moments of the process $\{X(t), t \in T\}$.

THEOREM 2.10. (Laha and Lukacs) Let $T = [A, B]$ and $a$ be a continuous function, not constant on $[A, B]$. Let $\alpha \ne 0$ be real such that either (a) $\max_{A \le t \le B} |a(t)| < |\alpha|$ and $B - A > 1$, or (b) $\max_{A \le t \le B} |a(t)| > |\alpha|$ and $B - A < 1$ holds. Let

$$Y = \int_A^B a(t)\,dX(t)$$

be defined in the sense of convergence in probability. Then $\{X(t), t \in T\}$ is a Wiener process with linear mean function if and only if (i) $Y$ is identically distributed as $\alpha[X(t+1) - X(t)]$, (ii) either $\int_A^B a(t)\,dt = \alpha$ or the mean function is zero, and (iii) $\int_A^B a^2(t)\,dt = \alpha^2$.

Proofs of Theorems 2.9 and 2.10 are given in Lukacs (1975). For other characterizations of the Wiener process through identically distributed stochastic integrals, see Lukacs (1975), Ch. 7. Ramachandran and Rao (1970) [cf. Kagan et al. (1973), Ch. 13] characterized the Wiener process under slightly different conditions. For related results, see Laha and Lukacs (1965, 1968).

THEOREM 2.11. (Ramachandran and Rao) Let $T = [A, B]$. Let $w$ be a nonconstant nondecreasing right continuous function defined on a compact interval


$[a, b]$ with $w(a) = A$ and $w(b) = B$. Let $g$ be continuous on $[a, b]$ such that either (i) $|g(t)| < 1$ for all $t$ in $[a, b]$ and $g$ has a finite number of zeroes on $[a, b]$, or (ii) $|g(t)| \ge 1$ for all $t$ in $[a, b]$. Suppose that

$$Y = \int_a^b g(t)\,dX(w(t))$$

(defined in the sense of convergence in probability) has the same distribution as the sum of $n$ independent random variables, each distributed as $X(t + (1/n)) - X(t)$, $A \le t \le B$, for some $n \ge 1/(B - A)$. Then the process $\{X(t), t \in [A, B]\}$ is a Wiener process with linear mean function if and only if

$$\int_a^b g^2(t)\,dw(t) = 1 .$$

Further, in that case,

$$\int_a^b g(t)\,dw(t) = 1$$

or the mean function is zero.

THEOREM 2.12. (Ramachandran and Rao) Let $T = [A, B]$ and $w$ be as defined in Theorem 2.11, and let $g$ and $h$ be continuous functions on $[a, b]$ such that $\max |g(t)| \ne \max |h(t)|$ in $[a, b]$. Suppose the process $\{X(t), t \in T\}$ has moments of all orders. Let
$$Y = \int_a^b g(t)\,dX(w(t)) \quad \text{and} \quad Z = \int_a^b h(t)\,dX(w(t))$$

be defined as limits in quadratic mean. Then $Y$ and $Z$ are identically distributed if and only if (i) the process $\{X(t), t \in T\}$ is a Wiener process with linear mean function, (ii) $\int_a^b g(t)\,dw(t) = \int_a^b h(t)\,dw(t)$ or the mean function is zero, and (iii) $\int_a^b g^2(t)\,dw(t) = \int_a^b h^2(t)\,dw(t)$.

Riedel (1980b) obtained the following characterization for the Wiener process as a special case of his general results for stable processes, which we will discuss in the sequel.

THEOREM 2.13. (Riedel) Let $T = [0, \infty)$. Let $b_j$ be continuous on $[A_j, B_j]$ and $w_j$ be nondecreasing, nonnegative and left continuous on $[A_j, B_j] \subset T$, $j = 1, 2$. Suppose that $E[X(1)]^2 < \infty$. Then

$$\int_{A_1}^{B_1} b_1(t)\,dX(w_1(t)) \quad \text{and} \quad \int_{A_2}^{B_2} b_2(t)\,dX(w_2(t)) + q$$

are identically distributed for some real $q$ if and only if the process $\{X(t), t \ge 0\}$ is a Wiener process with linear mean function.

Let $b_j$ and $w_j$, $j = 1, 2$, be as defined above. For $\operatorname{Re}(z) \ge 0$, define

$$S(z) = \int_{A_1}^{B_1} |b_1(t)|^z\,dw_1(t) - \int_{A_2}^{B_2} |b_2(t)|^z\,dw_2(t)$$

and

$$\tilde{S}(z) = \int_{A_1}^{B_1} |b_1(t)|^{z-1} b_1(t)\,dw_1(t) - \int_{A_2}^{B_2} |b_2(t)|^{z-1} b_2(t)\,dw_2(t)$$

where $[A_j, B_j] \subset T$, $j = 1, 2$. Then $S(z)$ and $\tilde{S}(z)$ are analytic in $\operatorname{Re}(z) > 0$ and continuous in $\operatorname{Re}(z) \ge 0$.

THEOREM 2.14. (Riedel) Define $S(z)$ and $\tilde{S}(z)$ as given above. Suppose that $z = 0$ is not an accumulation point of the zeroes of $S(\cdot)\tilde{S}(\cdot)$ and

$$\limsup_{x \to 0+} x \log |S(x)\tilde{S}(x)| = 0$$

where $z = x + iy$. Then the properties (i) that the process $\{X(t), t \in T\}$ is a Wiener process with linear mean function $m$ and (ii) that the stochastic integrals of Theorem 2.13 are identically distributed for some real $q$ are equivalent if and only if (a) $S(2) = 0$, (b) $S(z) \ne 0$ for $0 < \operatorname{Re}(z) < 2$, $\operatorname{Im}(z) = 0$, and (c) $\tilde{S}(1) = 0$ or $m(t) \equiv 0$. We will describe other characterizations for the Wiener process later in this section.

Characterizations of stable processes


We will now discuss some results leading to characterizations of stable processes through identically distributed stochastic integrals. The following theorem is due to Lukacs.

THEOREM 2.15. (Lukacs) Let $T = [0, \infty)$ and $\{X(t), t \in T\}$ be a homogeneous process with independent increments, continuous in probability. Suppose the increments of the process have a symmetric distribution and $X(0) = 0$. Then the


process $\{X(t), t \in T\}$ is a symmetric stable process if and only if there exists a function $t(\cdot)$ such that (i) $t(y) > 0$ for $y > 0$, and (ii) the stochastic integral

$$\int_0^y (y - t)\,dX(t)$$

has the same distribution as the random variable $X(t(y))$ for each $y > 0$.

The above theorem has been generalised to stable processes in general in Lukacs (1969) and the result is as follows.

THEOREM 2.16. (Lukacs) Let $T = [0, \infty)$ and $\{X(t), t \in T\}$ be a homogeneous process with independent increments, continuous in probability. Suppose the distribution of $X(t)$ is nondegenerate for every $t > 0$ and $X(0) = 0$. Then the process is a stable process if and only if there exist two functions $t(\cdot)$ and $s(\cdot)$ such that (i) $t(y) > 0$ for all $y > 0$ and (ii) the stochastic integral

$$\int_0^y (y - t)\,dX(t)$$
has the same distribution as that of $X(t(y)) + s(y)$ for all $y > 0$.

The following theorem, due to Riedel (1980b), is an analogue of Theorem 2.13 for stable processes.

THEOREM 2.17. (Riedel) Let $T = [0, \infty)$. Define $S(z)$ and $\tilde{S}(z)$ as given above. Suppose that $z = 0$ is not an accumulation point of zeroes of $S(\cdot)\tilde{S}(\cdot)$ and

$$\limsup_{x \to 0+} x \log |S(x)\tilde{S}(x)| = 0$$

where $z = x + iy$. Then the properties (i) that the process $\{X(t), t \in T\}$ is a stable process with exponent $\alpha$ and (ii) that the stochastic integrals

$$\int_{A_1}^{B_1} b_1(t)\,dX(w_1(t)) \quad \text{and} \quad \int_{A_2}^{B_2} b_2(t)\,dX(w_2(t)) + q$$

are identically distributed for some real $q$ are equivalent if and only if (a) there exists a unique real zero $\alpha$ of $S(\cdot)$, $0 < \alpha \le 2$; in case $\alpha < 2$, its multiplicity is not more than 2; (b) $\tilde{S}(\alpha)(2 - \alpha) = 0$; and (c) if $\alpha < 2$, then $S(z)\tilde{S}(z) \ne 0$ for $\operatorname{Re}(z) = \alpha$, $z \ne \alpha$.

Let $\{X(t), t \ge 0\}$ be a homogeneous process with independent increments continuous in probability and $\{R(t), A \le t \le B\}$ be another stochastic process with continuous sample paths independent of $\{X(t), t \ge 0\}$. One can define stochastic integrals of the form

$$S = \int_A^B R(t)\,dX(t)$$


in the sense of convergence in probability following the techniques discussed earlier in this section, using the Riemann-Stieltjes approximating sums. The characteristic function of $S$ is given by

$$E[e^{iuS}] = E\left\{\exp\left(\int_A^B \log \phi(u R(t))\,dt\right)\right\}$$


[cf. Prakasa Rao and Ramachandran (1983)]. If a process $\{X(t), 0 \le t \le 1\}$ is a symmetric stable process with exponent $\alpha$ and independent of a process $\{R(t), 0 \le t \le 1\}$ with nonnegative continuous sample paths, then

$$S = \int_0^1 R(t)\,dX(t)$$

exists in probability and

$$E[e^{iuS}] = E\left\{\exp\left(-\lambda |u|^\alpha \int_0^1 R(t)^\alpha\,dt\right)\right\}$$

for some $\lambda$. In particular

$$E[e^{iuS}] = \exp(-\lambda |u|^\alpha) = E[e^{iu\{X(1) - X(0)\}}]$$

provided

$$\int_0^1 R(t)^\alpha\,dt = 1 \quad \text{a.s.} \tag{2.1}$$

Hence the distribution of $S$ does not depend on the process $R$ if (2.1) holds, and it is the same as that of $X(1) - X(0)$. We now give a characterization of a symmetric stable process based on this observation.

THEOREM 2.18. (Prakasa Rao and Ramachandran) Let $\{X(t), 0 \le t \le 1\}$ be a homogeneous process with independent symmetric increments continuous in probability and let $R$ be a nonnegative continuous strictly increasing function on $[0, 1]$ such that there exists a unique $\alpha$ with

$$\int_0^1 R(t)^\alpha\,dt = 1$$

for some $0 < \alpha \le 2$. Further assume that

$$\int_0^1 R(t)\,dX(t) \quad \text{and} \quad X(1) - X(0)$$

are identically distributed. Then the process $\{X(t), 0 \le t \le 1\}$ is a symmetric stable process with exponent $\alpha$.

For a proof of this theorem, see Prakasa Rao and Ramachandran (1983). As a special case, one can take the function $R(t) = (2t)^{1/\alpha}$. Variations of these results


and extension of the results to stable and semistable processes are discussed in Ramachandran and Prakasa Rao (1984), Ramachandran and Lau (1991), Ch. 6, and more recently in Ramachandran (1994, 1997).
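For the Wiener case ($\alpha = 2$) the special choice $R(t) = (2t)^{1/\alpha} = (2t)^{1/2}$ satisfies $\int_0^1 R(t)^2\,dt = 1$, so $\int_0^1 R(t)\,dW(t)$ should be $N(0, 1)$, i.e. distributed as $W(1) - W(0)$. A Monte Carlo sketch (illustrative, not from the text):

```python
# Sketch of the α = 2 (Wiener) instance of the special case R(t) = (2t)^{1/α}:
# ∫_0^1 2t dt = 1, so S = ∫_0^1 (2t)^{1/2} dW(t) should be standard normal.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 800, 20000
t = np.linspace(0.0, 1.0, n + 1)
mid = 0.5 * (t[:-1] + t[1:])
R = np.sqrt(2.0 * mid)                      # R(t) = (2t)^{1/2} at grid midpoints
dW = rng.normal(0.0, np.sqrt(np.diff(t)), size=(reps, n))
S = dW @ R                                  # approximating sums for ∫ R dW
print(S.mean(), S.var())                    # should be close to 0 and 1
```

The sample mean and variance of $S$ agree with those of $W(1) - W(0)$, in line with the identity in distribution asserted by Theorem 2.18.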

Characterization of a Wiener process taking values in a Hilbert space


As far as the author is aware, there is very little work on characterization results for stochastic processes taking values in a Hilbert space. We now discuss a few results in this direction. The following result is due to Prakasa Rao (1971). Let $\Lambda$ be the interval $[0, 1]$ and $\mathscr{B}$ be the Borel $\sigma$-algebra of subsets of $[0, 1]$. For each $A \in \mathscr{B}$, let $\phi(A)$ be a random element taking values in a real separable Hilbert space $H$. Suppose that $\phi(A)$ satisfies the following properties: (i) if $A$ and $A'$ are disjoint Borel sets of $[0, 1]$, then the random elements $\phi(A)$ and $\phi(A')$ are independent and $\phi(A \cup A') = \phi(A) + \phi(A')$; (ii) the process $\{\phi(A), A \in \mathscr{B}\}$ has stationary increments, that is, $\phi(A)$ and $\phi(A')$ are identically distributed if $A$ and $A'$ have the same Lebesgue measure; and (iii) if $\mu_t$ denotes the probability measure of $\phi([0, t])$, then $\mu_t$ converges weakly to the distribution degenerate at the origin as $t \to 0$. The process $\{\phi(A), A \in \mathscr{B}\}$ on $\Lambda$ with the properties (i), (ii) and (iii) as stated above is said to be a homogeneous process with independent increments. Such a process is said to be a Wiener process with mean zero if the characteristic functional $\hat{\mu}_t(y)$ of $\phi([0, t])$ has the representation

$$\hat{\mu}_t(y) = \exp\{-(1/2)\,t\,(Sy, y)\}$$
where $S$ is an $S$-operator. For the definition of an $S$-operator and the study of probability measures on a Hilbert space, see Parthasarathy (1967). Let $\phi$ be a homogeneous process with independent increments on $\Lambda$ with mean zero and with $E_\mu \|X\|^2 < \infty$, where $\mu$ is the probability measure of $X = \phi([0, 1])$. Let $S$ denote the $S$-operator associated with $\phi$. For any bounded linear operator $A$, define

$$n(A) = [\operatorname{Tr}(A S A')]^{1/2} + [\operatorname{Tr}(A' S A)]^{1/2} .$$


Then the set $\{A : n(A) = 0\}$ is a linear semigroup in the linear group of all bounded linear operators $A$. The function $n$ is a norm in the corresponding factor group. Let us not distinguish between a coset and the individual operators in the coset. In this sense, $n$ is a norm in the linear set of all bounded linear operators. Let $\mathscr{A}_S$ denote the completion of this set in the norm $n$. Consider the space $L_2 = L_2(\Lambda, \mathscr{B}, m, \mathscr{A}_S)$ of functions $A(\cdot)$ with values in $\mathscr{A}_S$ which are strongly measurable and such that $|A|^2 = \int_\Lambda n^2(A(\cdot))\,dm < \infty$, where $m$ is the Lebesgue measure on $\Lambda$. Vakhania and Kandelski (1967) defined stochastic integrals of the form

$$J = \int_\Lambda A(\lambda)\,\phi(d\lambda)$$

for functions $A(\cdot)$ in $L_2$. Under this framework, the following characterization theorem for a Wiener process taking values in a real separable Hilbert space was proved in Prakasa Rao (1971).

THEOREM 2.19. (Prakasa Rao) Suppose $\phi$ is a homogeneous process with independent increments on $\Lambda$ with mean zero and finite associated $S$-operator $S$. Let $A(\cdot)$ and $B(\cdot)$ be functions in $L_2$ satisfying the following properties: (i) $a = \sup_\lambda \|A(\lambda)\| < \infty$ and $b = \sup_\lambda \|B(\lambda)\| < \infty$; (ii) $H_{A(\lambda)} = H_{B(\lambda)} = H$ for all $\lambda \in \Lambda$, where $H_{A(\lambda)}$ denotes the subspace spanned by the operator $A(\lambda)$; and (iii)

$$\int_\Lambda \{\|A(\lambda)x\|^2 - \|B(\lambda)x\|^2\}\,d\lambda$$ is either strictly greater than zero or strictly less than zero for all $x \in H - \{0\}$.
Then

$$\int_\Lambda A(\lambda)\,\phi(d\lambda) \quad \text{and} \quad \int_\Lambda B(\lambda)\,\phi(d\lambda)$$

are identically distributed if and only if $\phi$ is a Wiener process and $A(\cdot)$ and $B(\cdot)$ satisfy the relation

$$\int_\Lambda A(\lambda) S A'(\lambda)\,d\lambda = \int_\Lambda B(\lambda) S B'(\lambda)\,d\lambda .$$
Similar results characterizing the Wiener process taking values in a Hilbert space were obtained by Kannan (1972a, b) following the operator-valued stochastic integrals developed in Kannan and Bharucha-Reid (1971).

2.4. Characterizations through independence of stochastic integrals

Let $\{X(t), t \in T\}$ be a homogeneous process with independent increments continuous in probability. The following theorem is a variation of a result due to Skitovich (1956) giving a characterization of the Wiener process through independence of stochastic integrals.

THEOREM 2.20. (Skitovich) Let $T = [A, B]$. Suppose that $a$ and $b$ are continuous functions defined on $[A, B]$ which are not identically zero in $[A, B]$ such that for each $t$ either $a(t)$ or $b(t)$ does not vanish in $[A, B]$. Let

$$Y = \int_A^B a(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B b(t)\,dX(t)$$


be stochastic integrals defined in the sense of convergence in probability. Then the process $\{X(t), t \in T\}$ is a Wiener process with a linear mean function if and only if (i) $Y$ and $Z$ are stochastically independent, and (ii) $\int_A^B a(t) b(t)\,dt = 0$.

For a proof of this theorem, see Lukacs (1975). A generalization of this result is given in Ramachandran and Rao (1970) [cf. Kagan, Linnik and Rao (1973), Ch. 13]. We now state their result.

THEOREM 2.21. (Ramachandran and Rao) Let $T = [A, B]$. Let $w$ be a nonconstant nondecreasing right continuous function defined on a compact interval $[a, b]$ with $w(a) = A$ and $w(b) = B$. Let $g$ and $h$ be continuous functions defined on $[a, b]$, with at least one of them nonvanishing and the other nonvanishing on a set of positive $w$-measure (the measure induced by $w(\cdot)$ on $[a, b]$). Then the stochastic integrals

$$Y = \int_a^b g(t)\,dX(w(t)) \quad \text{and} \quad Z = \int_a^b h(t)\,dX(w(t))$$

exist in the sense of convergence in probability, and $Y$ and $Z$ are stochastically independent if and only if (i) the process $\{X(t), t \in T\}$ is a Wiener process with a linear mean function, and (ii) $\int_a^b g(t) h(t)\,dw(t) = 0$, unless the process $\{X(t), t \in T\}$ is degenerate.
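In the Wiener case the orthogonality condition is transparent: $Y = \int a\,dW$ and $Z = \int b\,dW$ are jointly Gaussian with $\operatorname{Cov}(Y, Z) = \int a(t) b(t)\,dt$, so dependence vanishes exactly when the integrands are orthogonal. A sketch with the illustrative pair $a = \sin$, $b = \cos$ on $[0, \pi]$ (not from the text), where $\int_0^\pi \sin t \cos t\,dt = 0$:

```python
# Sketch of the orthogonality condition for independence of Wiener integrals:
# with a = sin, b = cos on [0, π], Cov(Y, Z) = ∫ sin t cos t dt = 0, so the
# jointly Gaussian pair (Y, Z) is independent.
import numpy as np

rng = np.random.default_rng(4)
n, reps = 600, 20000
t = np.linspace(0.0, np.pi, n + 1)
mid = 0.5 * (t[:-1] + t[1:])
a, b = np.sin(mid), np.cos(mid)
dW = rng.normal(0.0, np.sqrt(np.diff(t)), size=(reps, n))
Y, Z = dW @ a, dW @ b                    # approximating sums for ∫ a dW, ∫ b dW
cov_YZ = np.cov(Y, Z)[0, 1]
print(Y.var(), Z.var(), cov_YZ)          # variances ≈ π/2, covariance ≈ 0
```

For jointly Gaussian variables zero covariance is equivalent to independence, which is why the single integral condition (ii) suffices in the Wiener case.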

2.5. Characterization based on the conditional distribution of a stochastic integral on another stochastic integral

We now discuss some results for characterizing the Wiener process and stable processes, either through the regression of one stochastic integral on another or through the symmetry of the conditional distribution of one stochastic integral given another.

Characterization of a Wiener process

THEOREM 2.22. (Laha and Lukacs) Let $T = [A, B]$. Suppose that a process $\{X(t), t \in T\}$ is a homogeneous process with independent increments continuous in probability and $E[X(t)]^2 < \infty$ for all $t$. Further suppose that its mean function and covariance function are of bounded variation in $[A, B]$. Suppose that $a$ and $b$ are continuous functions defined on $[A, B]$ such that $a(t) b(t) \ne 0$ for $t \in [A_1, B_1]$, where $A \le A_1 < B_1 \le B$, and $a(t)$ is not proportional to $b(t)$. Let

$$Y = \int_A^B a(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B b(t)\,dX(t)$$

be two stochastic integrals defined as limits in quadratic mean. Then the process $\{X(t), t \in T\}$ is a Wiener process with linear mean function if and only if $Y$ has


linear regression and constant scatter on $Z$, that is, the regression of $Y$ on $Z$ is linear and homoscedastic.

For a proof of this theorem, see Lukacs (1975). Lukacs (1977) investigated the stability of this characterization property for a Wiener process. A slight variation of this result is due to Ramachandran and Rao (1970). The next result gives another characterization of the Wiener process through regression.

THEOREM 2.23. (Prakasa Rao) Let $T = [A, B]$. Suppose that a process $\{X(t), t \in T\}$ is a homogeneous process with independent increments continuous in probability. Suppose the process has moments of all orders, its mean function and covariance function are of bounded variation in $[A, B]$, and the increments of the process have nondegenerate distributions. Let $a$ and $b$ be continuous functions defined on $[A, B]$ with the property that

$$\int_A^B a(t) b(t)\,dt = 0 \quad \text{implies} \quad \int_A^B a(t)[b(t)]^k\,dt = 0$$

for all $k \ge 1$. Let


$$Y = \int_A^B a(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B b(t)\,dX(t)$$

be stochastic integrals defined in quadratic mean. Then $Y$ has constant regression on $Z$, that is, $E(Y \mid Z) = E(Y)$ a.s., if and only if the process $\{X(t), t \in T\}$ is a Wiener process with a linear mean function and

$$\int_A^B a(t) b(t)\,dt = 0 .$$

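The constant-regression property in Theorem 2.23 can be illustrated numerically. The sketch below (not part of the original text) simulates a standard Wiener process on [0, 1] and uses the arbitrary weights a(t) = sin 2πt and b(t) = cos 2πt, which satisfy ∫ a(t)b(t) dt = 0; the estimated regression slope of Y on Z should then be near zero.

```python
import numpy as np

# Monte Carlo sketch of the constant-regression property in Theorem 2.23:
# for a Wiener process X on [0, 1], Y = int a dX has constant regression on
# Z = int b dX when int a(t) b(t) dt = 0.
rng = np.random.default_rng(0)
n_steps, n_paths = 1000, 20000
t = np.linspace(0.0, 1.0, n_steps, endpoint=False)        # left endpoints
dX = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
a = np.sin(2 * np.pi * t)
b = np.cos(2 * np.pi * t)
Y = dX @ a               # Riemann-sum approximation of int a(t) dX(t)
Z = dX @ b
slope = np.cov(Y, Z)[0, 1] / np.var(Z)   # least squares slope of Y on Z
print(abs(slope))        # should be close to 0
```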
For a proof of this result, see Prakasa Rao (1970). The next result gives a characterization of the Wiener process based on the symmetry of the conditional distribution of one stochastic integral given another stochastic integral.

THEOREM 2.24. (Prakasa Rao) Let T = [A, B]. Suppose that {X(t), t ∈ T} is a homogeneous process with independent increments, continuous in probability. Suppose the increments of the process have nondegenerate distributions. Let a and b be continuous functions defined on [A, B] with the property that
$$\int_A^B \left|\frac{a(t)}{b(t)}\right| dt \neq 0 \quad \text{and} \quad \int_A^B \left|\frac{a(t)}{b(t)}\right| dt < \infty .$$
Let
$$Y = \int_A^B a(t)\,dX(t) \quad \text{and} \quad Z = \int_A^B b(t)\,dX(t)$$
be stochastic integrals defined in the sense of convergence in probability. Then the conditional distribution of Y given Z is symmetric if and only if the process {X(t), t ∈ T} is a Wiener process with a linear mean function m(t) = λt, and a(t) and b(t) satisfy the relations
$$\lambda \int_A^B a(t)\,dt = 0 \quad \text{and} \quad \int_A^B a(t)b(t)\,dt = 0 .$$

Proof of this theorem is given in Prakasa Rao (1972). Wang (1975) gave the following characterization of the Wiener process based on the regression properties of a double stochastic integral on another stochastic integral. Let g(·, ·) be a continuous function defined on [A, B] × [A, B] such that
$$\int_A^B \int_A^B g(t,s)\,dt\,ds = 0 \quad \text{and} \quad \int_A^B g(t,t)\,dt = 1 .$$
Define
$$Y_1 = (B - A)^{-1} \int_A^B X(dt) = (B - A)^{-1}(X(B) - X(A))$$
and
$$Y_2 = \int_A^B \int_A^B g(t,s)\,X(dt)\,X(ds) .$$

THEOREM 2.25. (Wang) Let T = [A, B]. Suppose that {X(t), t ∈ T} is a homogeneous process with independent increments, continuous in probability. Then
$$E(Y_2 \mid Y_1) = \beta \quad \text{a.s.}$$
for some real β if and only if the process {X(t), t ∈ T} is a Wiener process with linear mean function.

For a proof of this result, see Wang (1975). For related results, see Wang (1974).
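Wang's constant-regression property can also be checked by simulation. The sketch below (an illustration, not part of the original text) uses the arbitrary kernel g(t, s) = cos(2π(t − s)), which satisfies ∫∫ g dt ds = 0 and ∫ g(t, t) dt = 1 on [0, 1], and estimates the regression slope of Y₂ on Y₁ for a standard Wiener process.

```python
import numpy as np

# Monte Carlo sketch of Theorem 2.25: for a Wiener process on [0, 1] and
# g(t, s) = cos(2*pi*(t - s)), the regression of Y2 on Y1 should be constant.
rng = np.random.default_rng(1)
n_steps, n_paths = 400, 20000
t = np.linspace(0.0, 1.0, n_steps, endpoint=False)
dX = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
# cos(2*pi*(t - s)) = cos*cos + sin*sin, so the double sum factorizes:
C = dX @ np.cos(2 * np.pi * t)
S = dX @ np.sin(2 * np.pi * t)
Y2 = C**2 + S**2         # discretized double stochastic integral
Y1 = dX.sum(axis=1)      # X(1) - X(0)
slope = np.cov(Y2, Y1)[0, 1] / np.var(Y1)
print(abs(slope))        # should be close to 0
```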
Characterization of stable processes

The following result gives a characterization of a symmetric stable process with exponent γ > 1.

THEOREM 2.26. (Prakasa Rao) Let T = [0, 1]. Suppose that {X(t), t ∈ T} is a homogeneous process with independent increments, continuous in probability, with X(0) = 0 and E[X(t)] = 0 for all t. Further suppose that the increments of the process have nondegenerate symmetric distributions. Let
$$Y_\lambda = \int_0^1 t^\lambda\,dX(t)$$
for λ > 0. Then Y_λ exists in the sense of convergence in probability, and the process {X(t), t ∈ T} is a symmetric stable process with exponent γ > 1 if and only if, for some positive real numbers λ and μ with λ ≠ μ,
$$E(Y_\lambda \mid Y_\mu) = \beta Y_\mu \quad \text{a.s.}$$
for some real β depending on λ and μ. Furthermore, γ, λ, μ and β are connected by the equation
$$\mu\gamma + 1 = \beta(\lambda - \mu + \mu\gamma + 1) .$$

Proof of this theorem is given in Prakasa Rao (1968).
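The relation connecting γ, λ, μ and β can be checked numerically in the gaussian case γ = 2, where X is a standard Wiener process; the choices λ = 0.5 and μ = 1.5 in the sketch below are arbitrary.

```python
import numpy as np

# Numerical check of mu*gamma + 1 = beta*(lambda - mu + mu*gamma + 1) from
# Theorem 2.26 with gamma = 2 (Wiener case), Y_lam = int_0^1 t^lam dX(t).
rng = np.random.default_rng(2)
lam, mu, gamma = 0.5, 1.5, 2.0
n_steps, n_paths = 2000, 50000
t = np.linspace(0.0, 1.0, n_steps, endpoint=False)
dX = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
Y_lam = dX @ t**lam
Y_mu = dX @ t**mu
beta_hat = np.cov(Y_lam, Y_mu)[0, 1] / np.var(Y_mu)   # E(Y_lam | Y_mu) = beta*Y_mu
beta_theory = (mu * gamma + 1) / (lam - mu + mu * gamma + 1)
print(beta_hat, beta_theory)    # both should be close to 4/3
```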

2.6. Characterization of Poisson, gamma and negative binomial processes through stochastic integrals

Let g₁(·, ·) be a continuous function defined on [A, B] × [A, B] such that
$$\int_A^B \int_A^B g_1(t,s)\,dt\,ds = 0 \quad \text{and} \quad \int_A^B g_1(t,t)\,dt = 1 .$$
Define
$$Y_1 = (B - A)^{-1} \int_A^B X(dt) = (B - A)^{-1}(X(B) - X(A))$$
and
$$Z_1 = \int_A^B \int_A^B g_1(t,s)\,X(dt)\,X(ds) .$$

THEOREM 2.27. (Wang) Let T = [A, B]. Suppose that {X(t), t ∈ T} is a homogeneous process with independent increments, continuous in probability. Then
$$E(Z_1 \mid Y_1) = \alpha Y_1 + \beta \quad \text{a.s.}$$
for some real constants α ≠ 0 and β if and only if
$$X(t) = \alpha Y(t) + (\beta/\alpha)t ,$$
where Y is a Poisson process.

A variation on the conditions on the function g₁ leads to characterizations of gamma and negative binomial processes. We now briefly state these results. For more details and proofs of this theorem and the theorem stated above, see Wang (1975). A homogeneous process X(t) with independent increments is said to be a gamma process if the increments X(t + h) − X(t) have a gamma distribution with parameters αh (α > 0) and β > 0 for all h ≥ 0, and it is said to be a negative binomial process if the increments have a negative binomial distribution with parameters rh (r > 0) and p (0 < p ≤ 1) for all h ≥ 0. Let g₂(t, s) be a continuous function defined on [A, B] × [A, B] such that

$$\int_A^B \int_A^B g_2(t,s)\,dt\,ds = c \neq 0 \quad \text{and} \quad \int_A^B g_2(t,t)\,dt = 1 .$$
Define
$$Y_1 = (B - A)^{-1} \int_A^B X(dt) = (B - A)^{-1}(X(B) - X(A))$$
and
$$Z_2 = \int_A^B \int_A^B g_2(t,s)\,X(dt)\,X(ds) .$$

THEOREM 2.28. (Wang) Let T = [A, B]. Suppose that {X(t), t ∈ T} is a homogeneous process with independent increments, continuous in probability. Then
$$E(Z_2 \mid Y_1) = \alpha Y_1 \quad \text{a.s.}$$
for some real constant α if and only if (i) c > 0, and X or −X is a gamma process, or (ii) α ≠ 0, c < 0, and X(t) = αY(t), where Y is a negative binomial process.

2.7. Characterization by the absolute moments of stochastic integrals


We now give a condition in terms of absolute moments for two stochastic processes which are homogeneous with independent increments, continuous in probability, to be identical.

THEOREM 2.29. (Prakasa Rao) Suppose X = {X(t), 0 ≤ t ≤ T} and Y = {Y(t), 0 ≤ t ≤ T} are two stochastic processes with homogeneous independent symmetric increments, continuous in probability, and suppose that X(0) = Y(0) = 0 a.s. Further assume that there exists p > 0, p ≠ 2m, m = 1, 2, ..., such that, for some continuous function λ(t) on [0, T],
$$E\left|\int_0^T \lambda(t)\,dX(t) + \gamma U\right|^p = E\left|\int_0^T \lambda(t)\,dY(t) + \gamma V\right|^p$$
for all real γ, where U and V are nonzero symmetric and identically distributed random variables with absolute moments of order p, U independent of X and V independent of Y, with (X, U) independent of (Y, V). Suppose that the characteristic functions of X(t + 1) − X(t) and Y(t + 1) − Y(t) have power series expansions. Then the processes X and Y are identically distributed provided
$$\int_0^T [\lambda(t)]^k\,dt \neq 0, \quad k \geq 2 .$$
For the proof, see Prakasa Rao (1998).

2.8. Identifiability for linear processes


Let {X(t), −∞ < t < ∞} be a homogeneous process with independent increments and f be a function such that |f| and f² are integrable. It is known that the stochastic integral
$$\Lambda_f(t) = \int_{-\infty}^{\infty} f(t - u)\,X(du), \quad -\infty < t < \infty ,$$
exists as a limit in the sense of convergence in quadratic mean [cf. Doob (1953)] if E(X(t))² < ∞. In such a case, the process Λ_f(t), −∞ < t < ∞, is called a linear process. This process is a stationary process, and the characteristic functional of the process is defined by
$$\phi_{\Lambda_f}(\xi) = E\left\{\exp\left[i \int_{-\infty}^{\infty} \Lambda_f(t)\,\xi(dt)\right]\right\} ,$$
where ξ(·) runs through real-valued signed totally finite measures on the σ-algebra of Borel subsets of the real line [cf. Bartlett (1966)]. It can be shown that
$$\phi_{\Lambda_f}(\xi) = \exp\left\{\int_{-\infty}^{\infty} \psi\left(\int_{-\infty}^{\infty} f(t - u)\,\xi(dt)\right) du\right\} ,$$
where ψ is defined by
$$E(\exp\{i\theta[X(t) - X(0)]\}) = \exp\{t\psi(\theta)\} .$$
Let B₂ denote the class of all real-valued functions f such that |f| and f² are integrable. The following results are due to Weiss and Westcott (1976).

THEOREM 2.30. The function ψ is uniquely determined given Λ_f(·) for all f ∈ B₂. If two linear processes
$$\Lambda_{f_i}(t) = \int_{-\infty}^{\infty} f_i(t - u)\,X_i(du), \quad i = 1, 2 ,$$
have the same characteristic functional, then either f₂(t) = c f₁(t + a) for some constants c and a, or the processes X_i(t), i = 1, 2, are gaussian processes.

For a proof of this result, see Weiss and Westcott (1976). A stationary stochastic process {X(t), −∞ < t < ∞} is said to be time-reversible if, for all n and t₁, ..., tₙ, the random vectors (X(t₁), ..., X(tₙ)) and (X(−t₁), ..., X(−tₙ)) have the same joint distribution. The following result is a consequence of the above theorem.

THEOREM 2.31. Let Λ_f(t) be a linear process as defined earlier. Suppose there does not exist a constant a such that f(t) = f(a − t) for all t or f(t) = −f(a − t) for all t, and that X(t) has a symmetric distribution for all t. If the process Λ_f(t) is time-reversible, then the process {X(t), −∞ < t < ∞} is a gaussian process.
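The function ψ above is the cumulant function of the increment distribution; for a rate-1 Poisson process, for example, ψ(θ) = e^{iθ} − 1. This can be checked numerically (a sketch with arbitrary θ and t):

```python
import numpy as np

# Check of E exp{i*theta*(X(t) - X(0))} = exp{t*psi(theta)} for a rate-1 Poisson
# process, for which psi(theta) = e^{i*theta} - 1.
rng = np.random.default_rng(3)
theta, t = 0.7, 2.0
x = rng.poisson(t, size=1_000_000)          # X(t) - X(0) ~ Poisson(t)
lhs = np.mean(np.exp(1j * theta * x))
rhs = np.exp(t * (np.exp(1j * theta) - 1))
print(abs(lhs - rhs))                       # should be close to 0
```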

2.9. Remarks
Most of the results in this section on characterizations of stochastic processes through stochastic integrals, for processes with independent increments, are extensions of results on characterizations for independent random variables. For lack of space, we are not indicating the exact nature of the correspondence. For instance, the result due to Wang (1975) parallels that of Laha and Lukacs (1960), the result in Prakasa Rao (1998) is based on Braverman (1987), etc.

3. Martingales and conditional structure characterizations

3.1. Characterization of a Wiener process


The following result characterizing a Wiener process is due to Levy (1948) [cf. Doob (1953)].

THEOREM 3.1. (Levy) Let X = {X_t, t ≥ 0} be a real-valued square integrable martingale adapted to an increasing family of σ-algebras {F_t, t ≥ 0}. If the process X has continuous sample paths a.s., and the processes {X_t, F_t, t ≥ 0} and {X_t² − t, F_t, t ≥ 0} are martingales, then the process X is a Wiener process.

The following theorem characterizes stochastic processes whose finite dimensional distributions coincide with those of a Wiener process. Recall that it is possible for two stochastic processes to have the same finite dimensional distributions and yet be different, in the sense that the probability measures generated by them are not identical.

THEOREM 3.2. (Wesolowski) Let X = {X_t, t ≥ 0} be a real-valued square integrable martingale adapted to an increasing family of σ-algebras {F_t, t ≥ 0}. Let {G_t, t ≥ 0} be a decreasing family of σ-algebras and suppose that X is adapted also to this family of σ-algebras. If the processes {X_t, F_t, t ≥ 0} and {X_t² − t, F_t, t ≥ 0} are martingales, and the processes {X_t t⁻¹, G_t, t ≥ 0} and {(X_t² − t) t⁻², G_t, t ≥ 0} are reverse martingales, then the stochastic process X = {X_t, t ≥ 0} has the same finite dimensional distributions as those of a Wiener process on [0, ∞).


Recall that a process {Z_t, G_t, t ≥ 0} adapted to a decreasing family of σ-algebras {G_t, t ≥ 0} is said to be a reverse martingale if E[Z_s | G_t] = Z_t a.s. for 0 ≤ s ≤ t.

REMARKS. Note that the condition on the continuity of sample paths of the process is not present in the hypothesis of the above theorem. Wise (1992) constructed an example to show that there are stochastic processes satisfying the conditions stated in the above theorem which are not Wiener processes. We now present this example. Let g(x) = x if x is irrational and g(x) = 0 if x is rational. Let {W(t), t ≥ 0} be the standard Wiener process and define X(t) = g(W(t)). Then the process {X_t, t ≥ 0} is not a Wiener process, since its sample paths are not continuous almost surely. But it has the same finite dimensional distributions as those of the Wiener process and it satisfies all the conditions stated in the above theorem [cf. Wise (1992)].

The following result again characterizes stochastic processes with the same finite dimensional distributions as those of a Wiener process through conditional structures. This result is due to Wesolowski (1988).

THEOREM 3.3. (Wesolowski) Let {X_t, t ≥ 0} be a square integrable process such that, for any 0 < r₁ ≤ r₂ ≤ ⋯ ≤ rₙ ≤ r < s < t, n ≥ 1,
(i) E(X_r) = r,
(ii) Cov(X_r, X_s) = r,
(iii) E(X_s | X_{r_1}, ..., X_{r_n}, X_r) = α₁ X_r + β₁,
(iv) E(X_s | X_{r_1}, ..., X_{r_n}, X_r, X_t) = α₂ X_t + β₂ X_r + η, and
(v) Var(X_s | X_{r_1}, ..., X_{r_n}, X_r, X_t) = δ,
where α₁, β₁, α₂, β₂, δ and η are some constants depending on r, s, r₁, ..., rₙ and t only. Then the process {X_t, t ≥ 0} has the same finite dimensional distributions as those of a Wiener process on [0, ∞).

However, the following result due to Wesolowski (1990a) characterizes a Wiener process through martingale structures.

THEOREM 3.4. (Wesolowski) Let X = {X_t, t ≥ 0} be a square integrable process such that (i) {X_t, F_t, t ≥ 0}, (ii) {X_t² − t, F_t, t ≥ 0}, (iii) {X_t³ − 3tX_t, F_t, t ≥ 0} and (iv) {X_t⁴ − 6tX_t² + 3t², F_t, t ≥ 0} are martingales. Then X = {X_t, t ≥ 0} is a Wiener process. This result follows from the observation that the properties (i)-(iv) imply that
$$E(X_t - X_s)^4 = 3(t - s)^2 ,$$
which proves that the process {X_t, t ≥ 0} has a version with continuous sample paths, and the result follows from Levy's theorem.
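The fourth-moment identity above is elementary to verify for an actual Wiener process by simulation (an illustrative sketch; s = 0.3 and t = 0.8 are arbitrary):

```python
import numpy as np

# Sanity check of E(X_t - X_s)^4 = 3*(t - s)^2 for a Wiener process increment.
rng = np.random.default_rng(4)
s, t = 0.3, 0.8
incr = rng.normal(0.0, np.sqrt(t - s), size=1_000_000)   # X_t - X_s ~ N(0, t - s)
print(np.mean(incr**4), 3 * (t - s)**2)                  # both near 0.75
```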

3.2. Characterization of a Poisson process

The following result due to Watanabe (1964) characterizes a Poisson process through martingale structures.

THEOREM 3.5. (Watanabe) Let X = {X_t, t ≥ 0} be a real-valued stochastic process defined on a probability space (Ω, F, P) adapted to an increasing family of σ-algebras {F_t, t ≥ 0}. Let Y_t = X_t − t. If (a) {Y_t, F_t, t ≥ 0} is a purely discontinuous martingale with jumps equal to +1 and {Y_t² − t, F_t, t ≥ 0} is also a martingale, or (b) X is a purely discontinuous process with jumps equal to +1 and {Y_t, F_t, t ≥ 0} is a martingale, then the process X = {X_t, t ≥ 0} is a Poisson process.

Related martingale characterizations were obtained for the doubly stochastic Poisson process in Bremaud (1981) and for spatial Poisson processes in Ivanoff (1985) and Merzbach and Nualart (1986). Let X = {X_t, t ≥ 0} be a real-valued martingale with respect to an increasing family of σ-algebras {F_t, t ≥ 0}, and further suppose that the process {X_t + λt, t ≥ 0} has purely discontinuous sample paths with unit jumps. Then it is well known that the process {X_t + λt, t ≥ 0} is a Poisson process with intensity rate λ [cf. Meyer (1976)]. Wesolowski (1990a) obtained another martingale characterization of a Poisson process.

THEOREM 3.6. (Wesolowski) Let X = {X_t, t ≥ 0} be adapted to a nondecreasing family of σ-algebras {F_t, t ≥ 0}. Let Y_t = X_t − t. Suppose that {Y_t, F_t, t ≥ 0}, {Y_t² − t, F_t, t ≥ 0} and {Y_t³ − 3tY_t − t, F_t, t ≥ 0} are martingales. If X is a nondecreasing process, then it is a Poisson process.

As a special case of the above theorem, the following result holds, characterizing a Poisson process in the class of processes with independent increments.

THEOREM 3.7. (Wesolowski) Let X = {X_t, t ≥ 0} be a nondecreasing process with independent increments such that
$$EX_t = t, \quad EX_t^2 = t^2 + t, \quad EX_t^3 = t^3 + 3t^2 + t, \quad t > 0 .$$
Then X is a Poisson process.

In a recent paper, Wesolowski (1997) proved the following result generalizing the above theorem, using a Poisson central limit theorem for row-wise triangular arrays due to Beska, Klopotowski and Slominski (1982).

THEOREM 3.8. (Wesolowski) Suppose that X = {X_t, t ≥ 0} is a nondecreasing process. Let Y_t = X_t − t. Suppose that {Y_t, t ≥ 0} is a martingale. If E(Y_t²) = E(Y_t³) = t for all t ≥ 0, then {X_t, t ≥ 0} is a Poisson process.


3.3. Characterization of a point process


The following theorem due to Bremaud (1975) characterizes a class of point processes through local martingales [for the definition of a local martingale, see Elliott (1982)].

THEOREM 3.9. (Bremaud) Let {N_t, t ≥ 0} be a counting process on a probability space (Ω, F, P) adapted to a nondecreasing family of σ-algebras {F_t} contained in F. Let {A(t), t ≥ 0} be a nonrandom, nonnegative, right continuous, nondecreasing function such that A(0) = 0 and A(t) − A(t−) ≤ 1 for all t ≥ 0. If {N_t − A(t), t ≥ 0} is an {F_t}-local martingale, then, for all 0 ≤ s < t, (i) N_t − N_s is independent of F_s, and (ii)
$$E[\exp\{iu(N_t - N_s)\}] = \prod_{s < v \le t,\ \Delta A(v) \neq 0} \left(e^{iu}\,\Delta A(v) + 1 - \Delta A(v)\right) \times \exp\left((e^{iu} - 1)(A^c(t) - A^c(s))\right) ,$$
where A^c(t) is the continuous component of A(t) and ΔA(t) = A(t) − A(t−).

3.4. Martingale structures of Wiener and Poisson processes


We now briefly discuss some results on martingale transformations of Wiener and Poisson processes due to Wesolowski (1997a). Let {W_t, t ≥ 0} be a Wiener process defined on a probability space (Ω, F, P) adapted to a nondecreasing family of σ-algebras {F_t, t ≥ 0}. It is known that {t^{n/2} H_n(W_t/√t), t ≥ 0}, where H_n is the nth Hermite polynomial, is a martingale. For instance, for n = 1 we have the process {W_t, t ≥ 0}, and for n = 2 we have the process {W_t² − t, t ≥ 0}. As was pointed out earlier, a result due to Levy (1948) states that these properties characterize the Wiener process in the class of processes with continuous trajectories. Another example of a martingale constructed from the Wiener process is {exp(cW_t − c²t/2), t ≥ 0} for any c ∈ R [cf. Revuz and Yor (1994)]. It can also be checked that {e^{c²t/2} sin(cW_t), t ≥ 0} is a martingale [cf. Wesolowski (1997a)]. It is of interest to characterize the class of all functions g(t, x): [0, ∞) × R → R such that {g(t, W_t), t ≥ 0} is a martingale (cf. Theorem 3.10). Wesolowski (1997a) studied families of stochastic processes of the form (i) {α(t)h(W_t), t ≥ 0}, (ii) {h(W_t) − a(t), t ≥ 0} and (iii) {α(t)h(W_t/√t), t ≥ 0} which have the martingale property. Related results for Poisson processes have also been obtained in Wesolowski (1997a). Plucinska (1998) proved that if the condition
$$E\left[h\!\left(\frac{W_t}{\sqrt{t}}\right) \Big|\, \mathcal{F}_s\right] = \left(\frac{s}{t}\right)^{n/2} h\!\left(\frac{W_s}{\sqrt{s}}\right), \quad 0 < s < t ,$$
holds for the process {W_t, t ≥ 0} and for some positive integer n, and if additionally the function h is analytic on R, then h = cH_n, where H_n is the nth Hermite polynomial. This result gives a stochastic characterization of the Hermite polynomials, and it can be interpreted as a characterization of martingales from the class of transformations of the Wiener process of the general form
$$\left\{\alpha(t)\,h\!\left(\frac{W_t}{\sqrt{t}}\right),\ t > 0\right\} ,$$
where the value of the process for t = 0 is defined to be zero. The following result is due to Wesolowski (1997a).

THEOREM 3.10. (Wesolowski) Let L²_f be the space of square integrable functions with weight function f(x) = e^{−x²/2}. If a stochastic process of the form {α(t)h(W_t/√t), t > 0}, with value equal to zero for t = 0, is a martingale, where h ∈ L²_f and α(1) = 1, then α(t) = t^{n/2} and h = cH_n for some n = 1, 2, ... and some c ∈ R.

REMARKS. Cabana (1990) extended the Levy characterization of a Wiener process through martingales to characterize a Brownian sheet process.
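The Hermite martingale property is easy to verify directly for n = 3, where t^{3/2} H₃(W_t/√t) = W_t³ − 3tW_t for the probabilists' Hermite polynomial H₃(x) = x³ − 3x. The sketch below checks the conditional expectation given W_s = w (the values w, s, t are arbitrary):

```python
import numpy as np

# Check that E[W_t^3 - 3*t*W_t | W_s = w] = w^3 - 3*s*w for a Wiener process,
# i.e. the n = 3 Hermite transformation is a martingale.
rng = np.random.default_rng(6)
w, s, t = 1.7, 0.4, 1.0
Wt = w + rng.normal(0.0, np.sqrt(t - s), size=1_000_000)   # W_t given W_s = w
print(np.mean(Wt**3 - 3 * t * Wt), w**3 - 3 * s * w)       # both near 2.873
```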

3.5. Characterization of a d-dimensional Wiener process


Let (Ω, F, P) be a probability space and (W(t), F_t, t ≥ 0) be a d-dimensional Brownian motion (Wiener process) defined on (Ω, F, P). For any u ∈ R^d, define
$$e_u(x) = e^{i(u,x)} ,$$
where (·, ·) denotes the inner product in R^d. Then, for any 0 ≤ t₁ ≤ t₂ and u ∈ R^d,
$$E[e_u(W(t_2)) \mid \mathcal{F}_{t_1}] = e_u(W(t_1)) \exp[-(1/2)(t_2 - t_1)\|u\|^2] .$$
This relation implies that
$$\{J_{iu}(t),\ \mathcal{F}_t,\ t \ge 0\} \tag{3.1}$$
is a martingale on the probability space (Ω, F, P) for every u ∈ R^d, where
$$J_{iu}(t) = \exp[i(u, W(t)) + (t/2)\|u\|^2] . \tag{3.2}$$
The relation (3.1) also implies that
$$\frac{d}{dt}\, E[e_u(W(t)) \mid \mathcal{F}_{t_1}] = -(1/2)\|u\|^2\, e_u(W(t_1)) \exp[-(1/2)(t - t_1)\|u\|^2] \tag{3.3}$$
for t > t₁. Therefore, for t₂ > t₁,
$$E[e_u(W(t_2)) - e_u(W(t_1)) \mid \mathcal{F}_{t_1}] = -(1/2)\|u\|^2 \int_{t_1}^{t_2} e_u(W(t_1)) \exp[-(1/2)(t - t_1)\|u\|^2]\,dt$$
$$= -(1/2)\|u\|^2 \int_{t_1}^{t_2} E[e_u(W(t)) \mid \mathcal{F}_{t_1}]\,dt = E\left[\int_{t_1}^{t_2} (1/2)\Delta e_u(W(t))\,dt \,\Big|\, \mathcal{F}_{t_1}\right] ,$$
where the operator Δ stands for the Laplacian operator
$$\Delta = \sum_{i=1}^{d} \frac{\partial^2}{\partial x_i^2} .$$
In other words,
$$\left\{e_u(W(t)) - \int_0^t (1/2)\Delta e_u(W(s))\,ds,\ \mathcal{F}_t,\ t \ge 0\right\} \tag{3.4}$$
is a martingale on the probability space (Ω, F, P) for all u ∈ R^d. Let C₀^∞(R^d) be the space of all bounded continuous functions having compact support and possessing continuous derivatives of all orders. Any function f ∈ C₀^∞(R^d) can be represented in the form
$$f(x) = \int e^{i(u,x)} \phi(u)\,du$$
for some rapidly decreasing function φ. The observation made in (3.4) implies that
$$\{Z_f(t),\ \mathcal{F}_t,\ t \ge 0\} \tag{3.5}$$
is a martingale on the probability space (Ω, F, P) for every f ∈ C₀^∞(R^d), where
$$Z_f(t) = f(W(t)) - \int_0^t (1/2)\Delta f(W(s))\,ds . \tag{3.6}$$
It is easy to check that if {Z_f(t), F_t, t ≥ 0} is a martingale for every f ∈ C₀^∞(R^d), then the corresponding process is a d-dimensional Wiener process. It is sufficient if the property holds for functions of the form f(x) = e^{i(u,x)} for u ∈ R^d. We have the following theorem characterizing a Wiener process in R^d [cf. Stroock and Varadhan (1979), p. 84].

THEOREM 3.11. Let (Ω, F, P) be a probability space, {F_t, t ≥ 0} be a nondecreasing family of sub-σ-algebras of F, and W: [0, ∞) × Ω → R^d a right continuous, P-almost surely continuous, F_t-adapted random process such that P{W(0) = 0} = 1. Then the following are equivalent.


(i) {W(t), F_t, t ≥ 0} is a Wiener process on (Ω, F, P);
(ii) {Z_f(t), F_t, t ≥ 0} is a martingale on (Ω, F, P) for every f ∈ C₀^∞(R^d); and
(iii) {J_{iu}(t), F_t, t ≥ 0} is a martingale on (Ω, F, P) for every u ∈ R^d.

We will now discuss a more general result indicating the equivalence of some martingales, following Stroock and Varadhan (1979), p. 85. Let (Ω, F, P) be a probability space and {F_t, t ≥ 0} be a nondecreasing family of sub-σ-algebras of F. Let a: [0, ∞) × Ω → S_d and b: [0, ∞) × Ω → R^d be right continuous in t and F_t-adapted processes, where S_d is the space of symmetric nonnegative definite real d × d matrices. Let a^{ij}(t, ω) be the (i, j)-th element of the matrix a(t, ω) and b^j(t, ω) be the j-th element of the vector b(t, ω). For any function f ∈ C²(R^d), the space of twice continuously differentiable functions, define L_t f for t ≥ 0 by
$$(L_t(\omega)f)(x) = \frac{1}{2} \sum_{i,j=1}^{d} a^{ij}(t,\omega)\, \frac{\partial^2 f}{\partial x_i \partial x_j}(x) + \sum_{j=1}^{d} b^j(t,\omega)\, \frac{\partial f}{\partial x_j}(x) .$$
Let ξ(t, ω) be a mapping from [0, ∞) × Ω into R^d, right continuous in t and adapted to the family {F_t, t ≥ 0}. The following theorem holds. For a proof, see Stroock and Varadhan (1979), p. 86.

THEOREM 3.12. (Stroock and Varadhan) For any ξ(·), a(·) and b(·) satisfying the above conditions, the following are equivalent.
(i) {f(ξ(t)) − ∫₀ᵗ (L_s f)(ξ(s)) ds, F_t, t ≥ 0} is a martingale on the probability space (Ω, F, P) for every f ∈ C₀^∞(R^d);
(ii) {f(t, ξ(t)) − ∫₀ᵗ ((∂/∂s + L_s)f)(s, ξ(s)) ds, F_t, t ≥ 0} is a martingale on the probability space (Ω, F, P) for every f ∈ C_b^{1,2}([0, ∞) × R^d);
(iii) {f(t, ξ(t)) exp[−∫₀ᵗ {((∂/∂s + L_s)f)/f}(s, ξ(s)) ds], F_t, t ≥ 0} is a martingale on the probability space (Ω, F, P) for every f ∈ C_b^{1,2}([0, ∞) × R^d) which is uniformly positive;
(iv) if u ∈ R^d and g ∈ C_b^{1,2}([0, ∞) × R^d), then
$$J_{u,g}(t) = \exp\left[\left(u,\ \xi(t) - \xi(0) - \int_0^t b(s)\,ds\right) + g(t, \xi(t)) - g(0, \xi(0)) - \frac{1}{2}\int_0^t \left(u + \nabla g,\ a(s)(u + \nabla g)\right)(s, \xi(s))\,ds - \int_0^t \left(\left(\frac{\partial}{\partial s} + L_s\right)g\right)(s, \xi(s))\,ds\right]$$
for t ≥ 0 is a martingale on the probability space (Ω, F, P) relative to the family {F_t, t ≥ 0};
(v) if u ∈ R^d, then
$$J_u(t) = \exp\left[\left(u,\ \xi(t) - \xi(0) - \int_0^t b(s)\,ds\right) - \frac{1}{2}\int_0^t \left(u,\ a(s)u\right)ds\right]$$
is a martingale on the probability space (Ω, F, P) relative to the family {F_t, t ≥ 0}; and
(vi) if u ∈ R^d, then
$$J_{iu}(t) = \exp\left[i\left(u,\ \xi(t) - \xi(0) - \int_0^t b(s)\,ds\right) + \frac{1}{2}\int_0^t \left(u,\ a(s)u\right)ds\right]$$
is a martingale on the probability space (Ω, F, P) relative to the family {F_t, t ≥ 0}.

Here C_b^{1,2}([0, ∞) × R^d) denotes the space of all functions f: [0, ∞) × R^d → R such that f has a first bounded continuous time derivative and bounded continuous spatial derivatives of order less than or equal to two. For the proof of this result, see Stroock and Varadhan (1979), pp. 86-90.

REMARKS. For additional results of a similar nature characterizing the Ornstein-Uhlenbeck processes on R^d, see Taylor (1989). Martingale characterizations of one-dimensional diffusions are discussed in Arbib (1965). Grigelionis (1975, 1977) studied characterizations of random processes with conditionally independent or independent increments through martingale structures. For results of a similar type, see Chapter 4 of Liptser and Shiryayev (1989).

3.6. Characterization of Gaussian processes

Consider the random variables
$$X = a_1 Z_1 + \cdots + a_n Z_n \quad \text{and} \quad Y = b_1 Z_1 + \cdots + b_n Z_n ,$$
where the a_i's and b_i's are nonrandom and the {Z_i} are independent random variables. Skitovich showed that if X and Y are independent, then all the random variables Z_i for which a_i b_i ≠ 0 are gaussian random variables. The following theorem due to Ramachandran (1967) extends this result to the infinite case.

THEOREM 3.13. (Ramachandran) Let {Z_i} be a sequence of mutually independent random variables and let {a_i} and {b_i} be two sequences of real numbers such that the sequences {a_i/b_i : a_i b_i ≠ 0} and {b_i/a_i : a_i b_i ≠ 0} are bounded. Further suppose that
$$X = \sum_{i=1}^{\infty} a_i Z_i \quad \text{and} \quad Y = \sum_{i=1}^{\infty} b_i Z_i$$
exist with probability one and are stochastically independent. Then, for every i such that a_i b_i ≠ 0, Z_i is gaussian.

Recall that some of the results in Section 2 are extensions of this result to homogeneous processes with independent increments.
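The Skitovich phenomenon behind Theorem 3.13 can be illustrated numerically. With a = (1, 1) and b = (1, −1), the forms X = Z₁ + Z₂ and Y = Z₁ − Z₂ are uncorrelated for any i.i.d. Z_i with finite variance, but they are independent only in the gaussian case. The correlation between X² and Y² used below is only a crude probe of dependence, not a formal test:

```python
import numpy as np

# X = Z1 + Z2 and Y = Z1 - Z2 are always uncorrelated, but corr(X^2, Y^2)
# vanishes only for gaussian Z_i (illustrative sketch).
rng = np.random.default_rng(7)
n = 500_000
results = {}
for name, Z in [("gaussian", rng.normal(size=(n, 2))),
                ("exponential", rng.exponential(size=(n, 2)))]:
    X = Z[:, 0] + Z[:, 1]
    Y = Z[:, 0] - Z[:, 1]
    results[name] = np.corrcoef(X**2, Y**2)[0, 1]
print(results)   # near 0 for gaussian, clearly positive for exponential
```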


Suppose a process X = {X(t), 0 ≤ t ≤ T} is continuous in quadratic mean with mean zero and covariance function R(t, s). Then it is known that the process has the Karhunen-Loeve (K-L) expansion, that is, if {φ_i(t)} and {λ_i} are the normalized eigenfunctions and eigenvalues, respectively, of the kernel R(t, s), then for t ∈ [0, T],
$$X(t) = \sum_{i=1}^{\infty} J_i\,\phi_i(t)$$
in the sense of convergence in quadratic mean. Here
$$J_i = \int_0^T X(t)\phi_i(t)\,dt$$
and E[J_i J_j] = λ_i δ_{ij}, with δ_{ij} = 1 if i = j and δ_{ij} = 0 if i ≠ j. Furthermore, the series
$$R(t, s) = \sum_{i=1}^{\infty} \lambda_i\,\phi_i(t)\phi_i(s)$$
converges uniformly in t and s [cf. Ash and Gardner (1975), p. 37]. Pierre (1969) used this representation to characterize the Wiener process among the class of zero mean independent increment processes. He proved that if every two K-L coefficients in the K-L expansion of the process X are independent, then the process has to be gaussian, in fact, a Wiener process on [0, T]. Buhlmann (1963) considered a bigger class of processes and showed that if the process X is an L²-martingale and all the K-L coefficients are independent, then the process is gaussian.

Let us now consider a stochastic process X = {X(t), 0 ≤ t ≤ T} with mean zero and covariance function R(t, s) = min(t, s) and study its K-L expansions over the intervals [0, T_i], i = 1, 2, with T₁ < T₂. Let J_i(T_j), i ≥ 1, be the K-L coefficients corresponding to the K-L expansion on [0, T_j] for j = 1, 2. Suppose that nT₁ ≠ mT₂ for any odd integers n and m. Pierre (1969) proved that if the K-L coefficients {J_i(T₂)} are mutually independent, then the K-L coefficients J_i(T₁) and J_k(T₁) are independent if and only if the process X is gaussian. This result is not true if the condition that nT₁ ≠ mT₂ for any odd integers n and m is dropped [cf. Pierre (1969)]. For additional results of a similar nature, see Pierre (1969).

We now describe some results for gaussian sequences due to Bryc and Plucinska (1985). Suppose that {X_n, n ≥ 1} is a gaussian sequence of random variables. It is well known that the following properties hold for the sequence {X_n}:
(LR) the conditional expectation E(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) is a linear function of (X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) almost surely for i = 1, ..., n, that is, the regression is linear; and
(CV) the conditional variance Var(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) is a constant almost surely for i = 1, ..., n.

Suppose that {X_n, n ≥ 1} is a sequence of integrable random variables defined on a probability space (Ω, F, P) satisfying the condition (CV) and the condition (LR)* stated below.

(LR)* Each finite subsequence of {X_n, n ≥ 1} satisfies the condition (LR), and for each n ≥ 1, a_n ≠ 0 in the conditional expectation
$$E(X_{n+1} \mid X_1, \ldots, X_n) = a_0 + \sum_{k=1}^{n} a_k X_k .$$
Further suppose that the random variables {X_n, n ≥ 1} are linearly independent in the sense that the functions X_i : Ω → R, 1 ≤ i ≤ m, are linearly independent for every m ≥ 1. We say that the infinite sequence {X_n, n ≥ 1} satisfies the condition (CV) if every finite subsequence of it satisfies the condition (CV). The following result holds.

THEOREM 3.14. (Bryc and Plucinska) Let {X_n, n ≥ 1} be an infinite sequence of linearly independent square integrable random variables satisfying the conditions (LR)* and (CV). Then {X_n, n ≥ 1} is a gaussian sequence.

As a special case, the following result holds for a discrete time Markov process.

THEOREM 3.15. (Bryc and Plucinska) Let {X_n, n ≥ 1} be a square-integrable Markov process such that
(i) E(X_i | X_{i-1}, X_{i+1}) = A_i X_{i-1} + B_i X_{i+1} + C_i for some constants A_i, B_i and C_i, i ≥ 1, with X_0 = 0;
(ii) Var(X_i | X_{i-1}, X_{i+1}) is a constant for i ≥ 1; and
(iii) 0 < |ρ(X_i, X_{i+1})| < 1, i ≥ 1, where ρ(X, Y) denotes the correlation coefficient between X and Y.
Then {X_n, n ≥ 1} is a gaussian sequence.

For proofs of the above two theorems, see Bryc and Plucinska (1985).

REMARKS. Theorem 3.14 is similar to Theorem 3.3 characterizing the finite dimensional distributions of the Wiener process. Theorem 3.14 is not true for finite sequences of random variables {X_i, 1 ≤ i ≤ n}. Let (X₁, X₂) be the bivariate random vector with the characteristic function
$$\phi(t_1, t_2) = p\cos(t_1 + t_2) + (1 - p)\cos(t_1 - t_2) ,$$
where 0 < p < 1/2. Then the random vector satisfies the conditions (LR)* and (CV) but it is not gaussian. Plucinska (1983) discussed characterizations of gaussian processes with smooth covariance functions based on the conditional expectation and the conditional variance. Alternative characterizations of gaussian processes and gaussian sequences based on properties of a covariance function are due to Wesolowski (1984) and Plucinska and Wesolowski (1995). We now discuss some of these results.

Let X = {X_t, t ∈ T}, where T = {0, 1, 2, ...} or T = [0, ∞), be a square integrable zero mean stochastic process defined on a probability space (Ω, F, P) with a covariance function K. Further suppose that, for any 0 ≤ t_1 < t_2 < ⋯ < t_n, n ≥ 1,
$$K^{(n)}(t_1, \ldots, t_n) = \det[(K(t_i, t_j))_{n \times n}] > 0 , \tag{3.7}$$


that is, X_{t_1}, ..., X_{t_n} are linearly independent functions on Ω. The process X is said to be a process with independent linear forms (ILF), or to fulfill an ILF condition, if for any 0 ≤ t₁ < t₂ < ⋯ < tₙ, n > 1, there exist real functions c_{ij} = c_{ij}(t_1, ..., t_n), i = 1, ..., j − 1, j = 2, 3, ..., n, such that
$$X_{t_1},\ X_{t_2} - c_{12}X_{t_1},\ \ldots,\ X_{t_n} - c_{n-1,n}X_{t_{n-1}} - \cdots - c_{1n}X_{t_1}$$
are independent. Let K^{(n)}_{i,n} be the cofactor of the element K(t_i, t_n) in the matrix (K(t_i, t_j))_{n×n}. It can be checked that
$$c_{i,n} = -K^{(n)}_{i,n} / K^{(n-1)} .$$
The following theorem is due to Plucinska and Wesolowski (1995).

THEOREM 3.16. (Plucinska and Wesolowski) Let X = {X_n, n ≥ 1} be a square integrable zero mean random sequence satisfying the condition (3.7) and the ILF property. If, for any 1 ≤ t₁ < ⋯ < tₙ, t_i ∈ {1, 2, ...}, i ≥ 1, n ≥ 2,
$$K^{(n)}_{1,n} \neq 0 ,$$
then X is a gaussian sequence.

Suppose T = [0, ∞) and the process {X_t, t ∈ T} satisfies the condition: for any 0 < t₁ < ⋯ < tₙ, n ≥ 4,
$$\lim_{t_{n-1} \to t_{n-2}} E(X_{t_n} \mid X_{t_1}, \ldots, X_{t_{n-1}}) = E(X_{t_n} \mid X_{t_1}, \ldots, X_{t_{n-2}}) . \tag{3.9}$$
In addition, suppose that the process is an ILF process with the property that, for any n ≥ 3 and any 0 < t₁ < ⋯ < t_{n−1} < t_n,
$$c_{1n}(t_1, \ldots, t_n) = 0 \implies c_{1n}(t_1, \ldots, t_{n-2}, s, t_n) = 0, \quad t_{n-2} < s \le t_{n-1} . \tag{3.10}$$
Plucinska and Wesolowski (1995) proved the following theorem characterizing gaussian processes.

THEOREM 3.17. (Plucinska and Wesolowski) Suppose X = {X_t, t ≥ 0} is a square integrable zero mean stochastic process satisfying (3.7). Assume that it has the ILF property and that the consistency conditions (3.9) and (3.10) are satisfied. If, for any 0 < t₁ < t₂ < t₃,
$$K^{(n)}_{1,n} \neq 0 \quad \text{for } n = 2, 3 ,$$
then the process X is a gaussian process.

A stochastic process {X_t, 0 ≤ t ≤ 1} is said to be mean square continuous if EX_t² < ∞ for all t and E[X_t − X_s]² → 0 as t → s. The following result, due to Bryc (1985) [cf. Bryc (1995)], gives a characterization of gaussian processes which are mean square continuous.

THEOREM 3.18. (Bryc) Suppose that a stochastic process {X_t, 0 ≤ t ≤ 1} is mean square continuous and the correlation between X_t and X_s is ≠ ±1 for all t ≠ s. Let


ℱ_{s,u} be the σ-algebra generated by {X_t, t ≤ s or t = u}. If there are functions a(s, t, u), b(s, t, u), c(s, t, u), σ²(s, t, u) such that for every choice of s ≤ t and every u,

E[X_t | ℱ_{s,u}] = a(s, t, u) + b(s, t, u)X_s + c(s, t, u)X_u ;   (3.11)

and

Var[X_t | ℱ_{s,u}] = σ²(s, t, u) ,   (3.12)

then the process {X_t, 0 ≤ t ≤ 1} is gaussian.
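The linearity of the two-sided conditional mean in (3.11) can be illustrated for a concrete gaussian process. The sketch below is our own illustration, not part of the source: it simulates standard Brownian motion, for which E[X_t | X_s, X_u] = ((u − t)X_s + (t − s)X_u)/(u − s) for s < t < u, and recovers the two regression coefficients by least squares; the time points s, t, u and all variable names are arbitrary choices.

```python
import random

rng = random.Random(7)
s, t, u = 0.5, 0.8, 1.0
reps = 100000

# Simulate Brownian motion at times s < t < u via independent increments.
xs, xt, xu = [], [], []
for _ in range(reps):
    a = rng.gauss(0, s ** 0.5)            # X_s
    b = a + rng.gauss(0, (t - s) ** 0.5)  # X_t
    c = b + rng.gauss(0, (u - t) ** 0.5)  # X_u
    xs.append(a); xt.append(b); xu.append(c)

# Least-squares fit of X_t on (X_s, X_u): solve the 2x2 normal equations.
sss = sum(x * x for x in xs)
suu = sum(x * x for x in xu)
ssu = sum(a * c for a, c in zip(xs, xu))
sst = sum(a * b for a, b in zip(xs, xt))
sut = sum(c * b for c, b in zip(xu, xt))
det = sss * suu - ssu * ssu
b_s = (sst * suu - sut * ssu) / det
b_u = (sut * sss - sst * ssu) / det
# theory: b_s = (u-t)/(u-s) = 0.4, b_u = (t-s)/(u-s) = 0.6
print(round(b_s, 2), round(b_u, 2))
```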
3.7. Characterizations of Poisson process

The following result characterizing a Poisson process through conditional moments is due to Wesolowski (1988), which is an improvement over an earlier result of Bryc (1987). This result is analogous to Theorem 3.3 giving the characterization of stochastic processes with the same finite dimensional distributions as those of a Wiener process.

THEOREM 3.19. (Wesolowski) Let {X_t, t ≥ 0} be a square integrable process such that for any 0 < t_1 < t_2 < ⋯ < t_n < t < s, n ≥ 1,
(i) E(X_t) = t,
(ii) Cov(X_t, X_s) = min(t, s),
(iii) E(X_t | X_{t_1}, ..., X_{t_n}) = α_1 X_{t_n} + β_1,
(iv) E(X_t | X_{t_1}, ..., X_{t_n}, X_s) = α_2 X_{t_n} + β_2 X_s + η, and
(v) Var(X_t | X_{t_1}, ..., X_{t_n}, X_s) = γ(X_s − X_{t_n})
where α_1, β_1, α_2, β_2, γ and η are some constants depending on t, t_1, ..., t_n and s only. Then the process {X_t, t ≥ 0} is a Poisson process.

For the proof, see Wesolowski (1988) and Bryc (1987). A slight extension of this result is the following, due to Wesolowski (1990a).

THEOREM 3.20. (Wesolowski) Let {X_t, t ≥ 0} be a square integrable process such that for any 0 ≤ t_1 < t_2 < ⋯ < t_n < t < s < u, n ≥ 1,
(i) E(X_s | X_{t_1}, ..., X_{t_n}, X_t) = X_t + s − t,
(ii) E(X_s | X_{t_1}, ..., X_{t_n}, X_t, X_u) = (u − t)^{−1}((u − s)X_t + (s − t)X_u), and
(iii) Var(X_s | X_{t_1}, ..., X_{t_n}, X_t, X_u) = (u − t)^{−2}(u − s)(s − t)(X_u − X_t),
then the process {X_t, t ≥ 0} is a Poisson process.

Another variation is the following result due to Wesolowski (1997a).

THEOREM 3.21. (Wesolowski) Let {X_t, t ≥ 0} be a square integrable process such that for any 0 ≤ t_1 < ⋯ < t_n < t < s < u, n ≥ 1,
(i) E(X_t) = t,
(ii) Cov(X_t, X_s) = min(t, s),

(iii) E[X_s | X_{t_1}, ..., X_{t_n}, X_t, X_u] = α_1 X_t + β_1 X_u + η, and
(iv) Var[X_s | X_{t_1}, ..., X_{t_n}, X_t, X_u] = γ(X_u − X_t)
where α_1, β_1, η and γ are some constants depending on t_1, ..., t_n, t, s and u. Then the process {X_t, t ≥ 0} is a Poisson process.

REMARKS. For recent results on non-gaussian measures with gaussian conditional structures, see Nguyen et al. (1996) and Arnold and Wesolowski (1997). Results characterizing the finite dimensional distributions of the gamma process, the negative binomial process and the hyperbolic secant process by properties of conditional moments are given in Wesolowski (1989, 1990b, 1993).

4. Characterizations for Poisson process as a renewal process and as a point process

We have discussed several characterizations for Wiener processes, stable processes, Poisson processes and gaussian processes, and in general characterizations for homogeneous processes with independent increments through stochastic integrals, in the earlier sections. We now discuss some further characterizations for the class of Poisson processes. For an earlier review, see Galambos and Kotz (1978). The following properties of a Poisson process are well known: (i) the superposition of two independent Poisson processes with rates λ_1 and λ_2 is a Poisson process with rate λ_1 + λ_2; (ii) the process has independent increments; (iii) the time intervals between occurrences of an event are independent random variables; if in addition the process is stationary, then these random variables are identically exponentially distributed; (iv) for a stationary Poisson process, the conditional distribution of the events in a prescribed interval, given the number of events that occurred in that interval, is uniform on that interval; and (v) the process is characterized by its first moment density, that is, the rate of occurrence λ(·) (which is constant in the stationary case). We now discuss some characterizations of a Poisson process related to the above properties.
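Properties (i) and (iii) are easy to check numerically. The following sketch is our own illustration, not from the source; the rates 2 and 3, the horizon, and the helper `poisson_arrivals` are arbitrary choices. It merges two simulated Poisson streams and checks that the superposed stream has exponential inter-arrival gaps with mean 1/(λ_1 + λ_2) and unit coefficient of variation.

```python
import random
import statistics

def poisson_arrivals(rate, horizon, rng):
    """Arrival times of a homogeneous Poisson process on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential gaps
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(42)
lam1, lam2, horizon = 2.0, 3.0, 2000.0

# Superpose (merge) the two independent streams.
merged = sorted(poisson_arrivals(lam1, horizon, rng)
                + poisson_arrivals(lam2, horizon, rng))

gaps = [b - a for a, b in zip(merged, merged[1:])]
# For a Poisson process of rate lam1 + lam2 the gaps are exponential
# with mean 1/(lam1 + lam2) and coefficient of variation 1.
mean_gap = statistics.mean(gaps)
cv = statistics.stdev(gaps) / mean_gap
print(round(mean_gap, 3), round(cv, 2))  # close to 0.2 and 1.0
```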

4.1. Characterization by independence


Recall that a counting process {N(t), t ≥ 0} is called a nonhomogeneous Poisson process with independent increments if (i) N(0) = 0 a.s., (ii) the process has independent increments and (iii) for 0 ≤ s < t, N(t) − N(s) has a Poisson distribution with mean m(t) − m(s) where m(·) is a nonnegative nondecreasing function of t. Let ℬ be the σ-algebra of Borel subsets of R_+ = [0, ∞) and 𝒥 be the class of finite unions of disjoint finite intervals (a, b] in R_+. Let I = (s, t] be an interval in


R_+. Let ξ(E) be an additive stochastic set function defined for E ∈ 𝒥 which counts the number of events of a counting process falling in E. In other words, ξ(E) represents the total increments of the process over the intervals constituting the set E. In particular, ξ(I) = N(t) − N(s) for I = (s, t]. Define λ(I) = m(t) − m(s) for I = (s, t] and let λ(E) be a measure with no atoms defined for E ∈ ℬ such that λ(E) < ∞. The following result is due to Renyi (1967).

THEOREM 4.1. (Renyi) Let {N(t), t ≥ 0} be a counting process with N(0) = 0 a.s. If, for E ∈ 𝒥, ξ(E) has a Poisson distribution with mean λ(E), then the process {N(t), t ≥ 0} is a Poisson process with mean function m(t).

REMARKS. If it is only assumed that ξ(I) has a Poisson distribution for any interval I in R_+, then the process {N(t), t ≥ 0} need not be a Poisson process, by a result due to Shepp [cf. Goldman (1968)]. Oakes (1972) constructed a counting process {N(t), t ≥ 0} such that, for fixed k, the random variables ξ(I_i), 1 ≤ i ≤ k, for contiguous intervals I_i, 1 ≤ i ≤ k, are independent with Poisson distributions with means t_i − t_{i−1}, 1 ≤ i ≤ k, and yet the process {N(t), t ≥ 0} is not a Poisson process. Such a process is called a k-fold quasi-Poisson process.

Fang (1991) studied additional conditions under which one can conclude that the process {N(t), t ≥ 0} is a Poisson process when it is known that ξ(I) is distributed as Poisson with mean λ(I). He proved the result under an exchangeability condition replacing the condition of independent increments.

THEOREM 4.2. (Fang) Let {N(t), t ≥ 0} be a counting process. Suppose there exists λ > 0 such that
(i) for any interval I = (a, b], P{N(b) − N(a) = 0} = e^{−λ|I|},
(ii) P{N(b) − N(a) ≥ 2} = o(λ|I|) as |I| → 0 uniformly in I, and
(iii) for any finite number of contiguous intervals I_i = (t_{i−1}, t_i], 1 ≤ i ≤ k, such that |I_i| = δ, the random variables χ_{A(I_i)}, 1 ≤ i ≤ k, are exchangeable, where χ_{A(I)} is the indicator function of the event A(I) = {N(b) − N(a) = 0}. Here |I| = b − a is the length of the interval I = (a, b].
Then the process {N(t), t ≥ 0} is a Poisson process with mean function m(t) = λt.

The above result deals with the homogeneous case. The following theorem, due to Fang (1991), extends it to the nonhomogeneous case.

THEOREM 4.3. (Fang) Let {N(t), t ≥ 0} be a counting process and λ be a nonatomic measure on (R_+, ℬ). For intervals I = (a, b] ⊂ R_+, let λ(I) = m(b) − m(a), where m is a nonnegative nondecreasing continuous function with m(0) = 0. Suppose the following conditions hold:
(i) for any interval I = (a, b] ⊂ R_+, P(N(b) − N(a) = 0) = e^{−λ(I)},
(ii) P{N(b) − N(a) ≥ 2} = o(λ(I)) as λ(I) → 0 uniformly in I, and
(iii) for any finite number of contiguous intervals I_i = (t_{i−1}, t_i], 1 ≤ i ≤ k, such that λ(I_i) = δ, δ > 0, the random variables χ_{A(I_i)}, 1 ≤ i ≤ k, are exchangeable.


Then the process {N(t), t ≥ 0} is a nonhomogeneous Poisson process with mean function E(N(t)) = m(t).
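Condition (iii) of the definition above can be realized concretely: time-changing a unit-rate Poisson process through m^{-1} yields N(t) distributed as Poisson(m(t)). The following sketch is our own illustration with the arbitrary choice m(t) = t²; the helper name and parameters are not from the source.

```python
import random
import statistics

rng = random.Random(9)
t_end, reps = 2.0, 20000

def count_at(t_end, rng):
    """N(t_end) for a nonhomogeneous Poisson process with m(t) = t**2,
    built by mapping unit-rate arrival times U_k to sqrt(U_k)."""
    u, n = 0.0, 0
    while True:
        u += rng.expovariate(1.0)   # unit-rate arrival times U_k
        if u ** 0.5 > t_end:        # mapped arrival time m^{-1}(U_k)
            return n
        n += 1

counts = [count_at(t_end, rng) for _ in range(reps)]
m = statistics.mean(counts)
v = statistics.pvariance(counts)
print(round(m, 1), round(v, 1))  # both close to m(t_end) = 4.0
```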

4.2. Characterization by cumulants


Fang (1991) characterized a nonhomogeneous Poisson process via cumulants and moments. We discuss these results briefly. For any random vector (X_1, ..., X_r), define the r-th joint cumulant by

cum(X_1, ..., X_r) = Σ (−1)^{p−1} (p − 1)! (E Π_{j∈v_1} X_j) ⋯ (E Π_{j∈v_p} X_j)

whenever it exists, where the summation extends over all partitions (v_1, ..., v_p), p = 1, ..., r, of {1, ..., r}. It can be checked that if E|X_j|^m < ∞, 1 ≤ j ≤ m, then

E(X_1 ⋯ X_m) − E(X_1) ⋯ E(X_m) = Σ cum(X_k, k ∈ v_1) ⋯ cum(X_k, k ∈ v_p)

where Σ extends over all partitions (v_1, ..., v_p), p = 1, ..., (m − 1), of {1, ..., m}. For the proof, see Block and Fang (1988). Furthermore, if for all positive integers q_i ≤ r_i, 1 ≤ i ≤ m, such that Σ q_i < Σ r_i,

E(X_1^{q_1} ⋯ X_m^{q_m}) = E(X_1^{q_1}) ⋯ E(X_m^{q_m}) ,

then

E(X_1^{r_1} ⋯ X_m^{r_m}) − E(X_1^{r_1}) ⋯ E(X_m^{r_m}) = cum(X_{1,1}, ..., X_{1,r_1}, ..., X_{m,1}, ..., X_{m,r_m})

where the X_{j,i}, 1 ≤ i ≤ r_j, are i.i.d. as X_j for 1 ≤ j ≤ m. For the proof of this result, see Fang (1991).

THEOREM 4.4. (Fang) Let {N(t), t ≥ 0} be a counting process with finite moments of all orders. This process is a nonhomogeneous Poisson process if and only if

cum(N(t_1), ..., N(t_k)) = min_{1≤i≤k} E(N(t_i))

for all 0 ≤ t_i, 1 ≤ i ≤ k, k ≥ 1.

REMARKS. Newman (1970) introduced a class of Gauss-Poisson processes which share some properties of Poisson processes. These processes also have the property that the superposition of two independent Gauss-Poisson processes is a Gauss-Poisson process, but they are characterized by the first two moment densities [cf. Bartlett (1966)], unlike Poisson processes, which are characterized by the first moment density. For a discussion of Gauss-Poisson processes, see Newman (1970).


4.3. Characterization from exchangeable and Markov properties


Let {N(t), t ≥ 0} be a counting process defined on a probability space (Ω, ℱ, P) with N(0) = 0, N(t) < ∞ for all t > 0 a.s., and with right continuous sample paths having successive unit step jumps at the times T_1, T_1 + T_2, .... We assume that the event {N(t) = 0 for all t > 0} has probability zero. The process {N(t), t ≥ 0} is said to have the exchangeable (E) property if for every positive integer k ≥ 1 such that P(N(t) = k) > 0, the conditional probability

P(T_i ≤ x_i, 1 ≤ i ≤ k | N(t) = k)


is symmetric in x_i, 1 ≤ i ≤ k. The process {N(t), t ≥ 0} is said to have the stronger exchangeable (SE) property if for every positive integer k ≥ 1 such that P(N(t) = k) > 0, the conditional probability

P(T_i ≤ x_i, 1 ≤ i ≤ k, t − Σ_{i=1}^k T_i ≤ x_{k+1} | N(t) = k)

is symmetric in x_i, 1 ≤ i ≤ k + 1. The process {N(t), t ≥ 0} is said to be a μ-mixed renewal process if for every positive integer n ≥ 1

P(T_i ≤ x_i, 1 ≤ i ≤ n) = ∫_Λ Π_{i=1}^n P_λ(T_i ≤ x_i) dμ(λ)

where the family {P_λ, λ ∈ Λ} is a collection of probability measures such that P_λ(T_i ≤ 0) = 0 for every λ ∈ Λ and μ is a probability measure on (Λ, ℬ_Λ), where ℬ_Λ is the smallest σ-algebra of subsets of Λ over which all the λ-functions P_λ(A) are measurable for A ∈ ℱ. Note that the process {N(t), t ≥ 0} is a μ-mixed renewal process if, for μ-almost every λ ∈ Λ, given P_λ, the counting process {N(t), t ≥ 0} is a renewal process with inter-arrival time distribution function F_λ(x) = P_λ(T_i ≤ x). If, for almost all λ ∈ Λ, F_λ(x) = 1 − exp{−a(λ)x} for some a(λ) > 0, then the process {N(t), t ≥ 0} is called a mixed Poisson process. Let Z be the almost sure limit of N(t) as t → ∞. This limit exists since N(t) is nondecreasing in t. Suppose that P(Z < ∞) = 1. If, for every k ≥ 1 with P(Z = k) > 0, conditionally given Z = k, the process {N(t), t ≥ 0} has the property E, then the process is said to be a mixed finite exchangeable process. For stochastic modelling by counting processes with the property E, in studies of systems with exchangeable component life times, see Shantikumar (1985). Define the process N = {N(t), t ≥ 0} and the random variable Z as before. Let q = P(Z < ∞). The following results characterizing exchangeable counting processes are due to Huang (1990).


THEOREM 4.5. (Huang) (i) Let 0 < q < 1. Then, conditional on the event [Z = ∞], the process N has the property E if and only if it is a mixed renewal process. (ii) Let 0 < q < 1. Then the process N has the property E if and only if, given Z, the process N has the property E. (iii) Let 0 < q ≤ 1. Then the following statements are equivalent: (a) conditional on [Z < ∞], the process N has the property E; (b) P(T_i ≤ x_i, 1 ≤ i ≤ k | N(t) = k) is symmetric in x_i, 1 ≤ i ≤ k, k ≥ 1; (c) for every k ≥ 1, conditionally given the event [Z = k], the process N has the property E.

The above result is a consequence of de Finetti's theorem on exchangeability [cf. Diaconis and Freedman (1980)]. For a detailed proof, see Huang (1990). We now discuss some further characterization results for Poisson and mixed Poisson processes.

THEOREM 4.6. Let N = {N(t), t ≥ 0} be a renewal process with inter-arrival time distribution function F which is continuous. Then the process N is a Markov process if and only if it is a Poisson process.

As a generalization of the above theorem, the following result holds for a mixed Poisson process.

THEOREM 4.7. Let N = {N(t), t ≥ 0} be a μ-mixed renewal process with a given family of probability measures {P_λ, λ ∈ Λ} and the mixing probability measure μ. Further suppose that for almost all λ ∈ Λ, the inter-arrival time distribution function F_λ(t) = P_λ(T_i ≤ t) is continuously differentiable on (0, ∞) with 0 < F_λ′(t) < c for all t > 0, where c is a positive constant. Then the process N is a Markov process if and only if it is a mixed Poisson process.

Proof of the sufficiency part of this theorem is a consequence of the OS property for a mixed Poisson process as discussed in Section 4.7 [cf. Feigin (1979), Puri (1982)]. For a proof of the necessary part, see Huang (1990). For additional results characterizing mixed Poisson processes by the Markov property and the exchangeable property, see Huang (1990).

4.4. Characterization of Poisson processes arising in queues

Consider a counting process A = {A(t), t ≥ 0} with A(0) = 0, A(t) < ∞ for all t ≥ 0 a.s., with right continuous sample paths and with successive unit steps at times 0 < S_1 < S_2 < ⋯. Let us call such a process an arrival process. Suppose that each arrival is allowed to depart instantly to a compartment C_i with positive probability p_i, i = 1, 2, ..., k, k ≥ 2, with Σ_{i=1}^k p_i = 1. Let D_i(t) denote the number of departures of the i-th type during (0, t]. The following result is due to Fitchner (1975).


THEOREM 4.8. (Fitchner) The arrival process A is Poisson if and only if the departure processes {D_i(t), t ≥ 0}, 1 ≤ i ≤ k, are independent. In such a case, the departure process {D_i(t), t ≥ 0} is a Poisson process for every 1 ≤ i ≤ k.

For a generalization of this result, see Kimeldorff and Thall (1983). Suppose that the arrival process is a renewal process, each arrival is served, independently of everything else, for a random length of time with a common distribution function G and then allowed to depart to a compartment C_i with probability p_i as stated earlier. Let D(t) = Σ_{i=1}^k D_i(t) be the overall departure process. The following theorem is due to Huang and Puri (1990).

THEOREM 4.9. (Huang and Puri) Let {A(t), t ≥ 0} be an arrival process which is a renewal process and suppose that the service time distribution function G satisfies the condition G(0) < 1 and the service times are independent of the arrival process. Let {D(t), t ≥ 0} be the overall departure process. Suppose that D(t) has a Poisson distribution for every t ≥ 0 with ED(t) = m(t) and m(0) = 0. Furthermore suppose that the function m(t) is continuously differentiable with derivative m′(t) = 0 for all 0 < t < a and m′(t) > 0, t > a, for some constant a > 0. Then (i) the distribution function G(·) is continuous, (ii) m(t) = b ∫_0^t G(u) du for some b > 0, and (iii) the process {A(t), t ≥ 0} is a homogeneous Poisson process with rate b.

REMARKS. Huang and Puri (1990) studied characterization results for the departure process to be a Poisson process when the arrival process is Poisson, and vice versa, in the case of an infinite servers queue with nonidentically distributed service times. We do not discuss these results here.

4.5. Characterization via age at the time t or residual life at the time t of a renewal process

Chung (1972) discussed various characterizations of the exponential distribution and related properties of a Poisson process. Let F(·) be a probability distribution function on [0, ∞) satisfying the property

F(x) > 0 for x > 0 and F(0) = 0 .   (4.1)

Let S_n, n ≥ 0, be a renewal process associated with F. In other words,

S_0 = 0, S_n = X_1 + ⋯ + X_n, n ≥ 1,

where X_n, n ≥ 1, are independent and identically distributed (i.i.d.) with distribution function F. For every t > 0, there exists a unique integer N_t satisfying the inequality S_{N_t} ≤ t < S_{N_t+1}. The random variable N_t is the number of renewals in [0, t]. In fact, N_t = sup{n : S_n ≤ t}.


Let

Y_t = t − S_{N_t}

and

Z_t = S_{N_t+1} − t .

Then Y_t is the spent time or the age at the time t and Z_t is the residual waiting time at the time t. If F is an exponential distribution, then it is known that Y_t and Z_t are independent random variables. Furthermore, {N_t, t ≥ 0} is a Poisson process. In such a case, (i) Z_t has the exponential distribution F for all t and (ii) Y_t has the distribution F_t obtained from F by truncating at t, that is,

F_t(u) = F(u)  for u ≤ t
       = 1     for u > t .
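The exponential-residual-life property behind these facts can be checked by simulation. The sketch below is our own illustration (λ = 1, t = 5 and the sample size are arbitrary choices): with exponential inter-arrival times, the residual life Z_t is again exponential with the same mean, reflecting memorylessness.

```python
import math
import random
import statistics

rng = random.Random(3)
lam, t, reps = 1.0, 5.0, 100000

ages, residuals = [], []
for _ in range(reps):
    s = 0.0                          # S_{N_t}: last renewal at or before t
    while True:
        nxt = s + rng.expovariate(lam)
        if nxt > t:
            ages.append(t - s)       # Y_t = t - S_{N_t}
            residuals.append(nxt - t)  # Z_t = S_{N_t+1} - t
            break
        s = nxt

mean_Z = statistics.mean(residuals)           # memorylessness: E[Z_t] = 1/lam
p_theory = math.exp(-lam * 1.0)               # P(Z_t > 1) for Exp(lam)
frac = sum(1 for z in residuals if z > 1.0) / reps
print(round(mean_Z, 2), round(frac, 3))
```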

Chung (1972) proved that either of (i) and (ii) given above implies that the process is Poisson. In fact this result holds if Z_t has the exponential distribution F for a sequence of values of t tending to infinity. Erickson and Guess (1973) proved the following result characterizing the exponential law and, in turn, characterizing a Poisson process in the class of renewal processes.

THEOREM 4.10. (Erickson and Guess) Let F be a distribution function satisfying the conditions stated above and {S_n} be a renewal process with inter-arrival time distribution F. If for some t_0 > 0, Y_{t_0} and Z_{t_0} are independent, then F is an exponential distribution and hence Y_t and Z_t are independent and the renewal process {S_n} is a Poisson process.

For a proof of this result, see Erickson and Guess (1973). Holmes (1974) proved the following result.

THEOREM 4.11. (Holmes) Let F be a distribution function satisfying the conditions stated above and {S_n} be a renewal process with inter-arrival time distribution F. The renewal process {S_n} is a Poisson process if and only if EZ_t is finite and independent of t.

Cinlar and Jagers (1973) noted that the Poisson process enjoys two special properties, namely, the mean (expected) forward recurrence time at the time t (residual life at the time t) does not depend on t, and the mean (expected) backward recurrence time at the time t (the age at the time t or the spent time at the time t) is the mean (expectation) of the inter-arrival (interval) distribution truncated at the time t. They proved that the Poisson process is the only renewal process with this property. Their result is as follows.

THEOREM 4.12. (Cinlar and Jagers) (i) If E[Z_t] is a positive constant c for all t > 0, then the process {N_t, t ≥ 0} is a Poisson process with mean rate (1/c); (ii) if

E[Y_t] = ∫_0^∞ x dF_t(x),  t > 0,

where F_t is as defined earlier, then the process {N_t, t ≥ 0} is a Poisson process; and (iii) if

E[t − S_n | N_t = n] = E[X_1 | N_t = n],  t ≥ 0, n ≥ 1,

then the process {N_t, t ≥ 0} is a Poisson process.

In a recent paper, Huang and Li (1993a) proved that if Var(Z_t) is constant in t for a renewal process, then the inter-arrival time distribution is either exponential or geometric. Additional results of this type are discussed in Gupta (1984), Gupta and Gupta (1986), Huang and Li (1993a, b), Huang et al. (1993) and Li et al. (1994).

4.6. Characterization via superposition of renewal processes

Let {X_n, n ≥ 0} be a sequence of i.i.d. strictly positive random variables and {Y_n, n ≥ 0} be another such sequence independent of {X_n, n ≥ 0}. Let F and G be the distribution functions of X_0 and Y_0 respectively. Consider the renewal processes

S_n = X_0 + ⋯ + X_n, n ≥ 0, and T_n = Y_0 + ⋯ + Y_n, n ≥ 0 .

Let the superposition of the renewal processes {S_n} and {T_n} be the process whose points are those of the two component processes ordered in an increasing sequence, and let Z_n denote the increments of this process, so that the n-th point of the superposition is Z_0 + ⋯ + Z_n, n ≥ 0.

The following theorem is due to Samuels (1974).

THEOREM 4.13. (Samuels) A necessary and sufficient condition for the superposition of two independent renewal processes {S_n} and {T_n} to be a renewal process is that the two component processes are Poisson processes; in that case the superposition is also a Poisson process.

For a proof, see Samuels (1974). O'Cinneide (1991) discussed identifiability of superpositions of renewal processes. Suppose a point process is known to be the superposition of two renewal processes. Is it possible to identify the distributions driving the component renewal processes from the knowledge of the superposition alone? O'Cinneide (1991) showed that the answer is no if the component renewal processes are Poisson. Apart from this special case, and under the additional assumption of the existence and analyticity of the densities of the inter-arrival distributions of the components, the answer is yes.

4.7. Characterization via order statistic property

Consider a point process {M(t), t ≥ 0} with right continuous sample paths, with jumps of size unity at successive times S_1, S_2, ..., and with a nondecreasing mean function m(t) = E[M(t)] < ∞, t ≥ 0. The process {M(t), t ≥ 0} is said to have the order statistic property (OS property) if, conditional on M(t) − M(0) = k, the successive jump times {S_1, S_2, ..., S_k} are distributed as the order statistics of k i.i.d. random variables with a distribution function F supported on [0, t] [cf. Crump (1975)]. If F is the uniform distribution on [0, t], then we say that the process {M(t), t ≥ 0} has the uniform order statistic property (UOS property). For every Radon measure λ on the real line, let P_λ denote the Poisson process with intensity λ, that is, with the property that the mean number of occurrences in a Borel set B is λ(B). A point process P is called a doubly stochastic Poisson process if it can be represented as a mixture of Poisson processes, that is,

P = ∫ P_λ Q(dλ)

where Q is a probability measure on a suitable σ-algebra of subsets of the set of all Radon measures [cf. Kingman (1964)]. The following result is due to Feigin (1979).

THEOREM 4.14. (Feigin) A point process M has the UOS property if and only if there exists a homogeneous Poisson process N with unit rate such that N(0) = M(0) a.s. and

M(t) = N(Wt)  a.s.

where W is a nonnegative random variable independent of N. Equivalently, the process M has the UOS property if and only if it is a doubly stochastic Poisson process with stochastic intensity function of the form λ(t) ≡ W.

Suppose a process is initiated with a random number Z of particles at the time t = 0. Further suppose that each particle undergoes a death process with a common life distribution function F which is continuous with F(0) = 0. Let D(t) denote the number of deaths that occurred in the interval (0, t]. The process {D(t), t ≥ 0} with D(0) = 0 is called a mixed sample process. The following result is due to Puri (1982).

THEOREM 4.15. (Puri) The mixed sample process {D(t), t ≥ 0} with D(0) = 0 is a Markov process with the OS property.

For related results, see Defner and Haeusler (1985) and Lieberman (1985). We now discuss the relation between renewal processes and the OS property following Lieberman (1985). Let {N(t), t ≥ 0} be a counting process, that is, a nonnegative integer-valued stochastic process that records the number of occurrences N(t) of an event during


the interval [0, t]. We assume that N(0) = 0 a.s. Let the successive times between occurrences of the events be T_i, i ≥ 1. Note that T_i is the time elapsed between the (i−1)-th occurrence and the i-th occurrence of the event. Let G(t) = P(T_k ≤ t), k ≥ 1. We assume that G(0) = 0. Define

S_n = T_1 + ⋯ + T_n, n ≥ 1 .

Then S_1, S_2, ... denote the successive waiting times. Let m(t) = E[N(t)], t ≥ 0, be the expected number of events. Then m(0) = 0 since N(0) = 0 a.s. by assumption. Following the terminology discussed above, the counting process {N(t), t ≥ 0} is said to have the OS property if, conditional on N(t) = n, the successive waiting times S_i, 1 ≤ i ≤ n, are distributed as the order statistics of n i.i.d. random variables with a distribution function F_t on [0, t].

THEOREM 4.16. (Lieberman) A counting process has the OS property if and only if it is a Poisson process, in which case the distribution function F_t is the uniform distribution on [0, t].

REMARKS. Some further results on characterizations for Poisson processes or mixed Poisson processes are discussed in Mecke (1977), Lin (1978), Chouinard and McDonald (1985), Isham et al. (1975), and Pfeifer and Heller (1987).
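The uniform order statistic conclusion can be illustrated numerically: conditional on N(t) = 3, the first jump time of a homogeneous Poisson process should have mean t/4, the mean of the minimum of three Uniform(0, t) variables. The sketch below is our own illustration; the rate and the conditioning level are arbitrary choices.

```python
import random
import statistics

rng = random.Random(5)
t, lam, target = 1.0, 3.0, 3

first_jumps = []
while len(first_jumps) < 20000:
    # simulate one Poisson(lam) path on [0, t]
    times, s = [], 0.0
    while True:
        s += rng.expovariate(lam)
        if s > t:
            break
        times.append(s)
    if len(times) == target:          # condition on N(t) = 3
        first_jumps.append(times[0])

# Given N(t) = 3 the jump times are order statistics of three
# Uniform(0, t) variables, so E[S_1 | N(t) = 3] = t/4.
mean_first = statistics.mean(first_jumps)
print(round(mean_first, 2))  # close to 0.25
```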

5. Miscellaneous characterizations

We now discuss some results which do not fit into the general themes of the earlier sections but which are interesting in their own right.
5.1. Identifiability of component stochastic processes from the bivariate vector of sums or differences of components

Let {X_i(t), t ≥ 0}, 1 ≤ i ≤ 3, be independent stochastic processes defined on a probability space (Ω, ℱ, μ). Suppose that the processes have sample paths in L_2(R_+, ℬ, λ) in the sense that

∫_0^∞ X_i^2(t, ω) dt < ∞  a.s. [μ] .

Here ℬ is the σ-algebra of Borel subsets of R_+ and λ is the Lebesgue measure on R_+. The processes {X_i(t), t ≥ 0} can be considered as random elements X_i taking values in the Hilbert space H = L_2(R_+, ℬ, λ). The problem of interest is to check whether the joint probability measure of (X_1 − X_3, X_2 − X_3) on the space H × H determines the probability measures of the components X_i on H.

Let (Ω, ℱ, μ) be a probability space and H be a real separable Hilbert space. Let ℋ be the σ-algebra of Borel subsets of H generated by the norm topology. X is said to be a random element defined on Ω and taking values in H if X : Ω → H is such that X^{−1}B ∈ ℱ for every B ∈ ℋ. Define μ_X(B) = μ(X^{−1}B), B ∈ ℋ. The probability measure μ_X is called the probability measure induced by X on H. Let (x, y) denote the inner product on H for x, y ∈ H. For any probability measure ν on (H, ℋ), the characteristic functional ν̂(·) is a functional defined on H by the relation

ν̂(y) = ∫_H e^{i(x,y)} ν(dx),  y ∈ H .

For an extensive discussion of probability measures on a Hilbert space, see Parthasarathy (1967). The following theorem is due to Kotlarski (1966).

THEOREM 5.1. (Kotlarski) Let X_1, X_2 and X_3 be independent random elements taking values in a real separable Hilbert space H. Define Z_1 = X_1 − X_3 and Z_2 = X_2 − X_3. Suppose the characteristic functional of (Z_1, Z_2) does not vanish. Then the probability measure of (Z_1, Z_2) determines the probability measures of X_1, X_2 and X_3 up to a change of location.

For the proof of this theorem, see Kotlarski (1966) [cf. Prakasa Rao (1992)]. For a comprehensive discussion of these types of results and related problems, see Prakasa Rao (1992). We will now discuss a similar result for point processes due to Prakasa Rao (1975b). Every point process N(·) on [0, ∞) corresponds to a triple (Ω, ℱ, P_N), where Ω is the set of all countable sequences of real numbers {t_i} without limit points, ℱ is the σ-algebra generated by the cylinder sets and P_N is a probability measure [cf. Harris (1963)]. The probability generating functional of a point process N(·) is defined by

G(ξ) = E{exp(∫_0^∞ log ξ(t) dN(t))},  ξ ∈ V

(if ξ(t) = 0 over some set A in [0, ∞), the expression is defined to be zero unless N(A) = 0, when it is defined to be equal to one). Here V denotes the class of measurable functions ξ such that 0 ≤ ξ(t) ≤ 1 for real t and ξ(t) = 1 outside a bounded interval. For properties of the probability generating functional, see Westcott (1972). The following result holds for point processes.

THEOREM 5.2. (Prakasa Rao) Let N_0, N_1 and N_2 be three independent point processes and define M_1 = N_1 + N_0 and M_2 = N_2 + N_0. Then the bivariate point process (M_1, M_2) uniquely determines the point processes N_0, N_1 and N_2.

5.2. Characterization of linear growth birth-death process


Modeling of biological systems by stochastic processes is well known. For the theory of birth and death processes (BDP) with linear growth, see Karlin and McGregor (1958). Ycart et al. (1986) observed that if the initial distribution of a linear growth BDP X(t) is binomial, then it is also binomial for all t > 0. We now discuss similar results for other linear BDPs due to Ycart (1988).


A BDP is a Markov process X = {X(t), t ≥ 0} with values in the set ℕ = {0, 1, 2, ...} satisfying the conditions

P[X(t + Δt) = n + 1 | X(t) = n] = λ_n Δt + o(Δt),
P[X(t + Δt) = n − 1 | X(t) = n] = μ_n Δt + o(Δt),
P[X(t + Δt) = n | X(t) = n] = 1 − (λ_n + μ_n)Δt + o(Δt),

where 0 ≤ λ_n and 0 ≤ μ_n are the rates of birth and death respectively, with μ_0 = 0. An example of a linear growth BDP is obtained by choosing λ_n = λ(n + r), μ_n = μn, n ≥ 0, where λ, μ, r are positive constants. Another example of a linear growth BDP is obtained when λ_n = λ, μ_n = μn, n ≥ 0, where λ, μ are positive constants; this can be interpreted as a queuing process with infinitely many servers. A third example of a linear growth BDP is obtained when λ_n = λ(N − n), n = 0, ..., N, μ_n = μn, n = 0, ..., N, and λ_n = μ_n = 0, n ≥ N + 1, where N is a fixed positive integer and λ, μ are positive constants. In all three examples, the birth and death processes have the property that the distribution of X(t) is either negative binomial (first example) for every t, Poisson (second example) for every t, or binomial (third example) for every t. Ycart (1988) characterized those BDPs {X_t} which have at any instant t either a negative binomial, Poisson or binomial distribution. He proved that, for each of these families, the process is either a stationary process or a linear growth BDP. Related results are discussed in Adke and Ratnaparkhi (1990).
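The Poisson-marginal property of the second example (λ_n = λ, μ_n = nμ, started at X(0) = 0) can be checked with a small Gillespie-style simulation: in that case X(t) has a Poisson distribution with mean (λ/μ)(1 − e^{−μt}), so the sample mean and variance of X(t) should agree. The sketch below is our own illustration with arbitrary parameter values.

```python
import math
import random
import statistics

rng = random.Random(11)
lam, mu, t_end, reps = 4.0, 1.0, 2.0, 50000

def state_at_t_end(rng):
    """Gillespie-style simulation of the BDP with lambda_n = lam, mu_n = n*mu."""
    t, n = 0.0, 0
    while True:
        rate = lam + n * mu             # total event rate in state n
        t += rng.expovariate(rate)
        if t > t_end:
            return n
        if rng.random() < lam / rate:   # birth with probability lam/rate
            n += 1
        else:                           # death otherwise
            n -= 1

samples = [state_at_t_end(rng) for _ in range(reps)]
m = statistics.mean(samples)
v = statistics.pvariance(samples)
theory = (lam / mu) * (1 - math.exp(-mu * t_end))  # Poisson mean at t_end
print(round(m, 2), round(v, 2), round(theory, 2))
```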

5.3. Characterization of stationary processes differentiable in mean square


Mazo and Salz (1970) proved the following theorem for stationary processes differentiable in mean square.

THEOREM 5.3. (Mazo and Salz) Let {X(t), t ∈ T} be a stationary process differentiable in mean square with mean square derivative {Z(t), t ∈ T}. Then

E(Z(t) | X(t)) = 0  a.s.   (5.1)

for every t ∈ T.

The following result, due to Prakasa Rao (1972a), gives a characterization of stationary processes differentiable in mean square.

THEOREM 5.4. (Prakasa Rao) Let {X(t), t ∈ T} be a mean square differentiable stochastic process with mean square derivative {Z(t), t ∈ T}. Let

W(t_j) = E(Z(t_j) | X(t_r), 1 ≤ r ≤ k),  1 ≤ j ≤ k   (5.2)

for t_j, 1 ≤ j ≤ k, k ≥ 1, in T. Then a necessary and sufficient condition that the process {X(t), t ∈ T} is stationary is that

E{exp[i Σ_{r=1}^k u_r X(t_r)] Σ_{r=1}^k u_r W(t_r)} = 0   (5.3)

for all t_r, 1 ≤ r ≤ k, k ≥ 1, in T and for all real numbers u_r, 1 ≤ r ≤ k, k ≥ 1.

REMARKS. Szablowski (1989) investigated the problem whether the first two conditional moments identify a mean square differentiable process.

5.4. Additional results

Boege and Moecks (1986) studied characterization of Dirichlet processes. Dorea (1985) obtained a characterization of a multiparameter Wiener process. Gzyl (1987) investigated the properties of a vector valued gaussian stationary Markov process. Hsing (1987) obtained a characterization of a certain class of point processes. Vershik (1964) discussed some characteristic properties of gaussian processes. We do not go into the details of these results.

Acknowledgement
The author thanks the referee for his suggestions on the organization of the material, for his detailed comments and for providing the additional references relevant to the text.

References
Adke, S. R. and M. V. Ratnaparkhi (1990). A characterization of linear birth death processes. J. Ind. Stat. Assoc. 28, 9-15.
Arbib, M. (1965). Hitting and martingale characterization of one-dimensional diffusion. Z. Wahrsch. verw. Gebiete 4, 232-247.
Arnold, B. and J. Wesolowski (1997). Multivariate distributions with gaussian conditional structures. In Stochastic Processes and Functional Analysis (Eds., J. A. Goldstein, N. E. Gretsky and J. J. Uhl Jr), Lecture Notes in Pure and Applied Mathematics, Vol. 186, pp. 45-59. Marcel Dekker, New York.
Ash, R. and M. F. Gardner (1975). Topics in Stochastic Processes. Academic Press, New York.
Bartlett, M. S. (1966). An Introduction to Stochastic Processes. Cambridge University Press, Cambridge.
Beska, M., A. Klopotowski and L. Slominski (1982). Limit theorems for random sums of dependent d-dimensional random vectors. Z. Wahrsch. verw. Gebiete 61, 43-57.
Block, H. and Z. Fang (1988). A multivariate extension of Hoeffding's lemma. Ann. Prob. 16, 1803-1820.
Boege, W. and J. Moecks (1986). Learn-merge invariance of priors: A characterization of Dirichlet distributions and processes. J. Multivar. Anal. 18, 83-92.
Braverman, M. S. (1987). On a property of absolute moments. Theoret. Prob. Math. Stat. 34, 29-37.
Bremaud, P. (1975). An extension of Watanabe's theorem of characterization of Poisson processes. J. Appl. Prob. 12, 396-399.
Bremaud, P. (1981). Point Processes and Queues. Springer, New York.
Bryc, W. (1985). Some remarks on random vectors with nice enough behaviour of conditional moments. Bull. Polish Acad. Sci. Math. 33, 677-683.


B. L. S. Prakasa Rao

Bryc, W. (1987). A characterization of the Poisson process by conditional moments. Stochastics 20, 17-26.
Bryc, W. (1995). The Normal Distribution: Characterization and Applications. Lecture Notes in Statistics, Vol. 100, Springer, New York.
Bryc, W. and A. Plucinska (1985). A characterization of infinite gaussian sequences by conditional moments. Sankhya A 47, 166-173.
Buhlmann, H. (1963). L2-martingales and orthogonal decomposition. Z. Wahrsch. verw. Gebiete 1, 394-414.
Cabana, E. M. (1990). On a martingale characterization of two-parameter Wiener process. Stat. Prob. Lett. 10, 263-270.
Chouinard, A. and D. McDonald (1985). A characterization of non-homogeneous Poisson process. Stochastics 15, 113-119.
Chung, K. L. (1972). The Poisson process as a renewal process. Periodica Math. Hungar. 2, 41-48.
Cinlar, E. and P. Jagers (1973). Two mean values which characterize the Poisson process. J. Appl. Prob. 10, 678-681.
Crump, K. S. (1975). On point processes having an order statistic structure. Sankhya A 37, 396-404.
Deffner, A. and E. Haeusler (1985). A characterization of order statistic point processes that are mixed Poisson processes. J. Appl. Prob. 22, 314-323.
Diaconis, P. and D. Freedman (1980). De Finetti's generalizations of exchangeability. In Studies in Inductive Logic and Probability (Ed., R. Jeffrey), University of California Press, Berkeley.
Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
Dorea, C. C. Y. (1982). A characterization of the multiparameter Wiener process and an application. Proc. Amer. Math. Soc. 85, 267-271.
Elliott, R. J. (1982). Stochastic Calculus. Springer, New York.
Erickson, B. K. and H. Guess (1973). A characterization of the exponential law. Ann. Prob. 1, 183-185.
Fang, Z. (1991). Characterization of nonhomogeneous Poisson processes via moment conditions. Stat. Prob. Lett. 12, 83-90.
Feigin, P. D. (1979). On the characterization of point processes with the order statistic property. J. Appl. Prob. 16, 297-304.
Fichtner, K. H. (1975). Charakterisierung Poissonscher zufälliger Punktfolgen und infinitesimale Verdünnungsschemata. Math. Nachr. 68, 93-104.
Galambos, J. and S. Kotz (1978). Characterizations of Probability Distributions. Lecture Notes in Mathematics, No. 675, Springer, Berlin.
Goldman, J. R. (1968). Stochastic point processes: limit theorems. Ann. Math. Stat. 38, 771-779.
Grigelionis, B. (1975). Characterization of random processes with conditionally independent increments. Litovsk. Matem. Sbornik 15, 53-60 (in Russian).
Grigelionis, B. (1977). On a martingale characterization of random processes with independent increments. Litovsk. Matem. Sbornik 17, 75-86 (in Russian).
Gupta, R. C. (1984). Some characterizations of renewal densities with emphasis in reliability. Math. Operationsforsch. Stat. 15, 571-579.
Gupta, P. L. and R. C. Gupta (1986). A characterization of the Poisson process. J. Appl. Prob. 23, 233-235.
Gzyl, H. (1987). Characterization of vector valued Gaussian stationary Markov processes. Stat. Prob. Lett. 6, 17-19.
Harris, T. E. (1963). The Theory of Branching Processes. Springer, Berlin.
Holmes, P. T. (1974). Another characterization of the Poisson process. Sankhya A 36, 449-450.
Hsing, T. (1987). On the characterization of certain point processes. Stoch. Process. Appl. 26, 297-316.
Huang, W. J. (1990). On the characterization of point processes with the exchangeable and Markov properties. Sankhya A 52, 16-27.
Huang, W. J. and S. H. Li (1993a). Characterizations of the Poisson process using the variance. Comm. Stat. Theoret. Meth. A 22, 1371-1382.
Huang, W. J. and S. H. Li (1993b). Characterization results based on record values. Stat. Sinica 3, 583-599.

Characterization and identifiability for stochastic processes


Huang, W. J., S. H. Li and J. C. Su (1993). Some characterizations of the Poisson process and geometric renewal process. J. Appl. Prob. 30, 121-130.
Huang, W. J. and P. S. Puri (1990). On some characterizations of the Poisson processes arising in queues with infinite servers. Sankhya A 52, 232-243.
Isham, V., D. N. Shanbhag and M. Westcott (1975). A characterization of Poisson processes using forward recurrence times. Math. Proc. Cambridge Philos. Soc. 78, 513-516.
Ivanoff, B. G. (1985). Poisson convergence for point processes on the plane. J. Australian Math. Soc. Ser. A 39, 253-269.
Kagan, A. M., Yu. V. Linnik and C. R. Rao (1973). Characterization Problems in Mathematical Statistics. Wiley, New York.
Kannan, D. (1972a). An operator-valued stochastic integral II. Ann. Inst. H. Poincare B 8, 9-32.
Kannan, D. (1972b). An operator-valued stochastic integral III. Ann. Inst. H. Poincare B 8, 217-228.
Kannan, D. and A. T. Bharucha-Reid (1971). An operator-valued stochastic integral. Proc. Japan Acad. 47, 472-476.
Karlin, S. and J. McGregor (1958). Linear growth birth and death processes. J. Math. Mech. 7, 643-662.
Kimeldorf, G. and P. F. Thall (1983). A joint characterization of the multinomial distribution and the Poisson process. J. Appl. Prob. 20, 202-208.
Kingman, J. F. C. (1964). On doubly stochastic Poisson processes. Proc. Cambridge Philos. Soc. 60, 923-932.
Kotlarski, I. I. (1966). On some characterizations of probability distributions in Hilbert spaces. Annali di Matem. Pura ed Appl. 74, 129-134.
Laha, R. G. and E. Lukacs (1960). On some characterization problems connected with quadratic regression. Biometrika 47, 335-343.
Laha, R. G. and E. Lukacs (1965). On linear forms and stochastic integrals. Bull. Inst. Internat. Stat. 41, 828-846.
Laha, R. G. and E. Lukacs (1968). On a property of the Wiener process. Ann. Inst. Stat. Math. 20, 383-389.
Levy, P. (1948). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Li, S. H., W. J. Huang and M. N. Lo (1994). Characterizations of the Poisson process as a renewal process via two conditions. Ann. Inst. Stat. Math. 46, 351-360.
Lieberman, U. (1985). An order statistic characterization of the Poisson renewal process. J. Appl. Prob. 22, 394-407.
Lin, T. F. (1978). A characterization of the Poisson process. Soochow J. Math. 4, 83-87.
Liptser, R. S. and A. N. Shiryayev (1989). Theory of Martingales. Kluwer, Dordrecht.
Lukacs, E. (1969). A characterization of stable processes. J. Appl. Prob. 6, 409-418.
Lukacs, E. (1970a). Characterization theorems for certain stochastic processes. Rev. Internat. Stat. Inst. 38, 333-342.
Lukacs, E. (1970b). Characteristic Functions. Griffin, London.
Lukacs, E. (1975). Stochastic Convergence. Academic Press, New York.
Lukacs, E. (1977). A stability theorem for a characterization of the Wiener process. Trans. 7th Prague Conf., pp. 375-390.
Lukacs, E. and R. G. Laha (1964). Applications of Characteristic Functions. Hafner, New York.
Mazo, J. E. and J. Salz (1970). A theorem on conditional expectation. IEEE Trans. Inform. Theory IT-16, 379-381.
Mecke, J. (1977). A characterization of mixed Poisson processes. Rev. Roum. Math. Pures et Appl. 21, 1355-1360.
Meyer, P. A. (1976). Un cours sur les intégrales stochastiques. Séminaire de Probabilités X, Lecture Notes in Mathematics No. 511, Springer, New York.
Merzbach, E. and D. Nualart (1986). A characterization of the spatial Poisson process and changing time. Ann. Prob. 14, 1380-1390.
Newman, D. (1970). A new family of point processes which are characterized by their second moment properties. J. Appl. Prob. 7, 338-358.


Nguyen, T. T., G. Rempala and J. Wesolowski (1996). Non-gaussian measures with gaussian structure. Prob. Math. Stat. 16, 287-298.
Oakes, D. (1972). A k-fold quasi-Poisson process. In Progress in Statistics, European Meeting of Statisticians, Budapest (Hungary), Vol. II (Ed., J. Gani), pp. 583-587.
O'Cinneide, C. (1991). Identifiability in superpositions of renewal processes. Comm. Stat. Stoch. Models 7, 603-614.
Parthasarathy, K. R. (1967). Probability Measures on Metric Spaces. Academic Press, London.
Pfeifer, D. and U. Heller (1987). A martingale characterization of mixed Poisson processes. J. Appl. Prob. 24, 246-251.
Pierre, P. A. (1969). Characterizations of gaussian random processes by representations in terms of independent random variables. Memorandum RM-6092-PR, Rand Corporation.
Plucinska, A. (1983). On a stochastic process determined by the conditional expectation and the conditional variance. Stochastics 10, 115-129.
Plucinska, A. (1998). A stochastic characterization of Hermite polynomials. J. Math. Sci. (New York) 89, 1541-1544.
Plucinska, A. and J. Wesolowski (1995). Gaussian processes via independence of linear forms. In Exploring Stochastic Laws (Eds., A. V. Skorokhod and Yu. V. Borovskikh), VSP, Utrecht, The Netherlands.
Prakasa Rao, B. L. S. (1968). On a characterization of symmetric stable processes with finite mean. Ann. Math. Stat. 39, 1498-1501.
Prakasa Rao, B. L. S. (1970). On a characterization of the Wiener process by constant regression. Ann. Math. Stat. 41, 321-325.
Prakasa Rao, B. L. S. (1971). Some characterization theorems for Wiener process in a Hilbert space. Z. Wahrsch. verw. Gebiete 19, 103-116.
Prakasa Rao, B. L. S. (1972). Characterization of Wiener process by symmetry. Sankhya A 34, 227-234.
Prakasa Rao, B. L. S. (1972a). Characterization of stationary processes differentiable in mean square. IEEE Trans. Inform. Theory IT-18, 659-661.
Prakasa Rao, B. L. S. (1975a). Characterization of stochastic processes determined up to shift. Theory Prob. Appl. 20, 623-626.
Prakasa Rao, B. L. S. (1975b). On a characteristic property of point processes. J. Australian Math. Soc. Ser. A 21, 108-111.
Prakasa Rao, B. L. S. (1983). Characterization of stochastic processes by stochastic integrals. Adv. Appl. Prob. 15, 81-98.
Prakasa Rao, B. L. S. (1992). Identifiability in Stochastic Models: Characterization of Probability Distributions. Academic Press, Boston.
Prakasa Rao, B. L. S. (1998). On a characterization of stochastic processes by the absolute moments of stochastic integrals. Teoriya Veroyat. i ee Primenen. 43, 189-191.
Prakasa Rao, B. L. S. and B. Ramachandran (1983). On a characterization of symmetric stable processes. Aequationes Mathematicae 26, 113-119.
Puri, P. S. (1982). On a characterization of point processes with the order statistic property. J. Appl. Prob. 19, 39-51.
Ramachandran, B. (1967). Advanced Theory of Characteristic Functions. Statistical Publishing Society, Calcutta.
Ramachandran, B. (1994). Identically distributed stochastic integrals, stable processes and semi-stable processes. Sankhya A 56, 25-43.
Ramachandran, B. (1997). On geometric-stable laws, a related property of stable processes, and stable densities of exponent one. Ann. Inst. Stat. Math. 49, 299-314.
Ramachandran, B. and K. S. Lau (1991). Functional Equations in Probability Theory. Academic Press, Boston.
Ramachandran, B. and B. L. S. Prakasa Rao (1984). On the equation f(x) = ∫_{−∞}^{∞} f(x + y) dμ(y), x ∈ R. Sankhya A 46, 326-339.
Ramachandran, B. and C. R. Rao (1970). Solutions of functional equations arising in some regression problems and a characterization of the Cauchy law. Sankhya A 32, 1-31.


Rao, C. R. and D. N. Shanbhag (1994). Choquet-Deny Type Functional Equations. Wiley, New York.
Renyi, A. (1967). Remarks on the Poisson process. In Lecture Notes in Math. No. 31, pp. 280-286. Springer, Berlin.
Revuz, D. and M. Yor (1994). Continuous Martingales and Brownian Motion. Springer, Berlin.
Riedel, M. (1980a). Representation of the characteristic function of a stochastic integral. J. Appl. Prob. 17, 448-455.
Riedel, M. (1980b). Characterization of stable processes by identically distributed stochastic integrals. Adv. Appl. Prob. 12, 689-709.
Samuels, S. M. (1974). A characterization of the Poisson process. J. Appl. Prob. 11, 72-85.
Shanthikumar, J. G. (1985). Lifetime distribution of consecutive k-out-of-n:F systems with exchangeable lifetimes. IEEE Trans. Reliability 34, 480-483.
Skitovich, V. P. (1956). On a characterization of Brownian motion. Theory Prob. Appl. 1, 326-328.
Stroock, D. and S. R. S. Varadhan (1979). Multidimensional Diffusion Processes. Springer, Berlin.
Szablowski, P. J. (1989). Can the first two conditional moments identify a mean square differentiable process? Computers Math. Applic. 18, 329-348.
Taylor, J. C. (1989). The minimal eigenfunctions characterize the Ornstein-Uhlenbeck process. Ann. Prob. 17, 1055-1062.
Vakhania, N. N. and N. P. Kandelaki (1967). A stochastic integral for operator-valued functions. Theory Prob. Appl. 12, 525-528.
Vershik, A. M. (1964). Some characteristic properties of gaussian stochastic processes. Theory Prob. Appl. 9, 353-356.
Wang, Y. (1974). A note on homogeneous processes with independent increments. Ann. Inst. Stat. Math. 26, 356-360.
Wang, Y. (1975). Characterization of some stochastic processes. Ann. Prob. 3, 1038-1045.
Watanabe, S. (1964). On discontinuous additive functionals and Levy measures of a Markov process. Jap. J. Math. 34, 53-70.
Weiss, G. and M. Westcott (1976). A note on identification, characterization of the gaussian distribution and time reversibility in linear stochastic processes. Z. Wahrsch. verw. Gebiete 35, 151-157.
Wesolowski, J. (1984). A characterization of a Gaussian process based on properties of conditional moments. Demonstratio Mathematica 17, 795-808.
Wesolowski, J. (1988). A remark on a characterization of the Poisson process. Demonstratio Mathematica 21, 555-557.
Wesolowski, J. (1989). A characterization of the gamma process by conditional moments. Metrika 36, 299-309.
Wesolowski, J. (1990). A martingale characterization of a Wiener process. Stat. Prob. Lett. 10, 213-225 (correction: ibid. 19, 167).
Wesolowski, J. (1990a). A martingale characterization of the Poisson process. Bull. Pol. Acad. Sci. Math. 38, 1-12.
Wesolowski, J. (1990b). Characterizations of some processes by properties of conditional moments. Demonstratio Mathematica 22, 537-556.
Wesolowski, J. (1993). Stochastic processes with linear conditional expectation and quadratic conditional variance. Prob. Math. Stat. 14, 33-44.
Wesolowski, J. (1997). Martingale and related characterizations of the Poisson process. Bull. Internat. Stat. Inst. 57, 565-566.
Wesolowski, J. (1997a). Martingale transformations of the Wiener and Poisson processes (preprint).
Westcott, M. (1972). The probability generating functional. J. Australian Math. Soc. 14, 448-466.
Wise, G. L. (1992). A counterexample to a martingale characterization of a Wiener process. Stat. Prob. Lett. 15, 337-338.
Ycart, B. (1988). A characteristic property of linear growth birth and death processes. Sankhya A 50, 184-189.
Ycart, B., W. A. Woyczynski, J. Szulga, J. A. Mann and D. A. Scherson (1986). Birth and death dynamics in adsorption. CWRU (preprint).

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19
© 2001 Elsevier Science B.V. All rights reserved.


Associated Sequences and Related Inference Problems

B. L. S. Prakasa Rao and Isha Dewan

The concept of association of random variables was introduced by Esary et al. (1967). In several situations, for example in reliability and survival analysis, the lifetimes involved are not independent but are associated. Here we review recent results, both probabilistic and inferential, for associated random variables.

1. Introduction

In the classical setting of statistical inference, the observed random variables of interest are generally assumed to be independent and identically distributed. However, in several real life situations the random variables need not be independent. In reliability studies, there are structures in which the components share the load, so that failure of one component results in increased load on each of the remaining components. Minimal path structures of a coherent system having components in common behave in a 'similar' manner: failure of a component will adversely affect the performance of all the minimal path structures containing it. In both examples, the random variables of interest are not independent but are 'associated', a concept to be defined later. We give a review of probabilistic properties of associated sequences of random variables and their applications in statistical inference for associated sequences. It is assumed that all the expectations involved in the following discussion exist.

Hoeffding (1940) [cf. Lehmann (1966)] proved the following result.

THEOREM 1.1. Let (X, Y) be a bivariate random vector such that E(X²) < ∞ and E(Y²) < ∞. Then

Cov(X, Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(x, y) dx dy    (1.1)

where

H(x, y) = P[X > x, Y > y] − P[X > x]P[Y > y]
        = P[X ≤ x, Y ≤ y] − P[X ≤ x]P[Y ≤ y].    (1.2)

PROOF. Let (X1, Y1) and (X2, Y2) be independent and identically distributed random vectors. Then

2[E(X1 Y1) − E(X1)E(Y1)] = E[(X1 − X2)(Y1 − Y2)]
  = E ∫_{−∞}^{∞} ∫_{−∞}^{∞} [I(u, X1) − I(u, X2)][I(v, Y1) − I(v, Y2)] du dv

where I(u, a) = 1 if u < a and 0 otherwise. The result follows by taking the expectation under the integral sign.

REMARK 1.2. Relation (1.1) is known as the Hoeffding identity. A generalized Hoeffding identity has been proved for multidimensional random vectors by Block and Fang (1988). Newman (1980) showed that for any two functions h(·) and g(·) with E[h(X)]² < ∞ and E[g(Y)]² < ∞ and finite derivatives h'(·) and g'(·),

Cov(h(X), g(Y)) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h'(x) g'(y) H(x, y) dx dy.    (1.3)
Yu (1993) extended the relation (1.3) to even dimensional random vectors. Prakasa Rao (1998) further extended this identity following Quesada-Molina (1992). As a departure from independence, a bivariate notion of positive quadrant dependence was introduced by Lehmann (1966).

DEFINITION 1.3. The pair (X, Y) is said to be positively quadrant dependent (PQD) if

P[X ≤ x, Y ≤ y] ≥ P[X ≤ x]P[Y ≤ y]  for all x, y    (1.4)

or equivalently

H(x, y) ≥ 0,  x, y ∈ R.    (1.5)

It can be shown that the condition (1.5) is equivalent to the following: for any pair of non-decreasing functions h and g on R,

Cov(h(X), g(Y)) ≥ 0.    (1.6)

A stronger condition is that, for a pair of random variables (X, Y) and for any two real coordinate-wise non-decreasing functions h and g on R²,

Cov(h(X, Y), g(X, Y)) ≥ 0.    (1.7)

As a natural multivariate extension of (1.7), the following concept of association was introduced by Esary et al. (1967).


DEFINITION 1.4. A collection of random variables {Xn, n ≥ 1} is said to be associated if for every n and for every choice of coordinate-wise non-decreasing functions h(x) and g(x) from Rⁿ to R,

Cov(h(X), g(X)) ≥ 0    (1.8)

whenever it exists, where X = (X1, ..., Xn).

1.1. Examples
It is easy to see that any set of independent random variables is associated [cf. Esary et al. (1967)]. Associated random variables arise in reliability, statistical mechanics, percolation theory etc. Some examples are as follows.

(i) Let {Xi, i ≥ 1} be i.i.d. and Y be independent of {Xi, i ≥ 1}. Then {Xi' = Xi + Y, i ≥ 1} are associated. Thus, independent random variables subject to the same stress are associated [cf. Barlow and Proschan (1975)]. For an application to modelling dependent competing risks, see Bagai and Prakasa Rao (1992).

(ii) Order statistics corresponding to a finite set of independent random variables are associated.

(iii) Positively correlated normal random variables are associated [cf. Pitt (1982)].

(iv) Suppose a random vector (X1, ..., Xm) has a multivariate exponential distribution F(x1, ..., xm) [cf. Marshall and Olkin (1967)] with

1 − F(x1, ..., xm) = exp[− Σ_{i=1}^m λi xi − Σ_{i<j} λij max(xi, xj) − Σ_{i<j<k} λijk max(xi, xj, xk) − ... − λ12...m max(x1, ..., xm)],  xi > 0, 1 ≤ i ≤ m.

Then the components X1, ..., Xm are associated.

(v) Let {X1, ..., Xn} be jointly α-stable random variables, 0 < α < 2. Lee et al. (1990) discussed necessary and sufficient conditions under which {X1, ..., Xn} are associated.

(vi) Let {ek : k = ..., −1, 0, 1, ...} be a sequence of independent random variables with zero mean and unit variance. Let {wj : j = 0, 1, ...} be a sequence of non-negative real numbers such that Σ_{j=0}^∞ wj < ∞. Define Xk = Σ_{j=0}^∞ wj e_{k−j}. Then the sequence {Xk} is associated [Nagaraj and Reddy (1993)].

(vii) Let {Xk} be a stationary autoregressive process of order p given by X_t = φ1 X_{t−1} + ... + φp X_{t−p} + e_t, where {e_t} is a sequence of independent random variables with zero mean and unit variance. Then {Xk} is associated if φi > 0, 1 ≤ i ≤ p. Suppose p = 1 and φ1 < 0. Then {X_{2k}} and {X_{2k+1}} are associated sequences [Nagaraj and Reddy (1993)].
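Example (vii) lends itself to a small Monte Carlo illustration (ours, not from the text). For a stationary AR(1) process with φ1 > 0, association requires Cov(h(X_t), g(X_{t+1})) ≥ 0 for non-decreasing h and g; we check this for the indicator functions h(x) = g(x) = 1{x > 0}:

```python
import random

# Simulate X_t = phi * X_{t-1} + e_t with phi = 0.8 and gaussian noise,
# then estimate Cov(1{X_t > 0}, 1{X_{t+1} > 0}), which association
# predicts to be non-negative (here it is clearly positive).
rng = random.Random(0)
phi, n = 0.8, 200_000
x = 0.0
pairs = []
for _ in range(n):
    x_next = phi * x + rng.gauss(0.0, 1.0)
    pairs.append((x > 0.0, x_next > 0.0))
    x = x_next

p_h = sum(a for a, _ in pairs) / n           # P(X_t > 0)
p_g = sum(b for _, b in pairs) / n           # P(X_{t+1} > 0)
p_hg = sum(a and b for a, b in pairs) / n    # P(both > 0)
cov = p_hg - p_h * p_g
print(cov)  # positive, consistent with association
```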


(viii) Consider the following network. Customers arrive according to a Poisson process with rate λ and all customers enter node 1 initially. The service times at the nodes are mutually independent and exponentially distributed, and customers choose either the route r1 : 1 → 2 → 3 or r2 : 1 → 3 according to a Bernoulli process with probability p of choosing r1. The arrival and the service processes are mutually independent. Let S1 and S3 be the sojourn times at nodes 1 and 3 of a customer that follows route r1. Foley and Kiessler (1989) showed that S1 and S3 are associated.

(ix) Let {Xn} be a discrete time homogeneous Markov chain. Then {Xn} is said to be a monotone Markov chain if Pr[X_{n+1} ≥ y | Xn = x] is non-decreasing in x for each fixed y. Daley (1968) showed that a monotone Markov chain is associated.

(x) Consider a system of k components 1, ..., k, all new at time 0, with lifelengths T1, ..., Tk. Arjas and Norros (1984) discussed a set of conditions under which the lifelengths are associated.

(xi) Consider a system of N non-renewable components in parallel. Let Ti denote the lifelength of component i, i = 1, ..., N. Suppose the environment is represented by a real valued stochastic process Y = {Yt, t ≥ 0} which is external to the failure mechanism. Assume that, given Y, the lifelengths Ti are independent and let

lim_{τ→0} (1/τ) Pr[t < Ti ≤ t + τ | Ti > t, Y] = ηi(t, Yt),  i = 1, ..., N,

where each ηi(t, y) is a positive continuous function of t > 0 and real y. Assume that the ηi(t, y) are all increasing (or all decreasing) in y. Further, let

Qi(ti) = ∫_0^{ti} ηi(u, Yu) du,  ti ≥ 0, i = 1, ..., N.

Thus Qi(ti) is the total risk incurred by component i from the starting time to time ti. Lefevre and Milhaud (1990) showed that if Y is associated, then the lifelengths T1, ..., TN are associated, and the random variables Q1(t1), ..., QN(tN) are associated as well.

(xii) Let {Xi, i = 1, ..., n} be associated random variables and {Xi*, i = 1, ..., n} be independent random variables such that Xi* and Xi have the same distribution for each i = 1, ..., n. Let X(1) ≤ X(2) ≤ ... ≤ X(n) and X*(1) ≤ X*(2) ≤ ... ≤ X*(n) be the corresponding ordered values, and let F(i)(x) and F*(i)(x) denote the distribution functions of X(i) and X*(i), respectively. Then Hu and Hu (1998) showed that

(F(1)(t), F(2)(t), ..., F(n)(t)) ⪰ (F*(1)(t), F*(2)(t), ..., F*(n)(t))  for all t ∈ R,

and, for a monotone function h,

(Eh(X(1)), Eh(X(2)), ..., Eh(X(n))) ⪰ (Eh(X*(1)), Eh(X*(2)), ..., Eh(X*(n))),

where, for a = (a1, ..., an) and b = (b1, ..., bn), we say b ⪯ a if Σ_{i=1}^n ai = Σ_{i=1}^n bi and Σ_{i=k}^n a(i) ≥ Σ_{i=k}^n b(i) for k = 1, ..., n, with a(1) ≤ a(2) ≤ ... ≤ a(n) and b(1) ≤ b(2) ≤ ... ≤ b(n) denoting the ordered values of the ai's and bi's.


For example, in an animal genetic selection problem where X1, ..., Xn are the phenotypes of animals, the best k of n animals (with scores (X(n−k+1), ..., X(n))) are kept for breeding [Shaked and Tong (1985)]. The partial sum Σ_{i=n−k+1}^n X(i) is the selection differential used by geneticists. If the Xi's have the same mean μ, then the total expected gain of the genetic selection project, Σ_{i=n−k+1}^n X(i), is less significant for associated samples when compared with independent samples. For some other applications see Szekli (1995).

REMARK 1.5. The concept of FKG inequalities, which is connected with statistical mechanics and percolation theory, is related to association. It started from the works of Harris (1960), Fortuin et al. (1971), Holley (1974), Preston (1974), Batty (1976), Kemperman (1977) and Newman (1983). For the relationship between the two concepts, see Karlin and Rinott (1980) and Newman (1984). They observed the following: a version of the FKG inequality is equivalent to

∂² log f / ∂xi ∂xj ≥ 0  for i ≠ j, 1 ≤ i, j ≤ n    (1.9)

when f(x1, ..., xn), the joint density of X1, ..., Xn, is strictly positive on Rⁿ. This is a sufficient but not a necessary condition for association of (X1, X2, ..., Xn). For example, if (X1, X2) is a bivariate normal vector whose covariance matrix Σ is not the inverse of a matrix with non-positive off-diagonal entries, then (X1, X2) is associated [Pitt (1982)] but the density of (X1, X2) does not satisfy the condition (1.9).

1.2. Negative association


The concept of negative association, as introduced by Joag-Dev and Proschan (1983), is not a dual of the theory and applications of (positive) association, but differs in several aspects.

DEFINITION 1.6. Random variables X1, ..., Xn are said to be negatively associated (NA) if for every pair of disjoint subsets A1, A2 of {1, 2, ..., n},

Cov(h(Xi, i ∈ A1), g(Xj, j ∈ A2)) ≤ 0    (1.10)

whenever h and g are non-decreasing coordinate-wise.

EXAMPLES. A set of independent random variables is negatively associated. Other examples of multivariate distributions that are negatively associated are (a) multinomial, (b) multivariate hypergeometric, (c) Dirichlet and (d) Dirichlet compound multinomial distributions. However, the most interesting case is that of models of categorical data analysis, where negative association (NA) and (positive) association exist side by side. Consider a model where the individuals are classified according to two characteristics and suppose the marginal totals are fixed. Then the marginal distributions of row (column) vectors possess NA, and the marginal distribution of a set of cell frequencies such that no pair of cells is in the same row or in the same column (for example, the diagonal cells) is (positively) associated.
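A quick numerical illustration of the multinomial example (ours, not from the text): for multinomial counts, Cov(Ni, Nj) = −n pi pj < 0, in line with negative association. A simulation sketch:

```python
import random

# Empirical check that two cell counts of a multinomial(n, p) vector are
# negatively correlated: Cov(N_0, N_1) = -n * p_0 * p_1.
rng = random.Random(1)
n_trials, reps = 50, 20_000    # p = (0.3, 0.3, 0.4)

def one_multinomial():
    counts = [0, 0, 0]
    for _ in range(n_trials):
        u = rng.random()
        counts[0 if u < 0.3 else (1 if u < 0.6 else 2)] += 1
    return counts

draws = [one_multinomial() for _ in range(reps)]
m0 = sum(d[0] for d in draws) / reps
m1 = sum(d[1] for d in draws) / reps
cov01 = sum(d[0] * d[1] for d in draws) / reps - m0 * m1
print(cov01)  # close to -50 * 0.3 * 0.3 = -4.5
```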

1.3. Weak association


Burton et al. (1986) defined weak association of random vectors.

DEFINITION 1.7. Let X1, X2, ..., Xm be R^d-valued random vectors. They are said to be weakly associated if, whenever π is a permutation of {1, 2, ..., m}, 1 ≤ k < m, and h : R^{kd} → R, g : R^{(m−k)d} → R are coordinate-wise nondecreasing, then

Cov(h(Xπ(1), ..., Xπ(k)), g(Xπ(k+1), ..., Xπ(m))) ≥ 0

whenever it is defined. An infinite family of R^d-valued random vectors is weakly associated if every finite subfamily is weakly associated.

Weak association defines a strictly larger class of random variables than does association. Burton et al. (1986) proved a functional central limit theorem for such sequences. Dabrowski and Dehling (1988) proved a Berry-Esseen theorem and a functional law of the iterated logarithm for weakly associated sequences.

1.4. Processes with associated increments


Glasserman (1992) defined a class of processes with associated increments. A stochastic process X = {Xt, t ≥ 0} is said to have independent increments if, for all n ≥ 0 and all 0 ≤ t0 < t1 < ... < tn, the differences

Δ0 = X_{t0},  Δ1 = X_{t1} − X_{t0}, ...,  Δn = X_{tn} − X_{tn−1}

are independent.

DEFINITION 1.8. A process {Xt, t ≥ 0} is said to have associated increments if the differences are associated, that is, for all bounded functions f and g nondecreasing component-wise,

Cov[f(Δ0, ..., Δn), g(Δ0, ..., Δn)] ≥ 0

for all n ≥ 0 and all 0 ≤ t0 < t1 < ... < tn.

Glasserman (1992) derives sufficient conditions under which a process has associated increments and describes transformations under which this property is preserved. He also derives sufficient conditions for a Markov process with a generator Q and initial distribution P0 to have associated increments. It is clear from the definition of association that all processes with independent increments trivially have associated increments. Suppose we consider a pure birth process where the birth rate is λn when the population is of size n, and further suppose that λn is increasing in n and bounded, with an arbitrary initial distribution on {0, 1, 2, ...}. Then this process has associated increments.


Let {Nt, t ≥ 0} be a point process with intensity {λt, t ≥ 0}. For each t ≥ 0, define Ht(s) = N_{t∧s}, 0 ≤ s < ∞. Then Ht records the history {Ns, 0 ≤ s ≤ t} and H = {Ht, 0 ≤ t < ∞} is a time-homogeneous Markov process on D_Z[0, ∞). Furthermore, λt can be represented as λt(Ht). Glasserman (1992) conjectures that the process {Nt, t ≥ 0} has associated increments if λt(·) is increasing and bounded. If {Nt, t ≥ 0} is a Poisson process, then {e^{Nt}, t ≥ 0} has associated increments. In general, if {Xt, t ≥ 0} is nondecreasing and has associated increments and g is nondecreasing and directionally convex, then {g(Xt), t ≥ 0} has associated increments [cf. Glasserman (1992)]. For the definition of directional convexity of a function g : R^d → R, d ≥ 1, see Shaked and Shanthikumar (1990). If d = 1, this property coincides with usual convexity. One can extend the notion of a process with associated increments to processes with conditionally associated increments [cf. Glasserman (1992), p. 329].

1.5. Associated measures


Burton and Waymire (1985) defined associated measures. A random measure X is associated iff the family of random variables {X(B) : B a Borel set} is associated. They discussed some basic properties of associated measures. Lindqvist (1988) defined a notion of association of probability measures on partially ordered spaces and discussed its applications to stochastic processes, with both discrete and continuous time parameter, on partially ordered state spaces, and to mixtures of statistical experiments. Evans (1990) showed that every infinitely divisible random measure is associated. However, there are random measures which are not infinitely divisible but are associated. For instance, if μ is a fixed Radon measure and Y is a non-negative random variable, then it can be shown that the random measure Yμ is associated.

1.6. Association in time


Hjort et al. (1985) considered a multistate system with states S = {0, 1, …, m}. Here m indicates perfect functioning and 0 indicates complete failure. Let C = {1, 2, …, n} denote the set of components of the system.

DEFINITION 1.9. The performance process of the i-th component is a stochastic process {X_i(t), t ∈ τ} where, for each fixed t ∈ τ, X_i(t) denotes the state of component i at time t. The joint performance process of the components is given by

{X(t), t ∈ τ} = {(X_1(t), …, X_n(t)), t ∈ τ}.

Let I = [t_A, t_B] ⊂ [0, ∞) and τ(I) = τ ∩ I.

DEFINITION 1.10. The joint performance process {X(t), t ∈ τ} of the components is said to be associated in the time interval I iff, for any integer m and {t_1, …, t_m} ⊂ τ(I), the random variables in the array

X_1(t_1)  ⋯  X_1(t_m)
   ⋮              ⋮
X_n(t_1)  ⋯  X_n(t_m)

are associated.

Let X = {X(t), t ∈ τ} be a Markov process with state space {0, 1, …, m}. Define the transition probabilities as

P_{ij}(s, t) = P(X(t) = j | X(s) = i), s < t, (1.11)

and the transition probability matrix as

P(s, t) = {P_{ij}(s, t)}_{i,j = 0, 1, …, m}. (1.12)

Let

τ = (0, ∞). (1.13)

The transition intensity is defined as

μ_{ij}(s) = lim_{h→0+} P_{ij}(s, s + h)/h, i ≠ j. (1.14)

Let

P_{i,≥j}(s, t) = P[X(t) ≥ j | X(s) = i] = Σ_{v=j}^{m} P_{iv}(s, t), (1.15)

μ_{i,≥j}(s) = Σ_{v=j}^{m} μ_{iv}(s), i < j, (1.16)

and

μ_{i,<j}(s) = Σ_{v=0}^{j−1} μ_{iv}(s), i ≥ j. (1.17)

THEOREM 1.11. Let X be a continuous time Markov process with state space {0, 1, …, m} and transition probability matrix P(s, t). Assume the transition intensities to be continuous. Consider the following statements about X:
(i) X is associated in time,
(ii) X is conditionally, stochastically, nondecreasing in time, that is,

P[X(t) ≥ j | X(s_1) = i_1, …, X(s_n) = i_n]

is nondecreasing in i_1, …, i_n for each j and for each choice of s_1 < s_2 < ⋯ < s_n < t, n ≥ 1,
(iii) P_{i,≥j}(s, t) is nondecreasing in i for each j and for each s < t,
(iv) for each j and s,

μ_{i,≥j}(s) is nondecreasing in i ∈ {0, 1, …, j−1}

and

μ_{i,<j}(s) is nonincreasing in i ∈ {j, j+1, …, m}.

Then (ii), (iii) and (iv) are equivalent and each of them implies (i). For the binary case (m = 1) it is easily seen that statement (iii) of the above theorem is equivalent to

P_{1,1}(s, t) + P_{0,0}(s, t) ≥ 1, for each s < t. (1.18)

This was the sufficient condition given by Esary and Proschan (1970) for X to be associated in time. Furthermore, when μ_{1,0}(s) and μ_{0,1}(s) are continuous, the statement (iv) of the above theorem is always satisfied and the corresponding Markov process is always associated in time. The same is true for a general birth and death process [cf. Keilson and Kester (1977) and Kirstein (1976)]. Kuber and Dharmadhikari (1996) discussed association in time for semi-Markov processes. Let (Ω, F, P) be a probability space and E = {0, 1, …, k}. Define measurable functions X_n : Ω → E and T_n : Ω → R⁺, n ∈ N, so that 0 = T_0 ≤ T_1 ≤ T_2 ≤ ⋯. Then {(X_n, T_n), n ≥ 1} is said to form a Markov renewal process with state space E if, for all n ∈ N, j ∈ E and t ∈ R⁺, we have

P[X_{n+1} = j, T_{n+1} − T_n ≤ t | X_0, …, X_n; T_0, …, T_n] = P[X_{n+1} = j, T_{n+1} − T_n ≤ t | X_n].

Assume that P[X_{n+1} = j, T_{n+1} − T_n ≤ t | X_n = i] = Q_{ij}(t) is independent of n.

DEFINITION 1.12. The semi-Markov process (SMP) {Y(t), t ≥ 0} corresponding to a Markov renewal process (X, T) is defined as Y(t) = X_n for t ∈ [T_n, T_{n+1}), n ≥ 0. Define, for i, j ∈ E and 0 ≤ u ≤ s < t,

P_{ij}(u, (s, t)) = P[Y(t) = j | Y(u) = i, Z_i > s − u],

where Z_i is the waiting time in state i. Suppose the transition intensity

μ_{ij}(u, s) = lim_{h→0+} P_{ij}(u, (s, s + h))/h, i ≠ j, (1.19)

is finite. Let

P_{i,≥j}(u, (s, t)) = Σ_{v=j}^{k} P_{iv}(u, (s, t)), (1.20)


B. L. S. Prakasa Rao and I. Dewan

μ_{i,≥j}(u, s) = Σ_{v=j}^{k} μ_{iv}(u, s), i < j, (1.21)

and

μ_{i,<j}(u, s) = Σ_{v=0}^{j−1} μ_{iv}(u, s), i ≥ j. (1.22)

THEOREM 1.13. Let {Y(t), t ≥ 0} be a semi-Markov process with state space E, waiting times Z_i and bounded transition intensities μ_{ij}(u, s), i, j ∈ E, which are continuous in s uniformly in u, for each u > 0. Consider the following statements:
(i) Y(t) is associated in time,
(ii) P_{i,≥j}(u, (s, t)) is increasing in i and decreasing in u on (0, s],
(iii) for each j and s, μ_{i,≥j}(u, s) is increasing in i ∈ {0, 1, …, j−1} and decreasing in u on (0, s], and μ_{i,<j}(u, s) is decreasing in i ∈ {j, j+1, …, k} and increasing in u on (0, s],
(iv) for fixed j and each choice of t_1 < ⋯ < t_n < t, n ≥ 1,

P[Y(t) ≥ j | Z_{i_n} > t_n − u, Y(u) = i_n, Y(t_1) = i_1, …, Y(t_{n−1}) = i_{n−1}]

is increasing in i_1, …, i_n and decreasing in u ∈ (t_ℓ, t_n], ℓ ∈ {0, …, n−1}.
Then (ii), (iii) and (iv) are equivalent and each of them implies (i). If Y(t) is a Markov process with state space E = {0, …, k}, the statements (ii) and (iii) in Theorem 1.13 simplify to the statements (iii) and (iv) in Theorem 1.11.

1.7. Association for jointly stable laws

Pitt (1982) proved the following result.

THEOREM 1.14. [Pitt (1982)] Let X = (X_1, …, X_k) be N_k(0, Σ), Σ = ((σ_{ij})), where Cov(X_i, X_j) = σ_{ij}. Then a necessary and sufficient condition for (X_1, …, X_k) to be associated is that σ_{ij} ≥ 0, 1 ≤ i, j ≤ k.

DEFINITION 1.15. [Weron (1984)] Let X = (X_1, …, X_k). Then X is said to be jointly stable with index α, 0 < α ≤ 2, if the characteristic function of X is of the form

φ_X(t) = E[e^{i(X, t)}] = exp{ −∫_{S_k} |(s, t)|^α (1 − i sgn((s, t)) ω(α; s, t)) F(ds) + i(μ, t) }, (1.23)

where t = (t_1, …, t_k) ∈ R^k, S_k is the unit sphere in R^k, s = (s_1, …, s_k) ∈ S_k, F is a finite measure on S_k, μ = (μ_1, …, μ_k) ∈ R^k, and


ω(α; s, t) = tan(πα/2) if α ≠ 1, and ω(α; s, t) = (2/π) log |(s, t)| if α = 1.

REMARK 1.16. There is a one-to-one correspondence between the distributions of jointly α-stable random vectors X = (X_1, …, X_k) and the finite Borel measures F in (1.23). F is called the spectral measure of the α-stable vector X.

THEOREM 1.17. [Lee et al. (1990)] Let X = (X_1, …, X_k) be jointly α-stable with 0 < α < 2 and with the characteristic function given by (1.23). Then (X_1, …, X_k) is associated iff the spectral measure F satisfies the condition

F(S_k^−) = 0, (1.24)

where S_k^− = {s = (s_1, …, s_k) ∈ S_k : for some i and j in {1, …, k}, s_i > 0 and s_j < 0}.
Let X = (X_1, …, X_k) be an infinitely divisible random vector with the characteristic function

φ_X(t) = exp{ i(μ, t) + ∫_{R^k} (e^{i(t, x)} − 1 − i(t, x) I(‖x‖ ≤ 1)) ν(dx) }. (1.25)

Here ν is called the Lévy measure of X and μ = (μ_1, …, μ_k) ∈ R^k. Resnick (1988) has proved that a sufficient condition for (X_1, …, X_k) to be associated is that

ν{x = (x_1, …, x_k) : x_i x_j < 0 for some i ≠ j, 1 ≤ i, j ≤ k} = 0. (1.26)

In other words, the Lévy measure is concentrated on the positive (R_+^k) and the negative (R_−^k) quadrants of R^k. The result given above in Theorem 1.17, due to Lee et al. (1990), proves that for α-stable random vectors the condition (1.24) is necessary and sufficient for association. Samorodnitsky (1995) showed that there is an infinitely divisible random vector X taking values in R² which is associated but whose Lévy measure ν satisfies

ν{x = (x_1, x_2) : x_1 x_2 < 0} > 0,

leading to the fact that condition (1.26) is not necessary for the association of an infinitely divisible random vector. Note that if X is infinitely divisible with the characteristic function φ_X(t), then, for every γ > 0, φ_X(t)^γ is also a characteristic function on R^k. Let X^{*γ} be an infinitely divisible random vector with this characteristic function. It is clear that X and X^{*1} have the same distribution. Samorodnitsky (1995) proved that the condition (1.26) is equivalent to the statement that X^{*γ} is associated for every γ > 0. An infinitely divisible random vector X is said to be r-semistable with index α, 0 < r < 1, 0 < α < 2, if for every n ≥ 1 there is a nonrandom vector d_n ∈ R^k such that

X^{*r^n} =_d r^{n/α} X + d_n,

where X =_d Y indicates that X and Y have the same distribution. If X is r-semistable with index α for all 0 < r < 1, then it is jointly stable with index α [cf. Chung et al. (1982)]. The following result extends Theorem 1.17 from stable to semistable random vectors.

THEOREM 1.18. A random vector X = (X_1, …, X_k) which is r-semistable with index α, 0 < r < 1, 0 < α < 2, is associated if and only if its Lévy measure is concentrated on R_+^k ∪ R_−^k.

2. Some probabilistic properties for associated sequences


Esary et al. (1967) studied the fundamental properties of association. They showed that the association of random variables is preserved under several operations, for instance, (i) any subset of associated random variables is associated; (ii) the union of two independent sets of associated random variables is a set of associated random variables; (iii) a set consisting of a single random variable is associated; (iv) nondecreasing functions of associated random variables are associated; and (v) if X_1^{(k)}, …, X_n^{(k)} are associated for each k, and X^{(k)} = (X_1^{(k)}, …, X_n^{(k)}) → X = (X_1, …, X_n) in distribution, then X_1, …, X_n are associated. Esary et al. (1967) have also developed a simple criterion for establishing association. Instead of checking the condition (1.8) for arbitrary nondecreasing functions h and g, one can restrict to nondecreasing test functions h and g which are binary, or to functions h and g which are nondecreasing, bounded and continuous. In addition, they obtained bounds for the joint distribution function of associated random variables in terms of the joint distribution function of the components under independence.

THEOREM 2.1. If X_1, …, X_n are associated random variables, then
P[X_i > x_i, i = 1, …, n] ≥ Π_{i=1}^{n} P[X_i > x_i],

and

P[X_i ≤ x_i, i = 1, …, n] ≥ Π_{i=1}^{n} P[X_i ≤ x_i]. (2.1)
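The bounds in (2.1) are easy to check numerically. In the sketch below (our illustration, not part of the original text), the associated vector is built from a common shock, X_i = Z + ε_i with Z and ε_i independent standard normal, so that each X_i is a nondecreasing function of independent variables and the vector is therefore associated:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, n = 200_000, 3
Z = rng.standard_normal(reps)                    # common shock
X = Z[:, None] + rng.standard_normal((reps, n))  # associated vector (X_1, X_2, X_3)
x = np.array([0.0, 0.5, 1.0])

joint_upper = np.mean((X > x).all(axis=1))
prod_upper = np.prod(np.mean(X > x, axis=0))
joint_lower = np.mean((X <= x).all(axis=1))
prod_lower = np.prod(np.mean(X <= x, axis=0))

# Theorem 2.1: joint tail probabilities dominate the independent products
print(joint_upper >= prod_upper, joint_lower >= prod_lower)
assert joint_upper >= prod_upper and joint_lower >= prod_lower
```

With this positively dependent construction the gap between the joint probability and the product is substantial, well beyond Monte Carlo error.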

The concept of association is useful in the study of approximate independence. This follows from a basic distribution function inequality due to Lebowitz (1972). Define, for subsets A and B of {1, 2, …, n} and real x_j's,


H_{A,B} = P[X_j > x_j; j ∈ A ∪ B] − P[X_k > x_k, k ∈ A] P[X_ℓ > x_ℓ, ℓ ∈ B]. (2.2)

Observe that the function H(x, y) in (1.2) is a special case of this definition.

THEOREM 2.2. [Lebowitz (1972)] If X_j, 1 ≤ j ≤ n, are associated, then

0 ≤ H_{A,B} ≤ Σ_{i∈A} Σ_{j∈B} H_{{i},{j}}. (2.3)

PROOF. Let Z_i = I(X_i > x_i). Define

U(A) = Π_{i∈A} Z_i and V(A) = Σ_{i∈A} Z_i.

Then

H_{A,B} = Cov(U(A), U(B)), and Cov(V(A), V(B)) = Σ_{i∈A} Σ_{j∈B} H_{{i},{j}}.

Observe that V(A) − U(A) and V(B) are nondecreasing functions of Z_i, 1 ≤ i ≤ n. Since the Z_i's are associated, it follows that

Cov(V(A) − U(A), V(B)) ≥ 0.

Similarly, V(B) − U(B) and U(A) are nondecreasing functions of Z_i, 1 ≤ i ≤ n, and

Cov(V(B) − U(B), U(A)) ≥ 0.

Hence

Cov(U(A), U(B)) ≤ Cov(U(A), V(B)) ≤ Cov(V(A), V(B)).


As an immediate consequence of the above theorem we have the following result.

THEOREM 2.3. [Joag-Dev (1983), Newman (1984)] Suppose that X_1, …, X_n are associated. Then {X_k, k ∈ A} is independent of {X_j, j ∈ B} iff Cov(X_k, X_j) = 0 for all k ∈ A, j ∈ B, and the X_j's are jointly independent iff Cov(X_k, X_j) = 0 for all k ≠ j, 1 ≤ k, j ≤ n. Thus, uncorrelated associated random variables are independent.

Another fundamental inequality which is useful in proving several probabilistic results involving associated random variables is given below.

THEOREM 2.4. [Newman (1980)] Let (X, Y) be associated random variables with finite variance. Then, for any two differentiable functions h and g,

|Cov(h(X), g(Y))| ≤ sup_x |h′(x)| sup_y |g′(y)| Cov(X, Y), (2.4)

where h′ and g′ denote the derivatives of h and g, respectively. The proof is an immediate consequence of (1.3). Using the above inequality, we get

|Cov(exp(irX), exp(isY))| ≤ |r| |s| Cov(X, Y) (2.5)

for −∞ < r, s < ∞. This leads to the following inequality for characteristic functions.

THEOREM 2.5. [Newman and Wright (1981)] Suppose X_1, …, X_n are associated random variables with the joint and the marginal characteristic functions φ(r_1, …, r_n) and φ_j(r_j), 1 ≤ j ≤ n, respectively. Then

|φ(r_1, …, r_n) − Π_{j=1}^{n} φ_j(r_j)| ≤ (1/2) Σ_{j≠k} |r_j| |r_k| Cov(X_j, X_k). (2.6)
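The covariance inequality (2.5) for complex exponentials can be illustrated by simulation on an associated pair built from a common shock (an assumed construction for illustration; the empirical characteristic functions stand in for the exact ones):

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 400_000
Z = rng.standard_normal(reps)
X = Z + rng.standard_normal(reps)   # associated pair via a common shock
Y = Z + rng.standard_normal(reps)
r, s = 0.7, -1.3

# |Cov(exp(irX), exp(isY))| estimated from empirical characteristic functions
cov_exp = abs(np.mean(np.exp(1j * (r * X + s * Y)))
              - np.mean(np.exp(1j * r * X)) * np.mean(np.exp(1j * s * Y)))
bound = abs(r) * abs(s) * np.cov(X, Y)[0, 1]   # the right-hand side of (2.5)
print(cov_exp <= bound)
assert cov_exp <= bound
```

Here Cov(X, Y) = 1, so the bound is |r||s| ≈ 0.91, while the left-hand side stays well below it.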

Theorem 2.5 gives an alternate proof of the fact that associated random variables which are uncorrelated are jointly independent. The covariance structure of an associated sequence {X_n, n ≥ 1} plays a significant role in studying its probabilistic properties. Let

u(n) = sup_{k≥1} Σ_{j:|j−k|≥n} Cov(X_j, X_k), n ≥ 0. (2.7)

Then, for any stationary associated sequence {X_j}, the sequence u(n) is given by

u(n) = 2 Σ_{j=n+1}^{∞} Cov(X_1, X_j).

The following inequality has been used by Matula (1996) and Dewan and Prakasa Rao (1999) for proving their results.

THEOREM 2.6. [Bagai and Prakasa Rao (1991)] Suppose X and Y are associated random variables with bounded continuous densities f_X and f_Y, respectively. Then there exists an absolute constant C > 0 such that

sup_{x,y} |P[X ≤ x, Y ≤ y] − P[X ≤ x] P[Y ≤ y]| ≤ C {max(sup_x f_X(x), sup_x f_Y(x))}^{2/3} (Cov(X, Y))^{1/3}. (2.8)

2.1. Moment bounds


Birkel (1988a) observed that moment bounds for partial sums of associated sequences also depend on the rate of decrease of u(n).


THEOREM 2.7. [Birkel (1988a)] Let {X_n, n ≥ 1} be a sequence of associated random variables with EX_j = 0, j ≥ 1, and suppose that

sup_{j≥1} E|X_j|^{r+δ} < ∞ for some r > 2, δ > 0.

Assume that

u(n) = O(n^{−(r−2)(r+δ)/(2δ)}).

Then there is a constant B > 0 not depending on n such that, for all n ≥ 1,

sup_{m≥0} E|S_{n+m} − S_m|^r ≤ B n^{r/2}, (2.9)

where S_n = Σ_{j=1}^{n} X_j. If the X_j's are uniformly bounded, then the following result holds.

THEOREM 2.8. [Birkel (1988a)] Let {X_n, n ≥ 1} be a sequence of associated random variables satisfying EX_j = 0 and |X_j| ≤ c < ∞ for j ≥ 1. Assume that

u(n) = O(n^{−(r−2)/2}) for some r > 2.
Then (2.9) holds. The above result can easily be generalized, by the methods in Birkel (1988a), to obtain the following.

THEOREM 2.9. [Bagai and Prakasa Rao (1991)] For every α ∈ I, an index set, let {X_n(α), n ≥ 1} be an associated sequence with EX_n(α) = 0 and

sup_{α∈I} sup_{n≥1} |X_n(α)| ≤ A < ∞.

Let

S_n(α) = Σ_{j=1}^{n} X_j(α)

and

u(n, α) = sup_{k≥1} Σ_{j:|j−k|≥n} Cov(X_j(α), X_k(α)).

Suppose there exists b > 0, independent of α ∈ I and n ≥ 1, such that for some r > 2 and all α ∈ I and n ≥ 1,

u(n, α) ≤ b n^{−(r−2)/2}.

Then there exists a constant C, not depending on n and α, such that, for all n ≥ 1,

sup_{α∈I} sup_{m≥0} E|S_{n+m}(α) − S_m(α)|^r ≤ C n^{r/2}.
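The n^{r/2} growth in these moment bounds can be seen empirically. The sketch below (our construction, not from the original text: a bounded, mean-zero, associated moving average X_i = U_i + U_{i+1} − 1 of independent uniforms, for which u(n) = 0 for n ≥ 2) estimates E S_n⁴/n² for increasing n, which should remain of constant order (the case r = 4):

```python
import numpy as np

rng = np.random.default_rng(3)

def fourth_moment_ratio(n, reps=10_000):
    U = rng.random((reps, n + 1))
    X = U[:, :-1] + U[:, 1:] - 1.0   # bounded, mean-zero, associated (MA of uniforms)
    S = X.sum(axis=1)
    return np.mean(S ** 4) / n ** 2  # should stay of constant order as n grows

ratios = [fourth_moment_ratio(n) for n in (50, 200, 800)]
print([round(x, 2) for x in ratios])
assert max(ratios) < 4 * min(ratios)   # no faster-than-n^2 growth visible
```

For this example the limiting value is roughly 3σ⁴ with σ² = 1/3, so the printed ratios hover near 1/3.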


The above result is useful in the nonparametric estimation of the survival function for associated random variables [cf. Bagai and Prakasa Rao (1991)]. Bagai and Prakasa Rao (1995) generalized Theorems 2.7 and 2.8 to functions of associated random variables when the functions are of bounded variation, and then used the result for nonparametric density estimation for associated sequences.

THEOREM 2.10. [Bagai and Prakasa Rao (1995)] For every α ∈ J, an index set, let {X_j(α), j ≥ 1} be an associated sequence. Let f_n, n ≥ 1, be functions of bounded variation which are differentiable, and suppose that sup_{n≥1} sup_x |f_n′(x)| ≤ c < ∞. Let E(f_n(X_j(α))) = 0 for every n ≥ 1, j ≥ 1 and α ∈ J. Suppose there exist r > 2 and δ > 0 (independent of α, j and n) such that

sup_{n≥1} sup_{α∈J} sup_{j≥1} E|f_n(X_j(α))|^{r+δ} < ∞. (2.10)

Let

u(n, α) = sup_{k≥1} Σ_{j:|j−k|≥n} Cov(X_j(α), X_k(α)). (2.11)

Suppose that there exists c > 0 independent of α ∈ J such that

u(n, α) ≤ c n^{−(r−2)(r+δ)/(2δ)}.

Then there exists a constant B not depending on n, m and α such that

sup_{m≥1} sup_{α∈J} sup_{k≥0} E|S_{n+k,m}(α) − S_{k,m}(α)|^r ≤ B n^{r/2}, (2.12)

where

S_{n,m}(α) = Σ_{j=1}^{n} f_m(X_j(α)).

THEOREM 2.11. [Bagai and Prakasa Rao (1995)] For every α ∈ J, an index set, let {X_j(α), j ≥ 1} be an associated sequence. Let f_n, n ≥ 1, be functions of bounded variation which are differentiable, and suppose that sup_{n≥1} sup_x |f_n′(x)| < ∞ and sup_{n≥1} sup_x |f_n(x)| ≤ c < ∞. Let E(f_n(X_j(α))) = 0 for n ≥ 1, α ∈ J and j ≥ 1. Assume that there exists r > 2 such that

u(n, α) = O(n^{−(r−2)/2}).

Then there exists a constant B not depending on n, m and α such that

sup_{m≥1} sup_{α∈J} sup_{k≥0} E|S_{n+k,m}(α) − S_{k,m}(α)|^r ≤ B n^{r/2}, (2.13)

where u(n, α) is defined by (2.11).


Bulinskii (1993) generalized Birkel's results to random fields of associated variables. Recently, Shao and Yu (1996) obtained some Rosenthal-type moment inequalities for associated sequences, useful in their study of empirical processes for associated sequences.

THEOREM 2.12. Let 2 < p < r ≤ ∞. Let f be an absolutely continuous function satisfying sup_{x∈R} |f′(x)| ≤ B, and let {X_n, n ≥ 1} be a sequence of associated random variables with Ef(X_n) = 0 and ‖f(X_n)‖_r = (E|f(X_n)|^r)^{1/r} < ∞. Let

u(n) = sup_{i≥1} Σ_{j:|j−i|≥n} Cov(X_i, X_j) < ∞, n ≥ 0.

Suppose that

u(n) ≤ C n^{−θ} (2.14)

for some C > 0 and θ > 0. Then, for any ε > 0, there exists K = K(ε, r, p, θ) < ∞ such that

E max_{i≤n} |Σ_{j=1}^{i} f(X_j)|^p ≤ K { n^{1+ε} max_{i≤n} E|f(X_i)|^p + (n max_{i≤n} Σ_{j=1}^{n} |Cov(f(X_i), f(X_j))|)^{p/2} + n^{((r(p−1)−p+θ(p−r))/(r−2)) ∨ (1+ε)} max_{i≤n} ‖f(X_i)‖_r^{r(p−2)/(r−2)} (B²C)^{(r−p)/(r−2)} }. (2.15)

2.2. Bernstein-type inequality

Prakasa Rao (1993) obtained a Bernstein-type inequality applicable to sums of finite sequences of associated random variables.

THEOREM 2.13. For every n ≥ 1, let Z_i^{(n)}, 1 ≤ i ≤ l_n, be associated random variables such that E(Z_i^{(n)}) = 0 and |Z_i^{(n)}| ≤ d_n < ∞. Define

ρ^{(n)}(m) = sup_{1≤i_1<i_2≤l_n} Σ_{i_1≤j,k≤i_2, 0<|j−k|≤m} Cov(Z_j^{(n)}, Z_k^{(n)}) (2.16)

and

S_n = Σ_{i=1}^{l_n} Z_i^{(n)}. (2.17)

Choose 1 ≤ m_n ≤ l_n and α_n > 0 satisfying the relation

α_n m_n d_n ≤ 1. (2.18)

Then, for any ε > 0,

P(|S_n| > ε) ≤ 2 exp{ −α_n ε + 6 α_n² l_n d_n² + 3√e α_n² l_n ρ^{(n)}(m_n) }. (2.19)

2.3. Strong law of large numbers

Strong laws of large numbers for associated sequences have been obtained by Newman (1984) and Birkel (1989), the former for the stationary case and the latter for the nonstationary case.

THEOREM 2.14. [Newman (1984)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables. If

n^{−2} Σ_{i=1}^{n} Σ_{j=1}^{n} Cov(X_i, X_j) → 0 as n → ∞,

then

n^{−1}(S_n − E(S_n)) → 0 a.s. as n → ∞. (2.21)

THEOREM 2.15. [Birkel (1989)] Let {X_n, n ≥ 1} be a sequence of associated random variables with finite variance. Assume that

Σ_{j=1}^{∞} (1/j²) Cov(X_j, S_j) < ∞.

Then (2.21) holds. Theorem 2.15 has been generalized to functions of associated random variables.

THEOREM 2.16. [Bagai and Prakasa Rao (1995)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables. Let S_{n,n} = Σ_{j=1}^{n} f_n(X_j), where f_n is differentiable with sup_x |f_n′(x)| ≤ c. Suppose E[f_n(X_1)] = 0, Var[f_n(X_1)] < ∞ and

Σ_{j=1}^{∞} Cov(X_1, X_j) < ∞.


Then

S_{n,n}/n → 0 a.s. as n → ∞.

REMARK 2.17. Theorem 2.16 can be used to prove the pointwise strong consistency of kernel-type nonparametric density estimators for associated sequences [cf. Bagai and Prakasa Rao (1995)]. A strong law of large numbers for a triangular array of associated random variables is useful in nonparametric density estimation.

THEOREM 2.18. [Dewan and Prakasa Rao (1997a)] Let {X_{nj}, 1 ≤ j ≤ k_n, n ≥ 1} be a triangular array of strictly stationary associated random variables with E[X_{n1}] = 0 and Var(X_{n1}) < ∞ for all n. Suppose that k_n = O(n^γ) for some 0 < γ < 3 and that the following condition holds:

Σ_{j=1}^{k_n} Cov(X_{n1}, X_{nj}) < ∞. (2.22)

Let S_{n,k} = Σ_{j=1}^{k} X_{nj}. Further suppose that

E[ max_{n² ≤ j ≤ (n+1)²} |S_{j,k_j} − S_{n²,k_{n²}}|² ] = O(n^{4−δ}) (2.23)

for some δ > 1. Then

S_{n,k_n}/n → 0 a.s. as n → ∞. (2.24)
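The strong law is easy to visualize by simulation. In the sketch below (the moving-average construction X_j = U_j + U_{j+1} − 1 of independent uniforms is our illustrative choice of a stationary, associated, mean-zero sequence with summable covariances), S_n/n is driven to zero:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
U = rng.random(n + 1)
X = U[:-1] + U[1:] - 1.0                      # stationary associated, E X_j = 0
S_over_n = np.cumsum(X) / np.arange(1, n + 1)  # running averages S_k / k
print(abs(S_over_n[-1]) < 0.01)
assert abs(S_over_n[-1]) < 0.01
```

The final running average is within a few standard errors of zero, in line with Theorem 2.14.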

2.4. Central limit theorems


The next theorem was the original application (and motivation) of the characteristic function inequality given in Theorem 2.5. It gives the central limit theorem for partial sums of stationary associated random variables.

THEOREM 2.19. [Newman (1980, 1984)] Let {X_n, n ≥ 1} be a stationary associated sequence of random variables with E[X_1²] < ∞ and 0 < σ² = Var(X_1) + 2 Σ_{j=2}^{∞} Cov(X_1, X_j) < ∞. Then n^{−1/2}(S_n − E(S_n)) converges in distribution to N(0, σ²) as n → ∞.

Since σ is not known in practice, it needs to be estimated. Peligrad and Suresh (1995) obtained a consistent estimator of σ. Let {l_n, n ≥ 1} be a sequence of positive integers with 1 ≤ l_n ≤ n. Set

S_j(k) = Σ_{i=j+1}^{j+k} X_i, B_n = (1/(n − l_n + 1)) Σ_{j=0}^{n−l_n} |S_j(l_n) − (l_n/n) S_n| / √l_n.

THEOREM 2.20. Let {X_n, n ≥ 1} be a stationary associated sequence of random variables satisfying E(X_1) = μ and E(X_1²) < ∞. Let l_n = o(n) as n → ∞. Assume that Σ_{j=2}^{∞} Cov(X_1, X_j) < ∞. Then

B_n → σ √(2/π) in L_2 as n → ∞.

In addition, if we assume that l_n = O(n/(log n)²) as n → ∞, the convergence above is almost sure. A local limit theorem of the type due to Shepp (1964) was proved for stationary associated sequences by Wood (1985). Cox and Grimmett (1984) proved a central limit theorem for double sequences and used it in percolation theory and in the voter model.

THEOREM 2.21. [Cox and Grimmett (1984)] Let {X_{nj}, 1 ≤ j ≤ n, n ≥ 1} be a triangular array of associated random variables satisfying:
(i) there are strictly positive, finite constants c_1, c_2 such that Var(X_{nj}) ≥ c_1 and E[|X_{nj}|³] ≤ c_2 for all j and n;
(ii) there is a function u : {0, 1, 2, …} → R such that u(r) → 0 as r → ∞ and

Σ_{j:|k−j|≥r} Cov(X_{nj}, X_{nk}) ≤ u(r) for all k, n and r ≥ 0.

Let S_{n,n} = Σ_{j=1}^{n} X_{nj}. Then the sequence {S_{n,n}, n ≥ 1} satisfies the central limit theorem.

REMARK 2.22. Roussas (1994) established asymptotic normality of random fields of positively (as well as negatively) associated processes.
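The Peligrad and Suresh estimator of Theorem 2.20 is straightforward to implement. In the following sketch (our illustration; we take B_n to be the average of |S_j(l_n) − (l_n/n) S_n|/√l_n over overlapping blocks, and the data-generating moving average, with σ² = 1/3, is an assumed example), σ̂ = B_n √(π/2) recovers σ:

```python
import numpy as np

rng = np.random.default_rng(5)
n, l = 100_000, 200
U = rng.random(n + 1)
X = U[:-1] + U[1:] - 1.0              # associated, mean 0, sigma^2 = 1/6 + 2/12 = 1/3
c = np.concatenate(([0.0], np.cumsum(X)))
S_blocks = c[l:] - c[:-l]             # block sums S_j(l), j = 0, ..., n - l
B_n = np.mean(np.abs(S_blocks - (l / n) * c[-1])) / np.sqrt(l)
sigma_hat = B_n * np.sqrt(np.pi / 2)  # invert B_n -> sigma * sqrt(2/pi)
print(abs(sigma_hat - (1 / 3) ** 0.5) < 0.1)
assert abs(sigma_hat - (1 / 3) ** 0.5) < 0.1
```

The estimate lands close to σ = √(1/3) ≈ 0.577 despite the dependence in the data.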

2.5. Berry-Esseen type bound


The next natural question is the rate of convergence in the central limit theorem. We have the following versions of the Berry-Esseen theorem for associated sequences. Hereafter we denote the standard normal distribution function by Φ(x).

THEOREM 2.23. [Wood (1983)] Suppose {X_n, n ≥ 1} is a stationary sequence of associated random variables satisfying E[X_n] = 0, E(X_n²) < ∞ and E[|X_n|³] < ∞ for all n, and 0 < σ² = Var(X_1) + 2 Σ_{j=2}^{∞} Cov(X_1, X_j) < ∞. Then, for n = m · k, Wood (1983) gives an explicit bound on sup_x |F_n(x) − Φ(x)| in terms of m, σ², σ_m² and ρ_k, where S_n = Σ_{i=1}^{n} X_i, σ_n² = E(S_n²), ρ_n = E[|S_n|³] and F_n(x) = P[S_n ≤ x].

However, the rate obtained above by Wood (1983) is at best O(n^{−1/5}), which is far from the optimal rate O(n^{−1/2}) in the classical Berry-Esseen bound for sums of independent and identically distributed random variables. An improvement is given below.

THEOREM 2.24. [Birkel (1988b)] Let {X_n, n ≥ 1} be an associated sequence with E[X_n] = 0, satisfying


(i) u(n) = O(e^{−λn}) for some λ > 0,
(ii) inf_{n≥1} σ_n²/n > 0, where σ_n² = E[S_n²], and
(iii) sup_{n≥1} E[|X_n|³] < ∞,
where u(n) is as defined by (2.7). Then there exists a constant B not depending on n such that, for all n ≥ 1,

Δ_n ≡ sup_{x∈R} |P{σ_n^{−1} S_n ≤ x} − Φ(x)| ≤ B n^{−1/2} log² n.

If, instead of (iii), we assume that

sup_{j≥1} E|X_j|^{3+δ} < ∞ for some δ > 0,

then there exists a constant B not depending on n such that, for all n ≥ 1,

Δ_n ≤ B n^{−1/2} log n.

Even though Birkel (1988b) obtained the improved rate O(n^{−1/2} log² n), it is not clear how the constant B involved in the bound depends on the moments of the random variables {X_n}. The following result is an attempt in this direction.

THEOREM 2.25. [Dewan and Prakasa Rao (1997b)] Let {X_i, 1 ≤ i ≤ n} be a set of stationary associated random variables with E[X_1] = 0, Var[X_1] = σ² > 0 and E[|X_1|³] < ∞. Suppose the distribution of X_1 is absolutely continuous. Let S_n = Σ_{i=1}^{n} X_i and σ_n² = Var(S_n), and suppose that σ_n²/n converges as n → ∞. Let F_n(x) be the distribution function of S_n/σ_n and let F_n*(·) be the distribution function of Σ_{i=1}^{n} Z_i/σ_n, where Z_i, 1 ≤ i ≤ n, are i.i.d. with the same distribution function as X_1. Let m_n be a bound on the derivative of F_n*. Then there exist absolute constants B_i > 0, 1 ≤ i ≤ 3, such that

sup_x |F_n(x) − Φ(x)| ≤ B_1 d_n^{1/3} m_n^{2/3} / n^{1/3} + B_2 d_n/n + B_3 E|X_1|³ / ((σ_n/√n)³ n^{1/2}), (2.25)

where

d_n = Σ_{j=2}^{n} (n − j + 1) Cov(X_1, X_j).

REMARK 2.26. The bound given above can be made more explicit by bounding m_n in (2.25) if we assume that the characteristic function of X_1 is absolutely integrable; for large n, m_n can then be bounded via the Fourier inversion theorem in terms of σ_n and the characteristic function of X_1. Bulinskii (1995) established the rate of convergence of standardized sums of associated random variables to the normal law for a random field of associated random variables.

2.6. Invariance principle

Let {X_n, n ≥ 1} be a sequence of random variables with EX_n = 0 and EX_n² < ∞, n ≥ 1. Let

S_0 = 0, S_n = Σ_{k=1}^{n} X_k, σ_n² = E S_n², n ≥ 1.

Assume that σ_n² > 0, n ≥ 1. Let {k_n, n ≥ 0} be an increasing sequence of real numbers such that

0 = k_0 < k_1 < k_2 < ⋯ (2.27)

and

lim_{n→∞} max_{1≤i≤n} (k_i − k_{i−1})/k_n = 0. (2.28)

Define m(t) = max{i : k_i ≤ t}, t ≥ 0, and

W_n(t) = S_{m_n(t)}/σ_n, t ∈ [0, 1], n ≥ 1, (2.29)

where m_n(t) = m(t k_n). Consider the process

W_n*(t) = S_{[nt]}/σ_n, t ∈ [0, 1]. (2.30)

When k_n = n, n ≥ 1, the processes defined by (2.29) and (2.30) are equivalent.

THEOREM 2.27. [Newman and Wright (1981)] Let {X_n, n ≥ 1} be a strictly stationary sequence of associated random variables with EX_1 = 0 and EX_1² < ∞. If 0 < σ² = Var(X_1) + 2 Σ_{n=2}^{∞} Cov(X_1, X_n) < ∞, then W_n* ⇒ W as n → ∞, where W is a standard Wiener process.

An invariance principle for nonstationary associated processes has been studied by Birkel (1988c).

THEOREM 2.28. [Birkel (1988c)] Let {X_n, n ≥ 1} be a sequence of associated random variables with E(X_n) = 0 and E(X_n²) < ∞ for n ≥ 1. Assume that
(i) lim_{n→∞} σ_n^{−2} E(S_{nk} S_{nℓ}) = min(k, ℓ) for k, ℓ ≥ 1, where U_{m,n} = S_{m+n} − S_m, and


(ii) {σ_m^{−2} U_{n,m}² = σ_m^{−2} (S_{n+m} − S_n)², m ≥ 1, n ≥ 1} is uniformly integrable.
Then W_n* ⇒ W as n → ∞.

Birkel's result was generalized by Matula and Rychlik (1990). They observed that if W_n* ⇒ W as n → ∞, then

σ_n² = n h(n), (2.31)

where h : R⁺ → R⁺ is slowly varying. They proved an invariance principle for sequences {X_n, n ≥ 1} which do not satisfy the condition (2.31).

THEOREM 2.29. [Matula and Rychlik (1990)] Let {X_n, n ≥ 1} be a sequence of associated random variables with E(X_n) = 0 and E(X_n²) < ∞ for n ≥ 1. Let {k_n, n ≥ 1} be a sequence of real numbers satisfying (2.27) and (2.28). Assume that
(i) lim_{n→∞} σ_n^{−2} E(S_{m_n(p)} S_{m_n(q)}) = min(p, q) for p, q ≥ 1,
(ii) {(σ_{n+m}² − σ_n²)^{−1} (S_{n+m} − S_n)², m ≥ 0, n ≥ 1} is uniformly integrable.
Then W_n ⇒ W as n → ∞.

2.7. Strong invariance principle

Yu (1996) proved a strong invariance principle for associated sequences. We now briefly discuss this result. Let {X_n, n ≥ 1} be an associated sequence with EX_n = 0 and define

u(n) = sup_{k≥1} Σ_{j:|j−k|≥n} Cov(X_j, X_k).

Define blocks H_k and I_k of consecutive positive integers, leaving no gaps between the blocks, in the order H_1, I_1, H_2, I_2, …. The lengths of the blocks are defined by

card{H_k} = [k^α], card{I_k} = [k^β]

for some suitably chosen real numbers α > β > 0, with card{K} denoting the number of integers in K. Let

u_k = Σ_{i∈H_k} X_i, v_k = Σ_{i∈I_k} X_i, and τ_k² = E(u_k²)

for k ≥ 1. Let {V_k, k ≥ 1} be a sequence of independent N(0, τ_k²/2) distributed random variables, independent of {u_k, k ≥ 1}. Define

ξ_k = (u_k + V_k)/(τ_k² + τ_k²/2)^{1/2}, k ≥ 1.

Let F_k be the distribution function of ξ_k. Note that the function F_k is continuous. Let

η_k = Φ^{−1}(F_k(ξ_k)), k ≥ 1,

where Φ^{−1} denotes the inverse of the standard normal distribution function Φ. Note that each η_k has a standard normal distribution and the sequence {η_k, k ≥ 1} is an associated sequence. Furthermore, the covariances of the sequence {η_k, k ≥ 1} are controlled by those of the sequence {X_n, n ≥ 1}. The following theorem is due to Yu (1996).

THEOREM 2.30. Let {X_n, n ≥ 1} be an associated sequence satisfying EX_n = 0, inf_{n≥1} EX_n² > 0, and

sup_{n≥1} E|X_n|^{2+r+δ} < ∞ (2.32)

for some r, δ > 0. Further suppose that

u(n) = O(n^{−γ}), γ = (2 + r + δ)/(2δ) > 1. (2.33)

If moreover 5β/3 > α > β > 0, then for any 0 < θ < 1/2 and for all i ≠ j,

0 ≤ E η_i η_j ≤ C ((ij)^{−α/2} E(u_i u_j))^{θ/(1+θ)} (2.34)

for some constant C not depending on i and j. Furthermore, there exist real numbers α > β > 1 and some ε > 0 such that, for k satisfying N_k < N ≤ N_{k+1},

| Σ_{j=1}^{N} X_j − Σ_{i=1}^{k} (3τ_i²/2)^{1/2} η_i | ≤ C_1 N^{(1/2)−ε} a.s., (2.35)

for some constant C_1 not depending on N. Based on this theorem, Yu (1996) established the following strong invariance principle for associated sequences.

THEOREM 2.31. Let {X_n, n ≥ 1} be an associated sequence satisfying EX_n = 0, inf_{n≥1} EX_n² > 0, and

sup_{n≥1} E|X_n|^{2+r+δ} < ∞ (2.36)

for some r, δ > 0. Further suppose that

u(n) = O(e^{−λn}) (2.37)

for some λ > 0. Then, without changing its distribution, we can redefine the sequence {X_n, n ≥ 1} on another probability space with a standard Wiener process {W(t), t ≥ 0} such that, for some ε > 0,

Σ_{j=1}^{N} X_j − W(σ_N²) = O(N^{(1/2)−ε}) a.s., (2.38)

where σ_N² = Var(Σ_{j=1}^{N} X_j).
As a special case of the above theorem, it follows that

lim inf_{n→∞} [8 log log n / (π² σ_n²)]^{1/2} max_{1≤i≤n} |S_i| = 1 a.s., (2.39)

where S_n = Σ_{i=1}^{n} X_i, under the conditions stated above.

2.8. Law of iterated logarithm

Dabrowski (1985) proved the law of iterated logarithm for associated sequences.

THEOREM 2.32. [Dabrowski (1985)] Let {X_n, n ≥ 1} be a sequence of strictly stationary associated random variables with E(X_1) = 0, σ² = 1, sup{E|S_k/k^{1/2}|³ : k ≥ 1} < ∞, and

1 − E(S_n²/n) = O(n^{−δ}) for some δ > 0. (2.40)

Then {X_n, n ≥ 1} satisfies the functional law of the iterated logarithm. That is, let

Z_n(t) = S_k (2n log log n)^{−1/2} if 0 ≤ k ≤ n and t = k/n, with Z_n(t) linear in t otherwise, for 0 ≤ t ≤ 1.

Then, with probability one, {Z_n} is equicontinuous and the set of its limit points (in the sup norm on C[0, 1]) coincides with the set {k ∈ C[0, 1] : k is absolutely continuous in [0, 1], k(0) = 0 and ∫_0^1 (k′(t))² dt ≤ 1}. In particular,

P[ lim sup_{n→∞} S_n/√(2n log log n) = 1 ] = 1.

2.9. Demimartingales
Newman and Wright (1982) introduced the notion of demimartingales and proved some submartingale-type inequalities for associated random variables.

DEFINITION 2.33. A sequence of random variables S_1, S_2, … in L¹ is called a demimartingale if, for j = 1, 2, … and all coordinatewise nondecreasing functions g,

E((S_{j+1} − S_j) g(S_1, …, S_j)) ≥ 0, (2.41)

provided the expectation is defined.
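The defining inequality (2.41) can be probed numerically for partial sums of mean-zero associated random variables (cf. the remark below). In this sketch (the common-shock construction and the choice g = Σ tanh(S_i) are illustrative assumptions), the left-hand side of (2.41) is estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(6)
reps, n = 200_000, 4
Z = rng.standard_normal(reps)
X = 0.8 * Z[:, None] + rng.standard_normal((reps, n))  # associated, mean zero
S = np.cumsum(X, axis=1)                               # partial sums S_1, ..., S_4
j = 2                                  # check E[(S_4 - S_3) g(S_1, S_2, S_3)] >= 0
g = np.tanh(S[:, :j + 1]).sum(axis=1)  # coordinatewise nondecreasing g
lhs = np.mean((S[:, j + 1] - S[:, j]) * g)
print(lhs >= 0)
assert lhs >= 0
```

The increment S_4 − S_3 is positively correlated with g through the common shock, so the estimated expectation is strictly positive, as (2.41) requires.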

REMARK 2.34. If X_1, X_2, … are L¹, mean zero associated random variables and S_j = X_1 + ⋯ + X_j with S_0 = 0, then S_1, S_2, … is a demimartingale. If (2.41) is modified so that g is required to be nonnegative (resp., nonpositive) and nondecreasing, then the sequence will be called a demisubmartingale (resp., demisupermartingale).

Let S_n* = max(S_1, …, S_n). Define the rank orders R_{n,j} by

R_{n,j} = j-th largest of (S_1, …, S_n), if j ≤ n,
R_{n,j} = min(S_1, …, S_n) = R_{n,n}, if j > n. (2.42)

THEOREM 2.35. Suppose S_1, S_2, … is a demimartingale (resp., demisubmartingale) and m is a nondecreasing (resp., nonnegative and nondecreasing) function on (−∞, ∞) with m(0) = 0; then, for any n and j,

E( ∫_0^{R_{n,j}} u dm(u) ) ≤ E(S_n m(R_{n,j})); (2.43)

and thus, for any λ > 0,

λ P(R_{n,j} ≥ λ) ≤ ∫_{{R_{n,j} ≥ λ}} S_n dP. (2.44)

THEOREM 2.36. If S_1, S_2, … is an L² demimartingale, then

E(R_{n,j}²) ≤ E(S_n²). (2.45)

THEOREM 2.37. Suppose S_1, S_2, … is a demimartingale and let σ_n² = E(S_n²). Then, for λ_1 < λ_2,

(1 − σ_n²/(λ_2 − λ_1)²) P(S_n* ≥ λ_2) ≤ P(S_n* ≥ λ_1), (2.46)

so that, for α_1 < α_2 with α_2 − α_1 > 1,

P(max(|S_1|, …, |S_n|) ≥ α_2 σ_n) ≤ [(α_2 − α_1)² / ((α_2 − α_1)² − 1)] P(|S_n| ≥ α_1 σ_n). (2.48)


THEOREM 2.38. If S_1, S_2, … is an L² demisubmartingale, then for 0 < λ_1 < λ_2,

P( S_n^* ≥ λ_2 ) ≤ ( σ_n/(λ_2 − λ_1) ) ( P(S_n ≥ λ_1) )^{1/2} .   (2.49)

If S_1, S_2, … is an L² demimartingale, then for 0 ≤ λ_1 < λ_2,

P( max(|S_1|, …, |S_n|) ≥ λ_2 ) ≤ ( σ_n/(λ_2 − λ_1) ) ( P(|S_n| ≥ λ_1) )^{1/2} .   (2.50)
The next theorem extends Doob's upcrossing inequality to demisubmartingales. Given S_1, S_2, …, S_n and a < b, we define a sequence of stopping times J_0 = 0, J_1, J_2, … as follows (for k = 1, 2, …):

J_{2k−1} = n + 1, if {j : J_{2k−2} < j ≤ n and S_j ≤ a} is empty;
         = min{j : J_{2k−2} < j ≤ n and S_j ≤ a}, otherwise.
J_{2k}   = n + 1, if {j : J_{2k−1} < j ≤ n and S_j ≥ b} is empty;
         = min{j : J_{2k−1} < j ≤ n and S_j ≥ b}, otherwise.

The number of complete upcrossings of the interval [a, b] by S_1, …, S_n is denoted by U_{a,b}, where

U_{a,b} = max{k : J_{2k} < n + 1} .   (2.51)

THEOREM 2.39. If S_1, S_2, …, S_n is a demisubmartingale, then for a < b,

E(U_{a,b}) ≤ [ E((S_n − a)^+) − E((S_1 − a)^+) ] / (b − a) .   (2.52)

The following theorem is an immediate consequence of Theorem 2.39.

THEOREM 2.40. If {S_n} is a demimartingale and sup_n E|S_n| < ∞, then S_n converges a.s. to a finite limit.
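The stopping-time recursion above is easy to operationalize. The following sketch (an illustration, not from the source; the function name is ours) counts the complete upcrossings U_{a,b} of an interval [a, b] by a finite real sequence.

```python
def upcrossings(s, a, b):
    """Count complete upcrossings U_{a,b} of [a, b] by S_1, ..., S_n
    (here s[0], ..., s[n-1]), via the stopping times J_0 = 0, J_1, J_2, ...:
    odd-indexed times search for S_j <= a, even-indexed ones for S_j >= b."""
    assert a < b
    n = len(s)
    count = 0
    j = 0  # last stopping time, as a 1-based index; J_0 = 0
    while True:
        # J_{2k-1}: first index after j with S_j <= a (n + 1 if none).
        j = next((i for i in range(j + 1, n + 1) if s[i - 1] <= a), n + 1)
        if j == n + 1:
            break
        # J_{2k}: first index after j with S_j >= b (n + 1 if none).
        j = next((i for i in range(j + 1, n + 1) if s[i - 1] >= b), n + 1)
        if j == n + 1:
            break
        count += 1  # one complete upcrossing of [a, b]
    return count

print(upcrossings([0, -1, 2, -2, 0, 3], a=0, b=1))  # prints 2
```

For a demisubmartingale path, Theorem 2.39 bounds the expectation of exactly this count.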

3. Random generation of associated sequences

Matula (1996) discusses two examples which can be used for generating associated sequences.

(i) Let {Y_n, n ≥ 1} be a sequence of independent and identically distributed standard normal random variables. For arbitrary u ∈ R, let

X_n = I_{(−∞, u]}( (Y_1 + Y_2 + ⋯ + Y_n)/√n ) .

Then {X_n, n ≥ 1} is a sequence of associated random variables, and for j < n the covariance Cov(X_j, X_n) can be written explicitly in terms of the standard normal distribution function Φ.

(ii) Let {Y_n, n ≥ 1} be a sequence of independent and identically distributed random variables with E(Y_1) = 1 and E(Y_1²) = 2. For n ≥ 1, let

X_n = (1/2^{n−1}) Y_1 + ⋯ + (1/2^{n−1}) Y_{n−1} + (1/n) Y_n .

Then {X_n, n ≥ 1} is a sequence of associated random variables, with Cov(X_j, X_n), j < n, available in closed form in Matula (1996).
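Example (i) is straightforward to simulate. The sketch below is illustrative (function and parameter names are ours, and the Monte Carlo covariance check is an assumption, not part of Matula's construction): it generates realizations of the indicator sequence and verifies empirically that the covariances are nonnegative, as association requires.

```python
import numpy as np

def associated_indicators(n_terms, u=0.0, rng=None):
    """Example (i): X_k = 1{(Y_1 + ... + Y_k)/sqrt(k) <= u} for i.i.d.
    standard normal Y_k.  Each X_k is a monotone function of the Y's,
    so the sequence {X_k} is associated."""
    rng = np.random.default_rng(rng)
    y = rng.standard_normal(n_terms)
    normed = np.cumsum(y) / np.sqrt(np.arange(1, n_terms + 1))
    return (normed <= u).astype(int)

# Association forces Cov(X_j, X_n) >= 0; estimate one covariance by
# Monte Carlo over independent realizations of the sequence.
rng = np.random.default_rng(42)
reps = np.array([associated_indicators(10, rng=rng) for _ in range(20000)])
cov_3_10 = np.cov(reps[:, 2], reps[:, 9])[0, 1]
print(cov_3_10)
```

With u = 0 the covariance between X_3 and X_10 reduces to P(both partial sums nonpositive) minus 1/4, which is strictly positive here because the normalized sums are positively correlated normals.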

4. Statistical inference for associated sequences

As seen in the preceding sections, the probabilistic aspects of associated random variables have been extensively discussed in recent years. However, comparatively little work has been done on the related statistical inference problems. We now present some recent results in this direction.

4.1. Nonparametric estimation for survival function


Let {X_n, n ≥ 1} be a stationary sequence of associated random variables with distribution function F(x) or, equivalently, survival function F̄(x) = 1 − F(x), and density function f(x). The empirical survival function F̄_n(x) is defined by

F̄_n(x) = (1/n) Σ_{j=1}^{n} Y_j(x) ,   (4.1)

where

Y_j(x) = 1 if X_j > x, and 0 otherwise .   (4.2)

It is interesting to note that, for fixed x, the Y_j(x) are monotone functions of the X_j's and hence are themselves associated. Bagai and Prakasa Rao (1991) proposed F̄_n(x) as an estimator for F̄(x) and discussed its asymptotic properties.

THEOREM 4.1. [Bagai and Prakasa Rao (1991)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables such that X_1 has a bounded continuous density. Assume that, for some r > 1,

Σ_{j=n+1}^{∞} { Cov(X_1, X_j) }^{1/3} = O( n^{−(r−1)} ) .   (4.3)

Then

F̄_n(x) → F̄(x)   a.s.   as n → ∞ .
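To make the estimator (4.1) concrete, here is a minimal sketch (not from the source). The AR(1) generator and its parameters are assumptions chosen for the demonstration; a stationary Gaussian AR(1) with positive coefficient is a positively correlated normal sequence and hence associated, by Pitt (1982).

```python
import numpy as np

def empirical_survival(sample, x):
    """F̄_n(x) = (1/n) Σ_j 1{X_j > x}, the estimator in (4.1)."""
    return float(np.mean(np.asarray(sample) > x))

def gaussian_ar1(n, phi=0.5, rng=None):
    """Stationary Gaussian AR(1) with phi > 0: associated by Pitt (1982)."""
    rng = np.random.default_rng(rng)
    x = np.empty(n)
    x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # stationary start
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

sample = gaussian_ar1(200_000, rng=0)
# The stationary marginal is N(0, 1/(1 - phi^2)), so F̄(0) = 1/2.
print(empirical_survival(sample, 0.0))
```

As Theorem 4.1 predicts for such a sequence, the estimate settles near the true survival probability as n grows.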

THEOREM 4.2. Let {X_n, n ≥ 1} be a stationary sequence of associated random variables with one-dimensional survival function F̄(x). Let

γ_n = Σ_{j=1}^{n} Cov(X_1, X_j) .   (4.4)

Then, for any ε > 0,

P[ |F̄_n(x) − F̄(x)| > ε ] ≤ 2 { e^{λ_n²/n − ελ_n} + n^{−1/3} λ_n² e^{(2−ε)λ_n} γ_n^{1/3} }   (4.5)

for any sequence {λ_n} such that λ_n ≤ n/4.

PROOF. For all λ_n > 0,

E[ e^{λ_n(F̄_n(x) − F̄(x))} ] = E[ e^{(λ_n/n) Σ_{j=1}^{n}(Y_j(x) − EY_j(x))} ]
= { E[ e^{(λ_n/n) Σ_{j=1}^{n}(Y_j(x) − EY_j(x))} ] − Π_{j=1}^{n} E[ e^{(λ_n/n)(Y_j(x) − EY_j(x))} ] }
  + Π_{j=1}^{n} E[ e^{(λ_n/n)(Y_j(x) − EY_j(x))} ] .   (4.6)

Note that, for large n, 0 < λ_n/n ≤ 1/4 and (λ_n/n)|Y_j(x) − EY_j(x)| ≤ 1/2. Furthermore, e^u ≤ 1 + u + u² for |u| ≤ 1/2. Hence

Π_{j=1}^{n} E[ e^{(λ_n/n)(Y_j(x) − EY_j(x))} ]
≤ Π_{j=1}^{n} E[ 1 + (λ_n/n)(Y_j(x) − EY_j(x)) + (λ_n²/n²)(Y_j(x) − EY_j(x))² ]
≤ Π_{j=1}^{n} ( 1 + λ_n²/n² )   (since Var(Y_j(x)) ≤ 1)
≤ e^{λ_n²/n} .   (4.7)

Using Newman's (1980) inequality and Sadikova's (1966) lemma, for T > 0, we get that

| E[ e^{(λ_n/n) Σ_{j=1}^{n}(Y_j(x) − EY_j(x))} ] − Π_{j=1}^{n} E[ e^{(λ_n/n)(Y_j(x) − EY_j(x))} ] |
≤ (λ_n²/n²) e^{2λ_n/n} Σ_{1≤i<j≤n} Cov( Y_i(x), Y_j(x) )
= (λ_n²/n²) e^{2λ_n/n} Σ_{1≤i<j≤n} { P(X_i > x, X_j > x) − P(X_i > x)P(X_j > x) }
≤ (λ_n²/n²) e^{2λ_n/n} { T² Σ_{j=1}^{n−1} (n − j) Cov(X_1, X_{j+1}) + n²/T }
≤ n^{−1/3} λ_n² e^{2λ_n} γ_n^{1/3}   (choosing T = n^{1/3} γ_n^{−1/3}) .   (4.8)

Using Markov's inequality for any ε > 0 and combining (4.6)–(4.8), we get that

P[ F̄_n(x) − F̄(x) > ε ] ≤ e^{λ_n²/n − ελ_n} + n^{−1/3} λ_n² e^{(2−ε)λ_n} γ_n^{1/3} .   (4.9)

The result follows from the fact that if the Y_j(x), for fixed x, are associated, then so are the −Y_j(x).

REMARK 4.3. Suppose γ_n = O( n^{−5} (log n)^{−6} ). Choose λ_n = log n. It is easy to see that

e^{λ_n²/n − ελ_n} = O( n^{−ε} )   (4.10)

and

n^{−1/3} λ_n² e^{(2−ε)λ_n} γ_n^{1/3} = O( n^{−ε} ) .   (4.11)

Hence

P[ |F̄_n(x) − F̄(x)| > ε ] = O( n^{−ε} ) .   (4.12)

A Glivenko–Cantelli type theorem for associated random variables is given below.

THEOREM 4.4. [Bagai and Prakasa Rao (1991)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables satisfying the conditions of Theorem 4.1. Then, for any compact subset J ⊂ R,

sup{ |F̄_n(x) − F̄(x)| : x ∈ J } → 0   a.s.   as n → ∞ .

THEOREM 4.5. [Yu (1993)] Let {X_n} be a sequence of associated random variables having the same marginal distribution function F(x) for each X_n, n ≥ 1. If F(x) is continuous and

Σ_{n=2}^{∞} (1/n) Cov( X_n, S_{n−1} ) < ∞ ,   where S_{n−1} = X_1 + ⋯ + X_{n−1} ,   (4.13)

then, as n → ∞,

sup_{−∞<x<∞} | F_n(x) − F(x) | → 0   a.s.   (4.14)

If the sequence in the above theorem is stationary, then condition (4.13) can be weakened to

(1/n) Σ_{j=1}^{n} Cov( X_1, X_j ) → 0 .   (4.15)

From the central limit theorem for stationary associated random variables, the following result can be deduced.

THEOREM 4.6. [Bagai and Prakasa Rao (1991)] Let {X_n, n ≥ 1} be a stationary associated sequence of random variables with bounded continuous density for X_1 and survival function F̄(x). Suppose that

Σ_{j=2}^{∞} { Cov(X_1, X_j) }^{1/3} < ∞ .

Define

σ²(x) = F̄(x)[ 1 − F̄(x) ] + 2 Σ_{j=2}^{∞} { P[X_1 > x, X_j > x] − F̄²(x) } .

Then, for all x such that 0 < F(x) < 1,

n^{1/2}( F̄_n(x) − F̄(x) )/σ(x) →_L N(0, 1)   as n → ∞ .

Consider the following empirical process:

β_n(x) = n^{1/2}( F_n(x) − F(x) ) ,   x ∈ R ,   (4.16)

where F_n denotes the empirical distribution function.

Weak convergence of such empirical processes has been discussed by Yu (1993).

THEOREM 4.7. [Yu (1993)] Let {X_n, n ≥ 1} be a stationary associated sequence of random variables with bounded density for X_1, and suppose that there exists a positive constant v such that

Σ_{n=1}^{∞} n^{v} { Cov(X_1, X_n) }^{1/3} < ∞ .   (4.17)

Then

β_n(·) ⇒ B(F(·))   in D[0, 1] ,   (4.18)

where B is a zero-mean Gaussian process on [0, 1] with covariance given by

E[ B(s)B(t) ] = s ∧ t − st + 2 Σ_{k=2}^{∞} ( P(X_1 ≤ s, X_k ≤ t) − st )   (4.19)

and P{ B(·) ∈ C[0, 1] } = 1.

The above result has been improved by Oliveira and Suquet (1995) and, more recently, by Shao and Yu (1996).

THEOREM 4.8. [Oliveira and Suquet (1995)] Let {X_n} be strictly stationary associated random variables with continuous distribution. Let γ(s, t) be defined by

γ(s, t) = F(s ∧ t) − F(s)F(t) + 2 Σ_{k=2}^{∞} [ P(X_1 ≤ s, X_k ≤ t) − P(X_1 ≤ s)P(X_k ≤ t) ] .   (4.20)

Suppose the series in (4.20) converges uniformly on [0, 1]². Then the empirical process β_n converges weakly in L²(0, 1) to a centered Gaussian process with covariance function γ(s, t). In the case of uniform variables X_n, the uniform convergence of the series in (4.20) follows from the condition

Σ_{n≥2} { Cov(X_1, X_n) }^{1/3} < ∞ .   (4.21)

4.2. Nonparametric density estimation

Density estimation in the classical i.i.d. case has been extensively discussed; for a comprehensive survey, see Prakasa Rao (1983). These results have been extended to the estimation of the marginal density for stationary processes which are either Markov or mixing in some sense [cf. Prakasa Rao (1983)]. As pointed out earlier in this paper, it is of interest to study density estimation when the observations form an associated sequence, for instance in the context of lifetimes of components in reliability. Bagai and Prakasa Rao (1995) proposed a kernel-type estimator of the unknown density function of X_1, where {X_n, n ≥ 1} is a stationary sequence of associated random variables. Assume that the support of f is a closed interval I = [a, b] on the real line. Consider

f_n(x) = (1/(n h_n)) Σ_{j=1}^{n} K( (x − X_j)/h_n ) ,   x ∈ I ,   (4.22)

Associated sequences and related inference problems

725

as an estimator for f(x), where K(·) is a suitable kernel and {h_n} is a bandwidth sequence. The asymptotic behaviour of f_n(x) is discussed under the assumptions (A) listed below.

(A1) K(·) is a bounded density function and of bounded variation on R satisfying (i) lim_{|u|→∞} |u| K(u) = 0, and (ii) ∫_{−∞}^{∞} u² K(u) du < ∞.
(A2) K(x) is differentiable and sup_x |K′(x)| ≤ c < ∞.

REMARK 4.9. Note that the standard normal density satisfies the above conditions.

In addition to (A1) and (A2), it is assumed that the covariance structure of {X_n} satisfies the following condition.

(B) For all ℓ and r, Σ_{j : |ℓ−j| ≥ r} Cov(X_j, X_ℓ) ≤ u(r), where u(r) = e^{−αr} for some α > 0.

THEOREM 4.10. [Bagai and Prakasa Rao (1995)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables. Suppose that (A1), (A2) and (B) hold. Then

f_n(x) → f(x)   a.s.   as n → ∞

at continuity points x of f(·).

Uniform strong consistency of f_n(x) follows from the following theorem.

THEOREM 4.11. [Bagai and Prakasa Rao (1995)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables satisfying the conditions (A) and (B), and suppose there exists γ > 0 such that

h_n^{−4} = O( n^{γ} ) .   (4.23)

Further, suppose that the following condition holds:

(C) |f(x_1) − f(x_2)| ≤ c |x_1 − x_2| for all x_1, x_2 ∈ I.

Then

sup{ |f_n(x) − f(x)| : x ∈ I } → 0   a.s.
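As an illustration of (4.22) (not from the source): the standard normal kernel satisfies (A1) and (A2), as noted in Remark 4.9. The bandwidth h_n = n^{−1/5} below is an assumption made for the demonstration, and the i.i.d. sample is a special case of an associated sequence (independent random variables are associated).

```python
import numpy as np

def kernel_density(sample, xs, h):
    """f_n(x) = (1/(n h)) Σ_j K((x - X_j)/h) with the standard normal kernel K."""
    sample = np.asarray(sample)
    u = (xs - sample[:, None]) / h                  # shape (n, len(xs))
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)    # N(0, 1) kernel values
    return k.mean(axis=0) / h

rng = np.random.default_rng(1)
n = 5000
sample = rng.normal(size=n)
xs = np.array([-1.0, 0.0, 1.0])
est = kernel_density(sample, xs, h=n ** (-1 / 5))
true = np.exp(-0.5 * xs**2) / np.sqrt(2 * np.pi)    # true N(0, 1) density
print(np.max(np.abs(est - true)))
```

The printed maximum error shrinks as n grows, in line with the almost sure convergence in Theorem 4.10.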

Let {X_n, n ≥ 1} be a sequence of stationary associated random variables having a common marginal density function f(x). Let δ_n(x, y), n = 1, 2, …, be a sequence of Borel-measurable functions defined on R². Let

f_n(x) = (1/n) Σ_{k=1}^{n} δ_n(x, X_k)

be the corresponding empirical density function. This gives a general class of estimators of the marginal density f. Dewan and Prakasa Rao (1999) studied a set of


sufficient conditions under which Pr( sup_x |f_n(x) − f(x)| > ε ) → 0 at an exponential rate as n → ∞.

4.3. Nonparametric failure rate estimation


The failure rate r(x) is defined as

r(x) = f(x)/F̄(x) ,   F̄(x) > 0 .   (4.24)

An obvious estimator of r(x) is r_n(x), given by

r_n(x) = f_n(x)/F̄_n(x) ,   (4.25)

where f_n(x) is the kernel-type estimator defined in (4.22) and F̄_n(x) is defined in (4.1). It is easy to see that

r_n(x) − r(x) = { F̄(x)[ f_n(x) − f(x) ] − f(x)[ F̄_n(x) − F̄(x) ] } / ( F̄(x) F̄_n(x) ) .   (4.26)

In the following theorems, we present pointwise as well as uniform consistency results for r_n(x).

THEOREM 4.12. [Bagai and Prakasa Rao (1995)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables satisfying the conditions (A) and (B), and suppose that, for some r > 1,

Σ_{j=n+1}^{∞} { Cov(X_1, X_j) }^{1/3} = O( n^{−(r−1)} ) .

Then, for all x ∈ I which are continuity points of f,

r_n(x) → r(x)   a.s.   as n → ∞ .

THEOREM 4.13. [Bagai and Prakasa Rao (1995)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables satisfying the conditions of Theorems 4.4 and 4.10. Then

sup{ |r_n(x) − r(x)| : x ∈ J } → 0   a.s.   as n → ∞ .

Roussas (1991) discussed strong uniform consistency of kernel-type estimates of f and of f^{(r)}, the rth-order derivative of f, as well as of the hazard rate, for strictly stationary associated sequences; certain convergence rates are also discussed by him. Roussas (1993) considered a random field of real-valued associated random variables and constructed the empirical distribution function, kernel estimates of f and its derivatives, and the associated hazard rate; he discussed almost sure and uniform convergence of these estimates under certain conditions. Roussas (1995) considered the asymptotic normality of a smooth estimate of the distribution function.

4.4. Nonparametric mean residual life function estimation


The mean residual life function M_F(x) is defined as

M_F(x) = E[ X − x | X > x ] = (1/F̄(x)) ∫_x^∞ F̄(t) dt .   (4.27)

An obvious estimator of M_F(x) is M_n(x), given by

M_n(x) = (1/F̄_n(x)) ∫_x^∞ F̄_n(t) dt ,   (4.28)

where F̄_n(x) is as defined in (4.1). Let T_F = inf{x : F(x) = 1}.

THEOREM 4.14. [Shao and Yu (1996)] Let {X_n, n ≥ 1} be a stationary sequence of associated random variables with distribution function F for X_1. If T < T_F and

Σ_{n=2}^{∞} (1/n) Cov( X_1, X_n ) < ∞ ,

then

sup_{0<x≤T} | M_n(x) − M_F(x) | → 0   a.s.   as n → ∞ .   (4.29)

This result follows from the fact that the mean residual life function M_F(x) can be written as a weighted function of the empirical distribution function [Shao and Yu (1996)].
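A sketch of the estimator (4.28) (illustrative; truncating the integral at the sample maximum and the Riemann-sum grid are our assumptions). For the standard exponential, M_F(x) = 1 for every x, which provides a check.

```python
import numpy as np

def mean_residual_life(sample, x, grid_points=4000):
    """M_n(x) = (1/F̄_n(x)) ∫_x^∞ F̄_n(t) dt, as in (4.28); F̄_n vanishes
    beyond the sample maximum, so the integral is truncated there."""
    sample = np.sort(np.asarray(sample))
    ts = np.linspace(x, sample[-1], grid_points)
    # F̄_n(t) = fraction of observations strictly greater than t
    surv = 1.0 - np.searchsorted(sample, ts, side="right") / len(sample)
    integral = np.sum(surv[:-1] * np.diff(ts))  # left Riemann sum
    return float(integral / surv[0])

rng = np.random.default_rng(3)
sample = rng.exponential(size=200_000)
print(mean_residual_life(sample, 1.0))
```

Because the estimator is a ratio of two quantities built from F̄_n, its uniform consistency on (0, T] follows from Theorem 4.14's weighted-empirical representation.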

4.5. Tests for the change point problem for associated random variables
Let X_1, X_2, …, X_n be normal random variables with common variance σ² and

E[X_i] = μ ,       i = 1, …, k ,
       = μ + δ ,   i = k + 1, …, n ,

where 1 ≤ k < n. The constant k is known as the change point. Nagaraj (1990) derived locally optimal tests for H_0 : δ = 0 against H_1 : δ ≠ 0, or against H_2 : δ > 0, where k is assumed to be a realization of a random variable K such that

P[K = k] = 1/(n − 1) ,   k = 1, 2, …, n − 1 .

728

B. L. S. Prakasa Rao and I. Dewan

The three cases considered by him are: (i) μ = 0 and σ² = 1; (ii) μ unknown and σ² = 1; (iii) both μ and σ² unknown.

Nagaraj and Reddy (1993) derived the asymptotic null distributions of the test statistics obtained in Nagaraj (1990) under the assumption that the observations are generated by a stationary associated process. Denote the mean by μ = E[X_t], and the autocovariance and autocorrelation functions of the process by γ(h) = Cov(X_t, X_{t+h}) and ρ(h) = (γ(0))^{−1} γ(h), respectively. They proved that, for fixed h ≥ 0,

(1/(n − h)) Σ_{t=1}^{n−h} (X_t − X̄)(X_{t+h} − X̄) → γ(h)

with probability one, where X̄ is the sample mean. Thus the sample autocovariance is a consistent estimator of γ(h). They also examined the effect of autocorrelation on the Type I error of the test for a change in mean when the observations come from a first-order autoregression with φ_1 > 0, and observed that the actual Type I error is larger than the preassigned value.
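The consistency of the sample autocovariance stated above can be checked numerically. The AR(1) below is an illustration (a positively correlated Gaussian sequence is associated); its parameters and the generator are our assumptions.

```python
import numpy as np

def sample_autocov(x, h):
    """(1/(n - h)) Σ_{t=1}^{n-h} (X_t - X̄)(X_{t+h} - X̄)."""
    x = np.asarray(x)
    n = len(x)
    xbar = x.mean()
    return float(np.sum((x[: n - h] - xbar) * (x[h:] - xbar)) / (n - h))

rng = np.random.default_rng(5)
phi, n = 0.6, 200_000
x = np.empty(n)
x[0] = rng.standard_normal() / np.sqrt(1 - phi**2)  # stationary start
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()

# For this AR(1), gamma(h) = phi**h / (1 - phi**2).
print(sample_autocov(x, 0), sample_autocov(x, 1))
```

Both estimates land near their theoretical values, as the almost sure convergence above guarantees for large n.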

References
Arjas, E. and I. Norros (1984). Life lengths and association: a dynamic approach. Math. Operat. Res. 9, 151-158.
Bagai, I. and B. L. S. Prakasa Rao (1991). Estimation of the survival function for stationary associated processes. Statist. Prob. Letters 12, 385-391.
Bagai, I. and B. L. S. Prakasa Rao (1992). Analysis of survival data with two dependent competing risks. Biom. J. 7, 801-814.
Bagai, I. and B. L. S. Prakasa Rao (1995). Kernel-type density and failure rate estimation for associated sequences. Ann. Inst. Statist. Math. 47, 253-266.
Barlow, R. E. and F. Proschan (1975). Statistical Theory of Reliability and Life Testing: Probability Models. Holt, Rinehart and Winston.
Batty, C. J. K. (1976). An extension of an inequality of R. Holley. Quart. J. Math. 27, 457-461.
Birkel, T. (1986). Momentenabschätzungen und Grenzwertsätze für Partialsummen assoziierter Zufallsvariablen. Doctoral thesis, University of Köln.
Birkel, T. (1988a). Moment bounds for associated sequences. Ann. Prob. 16, 1184-1193.
Birkel, T. (1988b). On the convergence rate in the central limit theorem for associated processes. Ann. Prob. 16, 1689-1698.
Birkel, T. (1988c). The invariance principle for associated processes. Stoc. Process. Appl. 27, 57-71.
Birkel, T. (1989). A note on the strong law of large numbers for positively dependent random variables. Statist. Prob. Letters 7, 17-20.
Block, H. W. and Z. Fang (1988). A multivariate extension of Hoeffding's lemma. Ann. Prob. 16, 1803-1820.
Bulinskii, A. V. (1993). Inequalities for the moments of sums of associated multi-index variables. Th. Prob. Appl. 38, 342-349.
Bulinskii, A. V. (1995). Rates of convergence in the central limit theorem for fields of associated random variables. Th. Prob. Appl. 40, 136-144.
Burton, R. M., A. R. Dabrowski and H. Dehling (1986). An invariance principle for weakly associated random variables. Stoc. Process. Appl. 23, 301-306.


Burton, R. M. and E. Waymire (1985). Scaling limits for associated random measures. Ann. Prob. 13, 1267-1278.
Burton, R. M. and E. Waymire (1986). The central limit problem for infinitely divisible random measures. In Dependence in Probability and Statistics (Eds., M. Taqqu and E. Eberlein), Birkhäuser, Boston.
Chung, M., B. S. Rajput and A. Tortrat (1982). Semistable laws on topological vector spaces. Z. Wahrsch. theorie und Verw. Gebiete 60, 209-218.
Cox, J. T. and G. Grimmett (1981). Central limit theorems for percolation models. J. Statist. Phys. 25, 237-251.
Cox, J. T. and G. Grimmett (1984). Central limit theorems for associated random variables and the percolation model. Ann. Prob. 12, 514-528.
Dabrowski, A. R. (1985). A functional law of the iterated logarithm for associated sequences. Statist. Prob. Letters 3, 209-212.
Dabrowski, A. R. and H. Dehling (1988). A Berry-Esseen theorem and a functional law of the iterated logarithm for weakly associated random variables. Stoc. Process. Appl. 30, 247-289.
Daley, D. J. (1968). Stochastically monotone Markov chains. Z. Wahrsch. theorie und Verw. Gebiete 10, 305-317.
Dewan, I. and B. L. S. Prakasa Rao (1997a). Remarks on the strong law of large numbers for a triangular array of associated random variables. Metrika 45, 225-234.
Dewan, I. and B. L. S. Prakasa Rao (1997b). Remarks on Berry-Esseen type bound for stationary associated random variables. Gujarat Statist. Rev. 24, 19-20.
Dewan, I. and B. L. S. Prakasa Rao (1999). A general method of density estimation for associated random variables. Nonparametric Statistics 10, 405-420.
Esary, J. and F. Proschan (1970). A reliability bound for systems of maintained, interdependent components. J. Amer. Statist. Assoc. 65, 329-338.
Esary, J., F. Proschan and D. Walkup (1967). Association of random variables with applications. Ann. Math. Statist. 38, 1466-1474.
Evans, S. N. (1990). Association and random measures. Prob. Th. Rel. Fields 86, 1-19.
Foley, R. D. and P. C. Kiessler (1989). Positive correlations in a three node Jackson queueing network. Adv. Appl. Prob. 21, 241-242.
Fortuin, C., P. Kastelyn and J. Ginibre (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 89-103.
Glasserman, P. (1992). Processes with associated increments. J. Appl. Prob. 29, 313-333.
Harris, T. E. (1960). A lower bound for the critical probability in a certain percolation process. Proc. Camb. Phil. Soc. 56, 13-20.
Hjort, N. L., B. Natvig and E. Funnemark (1985). The association in time of a Markov process with application to multistate reliability theory. J. Appl. Prob. 22, 473-479.
Hoeffding, W. (1940). Masstabinvariante Korrelationstheorie. Schr. Math. Inst. Univ. Berlin 5, 181-233.
Holley, R. (1974). Remarks on the FKG inequalities. Comm. Math. Phys. 36, 227-231.
Hu, T. and J. Hu (1998). Comparison of order statistics between dependent and independent random variables. Statist. Prob. Letters 37, 1-6.
Joag-Dev, K. (1983). Independence via uncorrelatedness under certain dependence structures. Ann. Prob. 11, 1037-1041.
Joag-Dev, K. and F. Proschan (1983). Negative association of random variables with applications. Ann. Statist. 11, 286-295.
Karlin, S. and Y. Rinott (1980). Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Mult. Anal. 10, 467-498.
Keilson, J. and A. Kester (1977). Monotone matrices and monotone Markov processes. Stoc. Process. Appl. 5, 231-241.
Kemperman, J. H. B. (1977). On the FKG inequalities for measures on a partially ordered space. Indag. Math. 39, 313-331.


Kirstein, B. M. (1976). Monotonicity and comparability of time-homogeneous Markov processes. Math. Operationsforsch. Statist. 7, 151-168.
Kuber, S. and A. Dharamadikari (1996). Association in time of a finite semi-Markov process. Statist. Prob. Letters 26, 125-133.
Lebowitz, J. (1972). Bounds on the correlations and analyticity properties of ferromagnetic Ising spin systems. Comm. Math. Phys. 28, 313-321.
Lee, M. T., S. T. Rachev and G. Samorodnitsky (1990). Association of stable random variables. Ann. Prob. 18, 387-397.
Lefevre, C. and X. Milhaud (1990). On the association of the lifelengths of components subjected to a stochastic environment. Adv. Appl. Prob. 22, 961-964.
Lehmann, E. L. (1966). Some concepts of dependence. Ann. Math. Statist. 37, 1137-1153.
Lindqvist, B. H. (1988). Association of probability measures on partially ordered spaces. J. Mult. Anal. 26, 111-132.
Marshall, A. W. and I. Olkin (1967). A multivariate exponential distribution. J. Amer. Statist. Assoc. 62, 30-44.
Matula, P. (1996). Convergence of weighted averages of associated random variables. Prob. Math. Statist. 16, 337-343.
Matula, P. and Z. Rychlik (1990). The invariance principle for nonstationary sequences of associated random variables. Ann. Inst. Henri Poincaré 26, 387-397.
Nagaraj, N. K. (1990). Two-sided tests for change in level of correlated data. Comm. Statist. B 19, 869-878.
Nagaraj, N. K. and C. S. Reddy (1993). Asymptotic null distributions of tests for change in level in correlated data. Sankhya A 55, 37-48.
Newman, C. M. (1980). Normal fluctuations and the FKG inequalities. Comm. Math. Phys. 74, 119-128.
Newman, C. M. (1983). A general central limit theorem for FKG systems. Comm. Math. Phys. 91, 75-80.
Newman, C. M. (1984). Asymptotic independence and limit theorems for positively and negatively dependent random variables. In Inequalities in Statistics and Probability (Ed., Y. L. Tong), pp. 127-140. IMS, Hayward.
Newman, C. M. and A. L. Wright (1981). An invariance principle for certain dependent sequences. Ann. Prob. 9, 671-675.
Newman, C. M. and A. L. Wright (1982). Associated random variables and martingale inequalities. Z. Wahrsch. theorie und Verw. Gebiete 59, 361-371.
Oliveira, P. and C. Suquet (1995). L²(0, 1) weak convergence of the empirical process for dependent variables. In Wavelets and Statistics (Eds., A. Antoniadis and G. Oppenheim), pp. 331-344. Lecture Notes in Statistics 103, Springer-Verlag, New York.
Peligrad, M. and R. Suresh (1995). Estimation of variance of partial sums of an associated sequence of random variables. Stoch. Process. Appl. 56, 307-319.
Pitt, L. (1982). Positively correlated normal variables are associated. Ann. Prob. 10, 496-499.
Prakasa Rao, B. L. S. (1983). Nonparametric Functional Estimation. Academic Press, Orlando.
Prakasa Rao, B. L. S. (1993). Bernstein-type inequality for associated sequences. In Statistics and Probability: A Raghu Raj Bahadur Festschrift (Eds., J. K. Ghosh, S. K. Mitra, K. R. Parthasarathy and B. L. S. Prakasa Rao), pp. 499-509. Wiley Eastern, New Delhi.
Prakasa Rao, B. L. S. (1998). Hoeffding identity, multivariance and multicorrelation. Statistics 32, 13-29.
Preston, C. J. (1974). A generalization of the FKG inequalities. Comm. Math. Phys. 36, 233-241.
Quesada-Molina, J. J. (1992). A generalization of an identity of Hoeffding and some applications. J. Ital. Statist. Soc. 3, 405-411.
Resnick, S. I. (1988). Association and multivariate extreme value distributions. In Gani Festschrift: Studies in Statistical Modeling and Statistical Science (Ed., C. C. Heyde), pp. 261-271. Statist. Soc. of Australia.


Roussas, G. G. (1991). Kernel estimates under association: strong uniform consistency. Statist. Prob. Letters 12, 393-403.
Roussas, G. G. (1993). Curve estimation in random fields of associated processes. Nonparametric Statist. 2, 215-224.
Roussas, G. G. (1994). Asymptotic normality of random fields of positively or negatively associated processes. J. Mult. Anal. 50, 152-173.
Roussas, G. G. (1995). Asymptotic normality of a smooth estimate of a random field distribution function under association. Statist. Prob. Letters 24, 77-90.
Sadikova, S. M. (1966). Two-dimensional analogues of an inequality of Esseen with application to the central limit theorem. Th. Prob. Appl. 11, 325-335.
Samorodnitsky, G. (1995). Association of infinitely divisible random vectors. Stoch. Process. Appl. 55, 45-55.
Shaked, M. and J. G. Shanthikumar (1990). Parametric stochastic convexity and concavity for stochastic processes. Ann. Inst. Statist. Math. 42, 509-531.
Shaked, M. and Y. L. Tong (1985). Some partial orderings of exchangeable random variables by positive dependence. J. Mult. Anal. 17, 333-349.
Shao, Q.-M. and H. Yu (1996). Weak convergence for weighted empirical processes of dependent sequences. Ann. Prob. 24, 2098-2127.
Shepp, L. A. (1964). A local limit theorem. Ann. Math. Statist. 35, 419-423.
Szekli, R. (1995). Stochastic Ordering and Dependence in Applied Probability. Lecture Notes in Statistics, No. 97, Springer-Verlag, New York.
Weron, A. (1984). Stable processes and measures: a survey. In Probability Theory on Vector Spaces III, Lecture Notes in Math., No. 1080, pp. 306-364. Springer, Berlin.
Wood, T. E. (1983). A Berry-Esseen theorem for associated random variables. Ann. Prob. 11, 1042-1047.
Wood, T. E. (1984). Sample paths of demimartingales. In Probability Theory on Vector Spaces III, Lecture Notes in Math., No. 1080, pp. 365-373. Springer, Berlin.
Wood, T. E. (1985). A local limit theorem for associated sequences. Ann. Prob. 13, 625-629.
Yu, H. (1993). A Glivenko-Cantelli lemma and weak convergence for empirical processes of associated sequences. Prob. Th. Rel. Fields 95, 357-370.
Yu, H. (1996). A strong invariance principle for associated sequences. Ann. Prob. 24, 2079-2097.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Exchangeability, Functional Equations, and Characterizations

C. R. Rao and D. N. Shanbhag

1. Introduction

A sequence {X_m : m ∈ M}, with M = {1, 2, …, N} for some positive integer N or M = {1, 2, …}, of random variables defined on a probability space is (or, when there is no confusion, its members are) referred to as exchangeable (or permutable, or interchangeable) if, for each n and each permutation π of (1, 2, …, n),

(X_1, …, X_n) =_d (X_{π(1)}, …, X_{π(n)}) ;

the measure induced on (the Borel σ-field of) R^M by the sequence is sometimes referred to as a symmetric measure. There are certain other criteria, including that of the selection property, which are individually equivalent to the criterion of exchangeability when M is infinite; for the relevant literature in this connection, see Aldous (1985, Section 6), and for an interesting elementary proof showing that the selection property is equivalent to that of exchangeability, see Kingman (1978). (Incidentally, the selection property states that, given positive integers k, m_1, …, m_k such that {m_r : r = 1, 2, …, k} is a strictly increasing sequence of members of M, the random vector (X_{m_1}, …, X_{m_k}) is distributed as (X_1, …, X_k).) One of the key results on exchangeability is de Finetti's theorem, asserting that if {X_m : m = 1, 2, …} is an exchangeable sequence of random variables, then there is a σ-field conditional on which the X_m's are independent and identically distributed (i.i.d.). Among various proofs of this result or its extended versions are those due to Hewitt and Savage (1955) and Olshen (1973, 1974). Kingman (1978) and Aldous (1985) have reviewed or unified the literature on exchangeability and related topics, which includes, among other things, numerous applications in population genetics, random walks, combinatorics, queueing and storage networks, Bayesian statistical inference and other areas. If {X_m : m = 1, 2, …, n}, with n ≥ 2, is an exchangeable sequence of random variables for which there exists a σ-field conditional on which the X_m's are i.i.d., and f : R → R is a Borel measurable function such that E(|f(X_1)|), E(|f(X_1)f(X_2)|) < ∞, then it follows that cov(f(X_1), f(X_2)) ≥ 0. Using the

conditional distribution of (X_1, …, X_n) given Σ_{m=1}^{n} f(m)X_m, or otherwise, it can easily be seen that there exist finite exchangeable sequences {X_m : m = 1, 2, …, n} with n ≥ 2 for which E(|f(X_1)|), E(|f(X_1)f(X_2)|) < ∞ and cov(f(X_1), f(X_2)) < 0, implying that in this case there does not exist a σ-field conditional on which the X_m's are i.i.d.; the X_m's with joint uniform distribution on {x ∈ R^n : ||x|| = 1}, or with multinomial distribution having parameter vector (k; 1/n, …, 1/n), provide us with examples of this type of sequence. In spite of this, one can cite cases of finite exchangeable sequences which have many appealing properties. Sequences with spherically symmetric Dirichlet distributions or symmetric multinomial distributions are among such sequences. (Here by "symmetric Dirichlet" and "symmetric multinomial" we mean "Dirichlet with parameter vector of the type (α, …, α)" and "multinomial with parameter vector of the type (k; 1/n, …, 1/n)", respectively.) Interesting characterization results involving some of these sequences, or their variations, have been given by, among others, Schoenberg (1938), Zinger (1956), Freedman (1963), Kingman (1972), Letac (1981), Smith (1981), Alzaid et al. (1990), and Rao and Shanbhag (1994, 1997); these results are motivated by, or are of relevance to, certain aspects of Bayesian inference and multivariate analysis. The concept of exchangeability is also linked to certain problems on functional equations, including those on the integrated Cauchy functional equation or the convolution equation dealt with, among others, by Choquet and Deny (1960), Deny (1961), Shanbhag (1977), and Lau and Rao (1982). Various proofs based on exchangeability to arrive at solutions to some of these functional equations have appeared in the literature; see, for example, Ressel (1985), Alzaid et al. (1987), Shanbhag (1991) and Rao and Shanbhag (1991). The monograph of Rao and Shanbhag (1994) unifies and extends some of these proofs. The functional equations referred to here have played a key role in recent developments on characterizations of probability distributions and stochastic processes, unifying and extending several of the prior results on stable, generalized Pareto and Pólya–Eggenberger distributions, as well as on stochastic processes such as point processes, branching processes and certain processes in queueing networks. (It is worth pointing out here that the distributions listed immediately above have as their specialized versions normal, exponential and Pareto, and Poisson and negative binomial distributions, respectively.) These functional equations also have applications in areas such as renewal theory, epidemiology, reliability and extreme value theory; see, for example, Rao and Shanbhag (1994, Chapter 8) and Asadi et al. (1998). The present chapter reviews and highlights certain results on exchangeability, including versions of de Finetti's theorem, that play a crucial role in studies of functional equations, and sketches some of the arguments based on exchangeability that are used in the literature for solving functional equations. It also touches upon several characterizations of probability distributions that follow via the results on functional equations or directly from de Finetti's theorem. In the process, it provides some new applications and extensions of the existing results.
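As a quick numerical check (not from the source) on the negative-correlation phenomenon just described: the coordinates of a symmetric multinomial vector are exchangeable yet negatively correlated, so no σ-field can make them conditionally i.i.d.

```python
import numpy as np

# Coordinates of a multinomial with parameters (k; 1/n, ..., 1/n) are
# exchangeable, with Cov(X_1, X_2) = -k/n^2 < 0 (taking f as the identity).
k, n = 10, 4
rng = np.random.default_rng(11)
draws = rng.multinomial(k, [1 / n] * n, size=200_000)
cov = np.cov(draws[:, 0], draws[:, 1])[0, 1]
print(cov)  # should be close to -k/n**2 = -0.625
```

A strictly negative covariance of the identity function is exactly the obstruction noted above, since conditional i.i.d. structure would force cov(f(X_1), f(X_2)) ≥ 0.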

2. de Finetti's theorem and some related auxiliary results


Theorem 2.1, appearing below, is a version of de Finetti's theorem, and it is pivotal to our study in the present paper. For a detailed account of the research on de Finetti's theorem, we refer the reader to Aldous (1985), where various proofs of the result, its extensions and applications have appeared.

THEOREM 2.1. Let (R^∞, (B(R))^∞, P) be a probability space, with B(R) the Borel σ-field corresponding to R, such that the co-ordinate mappings are exchangeable random variables (i.e. P is symmetric in the sense of Hewitt and Savage (1955)). Then, for some sub-σ-field G of (B(R))^∞, there exists a regular conditional probability P(C|G)(ω), C ∈ (B(R))^∞, ω ∈ (Ω =) R^∞, given G, for which, for each ω, P(·|G)(ω) yields the co-ordinate mappings to be independent and identically distributed random variables. Moreover, G can be taken to be either the invariant σ-field J or the tail σ-field T (relative to the co-ordinates of the space), with a common regular conditional probability having the stated property.

For a nice proof of the theorem, see Loève (1963, p. 365) or Chow and Teicher (1979, pp. 220-222). This involves showing, first, that {μ_n(B) : n = 1, 2, …} is a Cauchy sequence in L² and hence that there exists a measurable function μ(B) (with values in [0, 1]) such that {μ_n(B) : n = 1, 2, …} converges in probability to μ(B), where

μ_n(B) = (1/n) Σ_{i=1}^{n} I_{{X_i ∈ B}} ,   n = 1, 2, … ,

with the X_i's as the co-ordinate mappings, and that

P( {X_1 ∈ B_1, …, X_m ∈ B_m} ∩ C ) = E( ( Π_{j=1}^{m} μ(B_j) ) I_C )

for each B_1, …, B_m ∈ B(R), m = 1, 2, …, and invariant set C. As T ⊂ J, one is then, with some effort, led to

P{ X_1 ∈ B | J } = P{ X_1 ∈ B | T } = μ(B)   a.s.,   B ∈ B(R) ,

and to

P{ X_1 ∈ B_1, …, X_m ∈ B_m | T } = Π_{i=1}^{m} P{ X_1 ∈ B_i | T } = Π_{i=1}^{m} P{ X_1 ∈ B_i | J } = P{ X_1 ∈ B_1, …, X_m ∈ B_m | J }   a.s.,   B_1, …, B_m ∈ B(R), m = 1, 2, … .


Applying now a standard technique involving the family {(Qco)~(.) :co f~} of product measures on (~(R)) a with Qco,for each co, as the measure determined on ~(R) by Fo0, where Fo is a version of the conditional distribution function of X1 given Y , and the monotone class theorem, we see that the present version of de Finetti's theorem holds. (Note that it suffices to get here

736

C. R. Rao and D. N. Shanbhag

(.~(R)) = {C C (N(R)) : (Q.)~(C) = P ( C l J ) = P ( C l Y ) a.s. and (Qo)~(C)is measurable}.) A slightly different version of the proof of Theorem 2.1, involving the reversed martingale convergence theorem, appears in Kingman (1978) and, with more details, in Aldous (1985). REMARK 2.2. Any Polish space (i.e. any complete separable metric space) is Borel-equivalent to (or, in other words, Borel-isomorphic to) a Borel subset of [0, 1], i.e. there exists a one-to-one correspondence ~b from the Polish space to the Borel subset of [0, 1] such that both ~b and ~b-1 are measurable (see Ash (1972; p. 265)). Consequently, the above version of de Finetti's theorem implies that its extension with S, a Polish space, in place of R and "random elements" in place of "random variables" holds; Aldous (1985; p. 51) has effectively given an explanation for this. Now (with the notation as above) N can also be taken to be the a-field generated by {Qo(B) :B C ~(SP)}, where Qo is a version of the regular conditional distribution of X1 given Y. Using the Borel equivalence of a Polish space and a Borel subset of [0, 1], it can further be seen, as in Olshen (1973, 1974), that there exists a J-measurable real (bounded) random variable so that we can choose N to be the a-field generated by this random variable. All the versions of ~f that we have met here are identical up to events of zero probability (see Olshen (1973, 1974) and, in the special case S = R, Kingman (1978)). REMARK 2.3. In the special case when the random variables are 0-1-valued, Feller (1966; p. 225) proves a version of the theorem, using a moment argument. Theorem 2.1, on applying in conjunction with the observation in Remark 2.2, gives us the following corollary. The corollary and its specialized versions have played a crucial role in studies relating to the integrated Cauchy functional equation. COROLLARY 2.4. 
Let $\{X_n : n = 1, 2, \ldots\}$ be an exchangeable sequence of random variables, or elements, with values in a general Polish space. Then, for each $m, n = 1, 2, \ldots$, and appropriate Borel sets $B_1, B_2, \ldots, B_{m+n}$ (i.e. Borel subsets of $R$ or of the general Polish space, according as what the $X_i$'s are), we have

$$P\{X_1 \in B_1, \ldots, X_m \in B_m, X_{m+1} \in B_1, \ldots, X_{2m} \in B_m\} \cdot P\{X_1 \in B_{m+1}, \ldots, X_n \in B_{m+n}, X_{n+1} \in B_{m+1}, \ldots, X_{2n} \in B_{m+n}\} \geq \big(P\{X_1 \in B_1, \ldots, X_{m+n} \in B_{m+n}\}\big)^2 \, . \quad (1)$$

This latter result follows, in view of the Cauchy-Schwarz inequality, because Theorem 2.1 implies that the left hand side of (1) equals

$$E\big(\big(P\{X_1 \in B_1, \ldots, X_m \in B_m \,|\, \mathscr{T}\}\big)^2\big) \cdot E\big(\big(P\{X_{m+1} \in B_{m+1}, \ldots, X_{m+n} \in B_{m+n} \,|\, \mathscr{T}\}\big)^2\big)$$

and the right hand side of (1) equals

$$\big(E\big(P\{X_1 \in B_1, \ldots, X_m \in B_m \,|\, \mathscr{T}\} \cdot P\{X_{m+1} \in B_{m+1}, \ldots, X_{m+n} \in B_{m+n} \,|\, \mathscr{T}\}\big)\big)^2 \, .$$
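The inequality (1) can be checked in closed form for the simplest exchangeable sequence, a Bernoulli sequence directed by a beta-distributed random bias (the Beta(2, 3) choice below is an arbitrary illustration, not taken from the text). Taking every $B_j = \{1\}$, both sides of (1) reduce to moments of the mixing distribution, and the Cauchy-Schwarz step becomes the moment inequality $E[\theta^{2m}]E[\theta^{2n}] \geq (E[\theta^{m+n}])^2$:

```python
from fractions import Fraction

A, B = 2, 3   # hypothetical Beta(2, 3) mixing distribution for the coin bias

def moment(k):
    # E[theta^k] for theta ~ Beta(A, B): product of (A+i)/(A+B+i) for i < k
    out = Fraction(1)
    for i in range(k):
        out *= Fraction(A + i, A + B + i)
    return out

# with B_1 = ... = B_{m+n} = {1}, the left side of (1) is E[theta^{2m}] E[theta^{2n}]
# and the right side is (E[theta^{m+n}])^2
for m in range(1, 7):
    for n in range(1, 7):
        assert moment(2 * m) * moment(2 * n) >= moment(m + n) ** 2
```

Exact rational arithmetic avoids any floating-point doubt about the direction of the inequality.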


A closer scrutiny of the proof of Corollary 2.4 shows that it does not require the full power of Theorem 2.1 or its extended version relative to a Polish space. Indeed, using an extended version of an argument of Hewitt and Savage (1955), Rao and Shanbhag (1994, pp. 13-14) have established the following theorem (see also Davies and Shanbhag (1987) and Shanbhag (1991) for specialized versions of the result). Even though this theorem is not of much use in the present review, we mention it here since it is of relevance to the corollary given above.

THEOREM 2.5. Let $(\Omega^{j}, \mathscr{S}^{j}, P_j)$, $j = 1, 2, \ldots$, be probability spaces (in standard notation) with the measures $P_j$ symmetric (i.e. such that, for every permutation $(i_1, \ldots, i_j)$ of $(1, \ldots, j)$ and members $E_1, \ldots, E_j$ of $\mathscr{S}$, $P_j(E_{i_1} \times \cdots \times E_{i_j}) = P_j(E_1 \times \cdots \times E_j)$) and satisfying $P_j(E^{(j)}) = P_{j+1}(E^{(j)} \times \Omega)$ for $E^{(j)} \in \mathscr{S}^{j}$. Then

$$\big(P_{j+j'}(E_1 \times \cdots \times E_{j+j'})\big)^2 \leq P_{2j}(E_1 \times \cdots \times E_j \times E_1 \times \cdots \times E_j) \cdot P_{2j'}(E_{j+1} \times \cdots \times E_{j+j'} \times E_{j+1} \times \cdots \times E_{j+j'}) \, ,$$
$$E_i \in \mathscr{S}, \; i = 1, 2, \ldots, j + j', \quad j, j' = 1, 2, \ldots \quad (2)$$

The result of Theorem 2.5 follows if the inequality (2) is established in the case of $j = j' = 1$. If we denote, for $j = 1, 2, \ldots$, by $X_i^{(j)}$ the projection mappings on $(\Omega^{j}, \mathscr{S}^{j}, P_j)$, then, for $E_1, E_2 \in \mathscr{S}$, we have

$$E_j\Big(\Big(\frac{1}{j}\sum_{i=1}^{j} I_{\{X_i^{(j)} \in E_1\}}\Big)\Big(\frac{1}{j}\sum_{i=1}^{j} I_{\{X_i^{(j)} \in E_2\}}\Big)\Big) = \frac{1}{j}\,P_1(E_1 \cap E_2) + \frac{j(j-1)}{j^2}\,P_2(E_1 \times E_2), \quad j = 1, 2, \ldots \, , \quad (3)$$

where the notation $E_j$ outside the brackets stands for the expectation. Denoting the left hand side of (3) by $M_j(E_1, E_2)$, we can then see that the required result holds because the Cauchy-Schwarz inequality implies

$$\big(P_2(E_1 \times E_2)\big)^2 = \Big(\lim_{j \to \infty} M_j(E_1, E_2)\Big)^2 \leq \Big(\lim_{j \to \infty} M_j(E_1, E_1)\Big)\Big(\lim_{j \to \infty} M_j(E_2, E_2)\Big) = P_2(E_1 \times E_1)\,P_2(E_2 \times E_2) \, .$$
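Identity (3) can be confirmed by brute-force enumeration for a small symmetric, consistent family, here a mixture of two i.i.d. coins on $\{0, 1\}$ (the mixture weights and biases are hypothetical choices):

```python
from fractions import Fraction
from itertools import product

# symmetric, consistent family P_j on {0,1}^j: a mixture of two iid coins
MIX = [(Fraction(1, 3), Fraction(1, 4)), (Fraction(2, 3), Fraction(3, 5))]

def P(cells):
    # P_j(E_1 x ... x E_j) for subsets E_i of {0, 1}
    total = Fraction(0)
    for w, th in MIX:
        term = w
        for E in cells:
            term *= sum((th if e == 1 else 1 - th) for e in E)
        total += term
    return total

E1, E2 = {1}, {0, 1}
for j in range(1, 6):
    # left side of (3): expectation of the product of the two empirical averages
    lhs = Fraction(0)
    for xs in product((0, 1), repeat=j):
        p = P([{x} for x in xs])
        lhs += p * sum(x in E1 for x in xs) * sum(x in E2 for x in xs)
    lhs /= j * j
    rhs = Fraction(1, j) * P([E1 & E2]) + Fraction(j * (j - 1), j * j) * P([E1, E2])
    assert lhs == rhs
```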
REMARK 2.6. Theorem 2.5 can also be obtained as a consequence of Corollary 2.4. This is so because, if we take $j, j', E_1, \ldots, E_{j+j'}$ fixed, then it follows that there exist a sequence $\{Y_n : n = 1, 2, \ldots\}$ of exchangeable random variables with values in $\{0, 1, 2, 3\}$ and subsets $B_1$ and $B_2$ of $\{0, 1, 2, 3\}$ such that

$$P\{Y_1 \in B_1, Y_2 \in B_2\} = P_{j+j'}(E_1 \times \cdots \times E_{j+j'}) \, ,$$

$$P\{Y_1 \in B_1, Y_2 \in B_1\} = P_{2j}(E_1 \times \cdots \times E_j \times E_1 \times \cdots \times E_j)$$

and

$$P\{Y_1 \in B_2, Y_2 \in B_2\} = P_{2j'}(E_{j+1} \times \cdots \times E_{j+j'} \times E_{j+1} \times \cdots \times E_{j+j'}) \, .$$

REMARK 2.7. Under the assumptions in Theorem 2.5, on taking $E_{j+1} = \cdots = E_{j+j'} = \Omega$, (2) implies that

$$P_{2j}(E_1 \times \cdots \times E_j \times E_1 \times \cdots \times E_j) \geq \big(P_j(E_1 \times \cdots \times E_j)\big)^2 \, .$$

REMARK 2.8. One can also arrive at Theorem 2.1 or its extension referred to in Remark 2.2 via extreme point methods, involving Choquet's theorem or otherwise. (For the details of Choquet's theorem, see Phelps (1966).) Here the class of all symmetric probability measures on $(S^{\infty}, (\mathscr{B}(S))^{\infty})$ is a simplex with extreme points the members of the class that are product measures (i.e. the measures relative to which the co-ordinate mappings are independent); Dynkin (1978) proves a theorem, i.e. his Theorem 7.1, subsuming this result, using an approach based on sufficient statistics without involving deeper topological concepts. (By the statement that the class of symmetric measures referred to above is a simplex, we mean that it is such that each of its members is the barycentre of one and only one probability measure on the Borel $\sigma$-field of the set of all extreme points, relative to a metric of weak convergence.)

REMARK 2.9. Incidentally, there is no loss of generality in assuming the Polish space $S$ in Remark 2.2 to be compact. Diaconis and Freedman (1980a) have shown that a version of de Finetti's theorem holds for compact Hausdorff spaces, while Dubins and Freedman (1979) have given an example showing that there exist spaces for which de Finetti's theorem does not hold.

Before moving to the main discussion of applications of exchangeability in studies of functional equations and characterizations, we mention some auxiliary results that are crucial to these applications.

THEOREM 2.10. Let

$$H(x) = \int_S H(x + y)\,\mu(\mathrm{d}y), \quad x \in S \quad (4)$$

be an integral equation with $S$ as an Abelian topological semigroup with zero element, $H$ as a non-negative continuous function on $S$ and $\mu$ as a $\sigma$-finite measure on (the Borel $\sigma$-field of) $S$. Let $S^*(\mu)$ be the smallest closed subsemigroup of $S$, with zero element, containing $\mathrm{supp}[\mu]$. Then, for every $x \in S$ and $y, z \in S^*(\mu)$, we have

$$H(x + 2y)\,H(x + 2z) \geq \big(H(x + y + z)\big)^2 \, . \quad (5)$$

(For any measure $\eta$ on $S$, $\{x \in S : \eta(U_x) > 0 \text{ for each neighbourhood } U_x \text{ of } x\}$ is referred to as the support of $\eta$; we denote it by $\mathrm{supp}[\eta]$.)

COROLLARY 2.11. Under the assumptions in Theorem 2.10, we have, for every $x \in S$ and $y \in S^*(\mu)$,

$$H(x)\,H(x + 2y) \geq \big(H(x + y)\big)^2 \, . \quad (6)$$

Rao and Shanbhag (1994) have proved Theorem 2.10 through mainly the following steps. The first of these is to observe that if $x$ is such that $H(x) \neq 0$ and it is fixed, then the probability spaces $(\Omega^{j}, \mathscr{S}^{j}, P_j)$, $j = 1, 2, \ldots$, with $\Omega = S$, $\mathscr{S} =$ the Borel $\sigma$-field of $S$, and $P_j$ as the probability measures on $\mathscr{S}^{j}$ for which, for each $E_1, \ldots, E_j \in \mathscr{S}$,

$$P_j(E_1 \times \cdots \times E_j) = (H(x))^{-1}\int_{E_j}\cdots\int_{E_1} H(x + y_1 + \cdots + y_j)\,\mu(\mathrm{d}y_1)\cdots\mu(\mathrm{d}y_j) \, , \quad (7)$$

meet the requirements of Theorem 2.5. The next one is to see that, in view of the continuity of $H$, for every integer $n \geq 1$ and $y_1, \ldots, y_n, z_1, \ldots, z_n \in (\mathrm{supp}[\mu]) \cup \{0\}$,

$$H(x + 2(y_1 + \cdots + y_n))\,H(x + 2(z_1 + \cdots + z_n)) \geq \big(H(x + y_1 + \cdots + y_n + z_1 + \cdots + z_n)\big)^2 \, .$$

Because of the continuity of $H$, it is a simple exercise to conclude, appealing to this inequality, that the theorem holds. (Note that the interim inequality follows as (7) implies that if $y_1, \ldots, y_j \in \mathrm{supp}[\mu]$, there exists a sequence $\{O_{im} : i = 1, \ldots, j, \; m = 1, 2, \ldots\}$ of open sets of $S$ with positive $\mu$-values such that, as $m \to \infty$,

$$\frac{P_j(O_{1m} \times \cdots \times O_{jm})}{\prod_{i=1}^{j}\mu(O_{im})} \to \frac{H(x + y_1 + \cdots + y_j)}{H(x)} \, .\big)$$

REMARK 2.12. If $\mu$ in (4) is concentrated on a countable set, then (5) of Theorem 2.10 holds even when $H$ is assumed to be measurable without being continuous, provided we take $S(\mu)$ in place of $S^*(\mu)$, where $S(\mu)$ is the smallest subsemigroup of $S$, with zero element, containing $\{x \in S : \mu\{x\} > 0\}$. Also, in this case, if $H(x) \neq 0$, there exists a sequence $\{Y_n : n = 1, 2, \ldots\}$ of random elements with values in $S(\mu)$ such that, for each integer $n \geq 1$,

$$\frac{H(x + y_1 + \cdots + y_n)}{H(x)}\prod_{i=1}^{n}\nu\{y_i\} = P\{Y_1 = y_1, \ldots, Y_n = y_n\}, \quad y_1, \ldots, y_n \in S(\mu) \, ,$$

where $\nu(\cdot) = \big(\sum_{n=0}^{\infty} 2^{-n-1}\mu^{*n}\big)(\cdot)$. Essentially, in view of Theorem 2.1, we now have

$$\frac{P\{Y_1 = y_1 + y_2 \,|\, \mathscr{T}\}}{\nu\{y_1 + y_2\}} - \frac{P\{Y_1 = y_1 \,|\, \mathscr{T}\}}{\nu\{y_1\}}\cdot\frac{P\{Y_1 = y_2 \,|\, \mathscr{T}\}}{\nu\{y_2\}} = 0, \quad y_1, y_2 \in S(\mu), \; \text{a.s.} \quad (8)$$

with $\{P\{Y_1 = y \,|\, \mathscr{T}\} : y \in S(\mu)\}$ as the conditional probability distribution of $Y_1$ given $\mathscr{T}$, the tail $\sigma$-field of $\{Y_n\}$, such that $H(x + y)/H(x) = E\{P\{Y_1 = y \,|\, \mathscr{T}\}/\nu\{y\}\}$, $y \in S(\mu)$. (Note that (8) holds because the expectation of the square of the left hand side of the identity equals zero for each $y_1, y_2 \in S(\mu)$, and $S(\mu)$ is countable.) This latter observation implies that we can choose here a version of the conditional distribution of $Y_1$ given $\mathscr{T}$ so that $P\{Y_1 = y \,|\, \mathscr{T}\}/\nu\{y\}$, $y \in S(\mu)$, is a $\nu$-harmonic (as well as $\mu$-harmonic) exponential function, under the discrete topology, on $S(\mu)$.

REMARK 2.13. If $S$ of Theorem 2.10 is a group and the smallest closed subgroup of $S$ containing $\mathrm{supp}[\mu]$ is $S$ itself, then we have

$$H(x + 2z)\,H(x) \geq \big(H(x + z)\big)^2, \quad x, z \in S \, ;$$


this follows, as $H$ is continuous, in view of (5), on taking in it $x - 2(z_1 + \cdots + z_n)$, with $z_1, \ldots, z_n \in (\mathrm{supp}[\mu]) \cup \{0\}$, $n \geq 1$, in place of $x$.

It is of relevance to the present study to mention certain results in storage theory, involving aspects of exchangeability, especially concerning mixtures of Markov chains; Diaconis and Freedman (1980b) and Aldous (1985) have given some interesting results on the mixtures referred to, of a somewhat different type. Suppose we understand, for any integer $n \geq 1$, by an $n$-cyclically exchangeable sequence $\{g^{(n)}_{i_1,\ldots,i_n} : i_1, \ldots, i_n = 0, 1, \ldots\}$ of complex numbers, a sequence of complex numbers such that, for each $(i_1, \ldots, i_n)$, $g^{(n)}_{i_1,\ldots,i_n}$ is invariant if we take in place of $(i_1, \ldots, i_n)$ any of its cyclical permutations. If one defines, for each positive integer $u \leq n$, $G_{u,n}$ to be the sum of $g^{(n)}_{i_1,\ldots,i_n}$ over $(i_1, \ldots, i_n)$ for which $(i_1 - 1, \ldots, i_n - 1)$ has $n$ as the $u$th strict descending ladder index, and $G^*_{u,n}$ to be the sum of $g^{(n)}_{i_1,\ldots,i_n}$ over $(i_1, \ldots, i_n)$ for which $\sum_{r=1}^{n} i_r = n - u$, then Lemma 1 on p. 394 of Feller (1966) asserts that

$$G_{u,n} = \frac{u}{n}\,G^*_{u,n}, \quad u = 1, 2, \ldots, n \, . \quad (9)$$

If $\{h^{(n)}_{i_1,\ldots,i_n} : i_1, \ldots, i_n = 0, 1, \ldots\}$ is a sequence of complex numbers such that, for each $(i_1, \ldots, i_n)$ with $i_n = 0$, $h^{(n)}_{i_1,\ldots,i_n} = g^{(n)}_{i_1,\ldots,i_n}$, then the sum of $h^{(n)}_{i_1,\ldots,i_n}$ over $i_1, \ldots, i_n$ for which $(i_1 - 1, \ldots, i_n - 1)$ has $n$ as the $u$th strict descending ladder index also equals $G_{u,n}$ given by (9). In particular, if we have, for each $(i_1, \ldots, i_n)$,

$$h^{(n)}_{i_1,\ldots,i_n} = \int_{\mathscr{P}}\Big(\prod_{r=0}^{n-1} p_{i_r,\,i_{r+1}}\Big)\,\nu(\mathrm{d}\underline{P}) \quad (10)$$

with $i_0 = 0$, and

$$g^{(n)}_{i_1,\ldots,i_n} = \int_{\mathscr{P}}\Big(\prod_{r=0}^{n-1} p_{i_r,\,i_{r+1}}\Big)\,\nu(\mathrm{d}\underline{P}) \quad (11)$$

with $i_0 = i_n$, where $\mathscr{P}$ is a space of sequences $\underline{P} = \{p_{ij} : i, j = 0, 1, \ldots\}$ of complex numbers and $\nu$ is a measure on $\mathscr{P}$ (i.e. on its Borel $\sigma$-field) for which the $|p_{ij}|^n$ are integrable for all $i, j$ (for each $n$), then it follows that the result stated holds with

$$G_{u,n} = \frac{u}{n}\int_{\mathscr{P}} p^{(n)}_{u}\,\nu(\mathrm{d}\underline{P}), \quad u = 1, 2, \ldots, n \, , \quad (12)$$

where $p^{(n)}_{u}$ is the sum of $\prod_{r=0}^{n-1} p_{i_r,\,i_{r+1}}$, with $i_0 = i_n$, over $i_1, \ldots, i_n$ for which $\sum_{r=1}^{n} i_r = n - u$. It is worth pointing out here that $\{p^{(u)}_{j,\,u+j} : j = 0, 1, \ldots\}$ is indeed the $u$-fold convolution of $\{p_{j,\,j+1} : j = 0, 1, \ldots\}$. In the case when $(p_{ij})$ is a stochastic matrix and $\nu$ is a probability measure, one can view $G_{u,n}$ of (12) as a certain first passage probability in storage theory. We delayed discussing it in order to get acquainted with the general combinatorial results involving complex variables; these have applications in distribution theory, especially in the case where the $p_{ij}$'s are independent of $i$ and $\nu$ is a degenerate measure (i.e. a measure concentrated on a singleton); see, for example, Fosam and Shanbhag (1997b). (Incidentally, one can also apply (12) to arrive at mixtures of some of the distributions discussed in the cited reference; these are of relevance to the material in Section 4 of our paper.)

Now, to give the application of (12) in storage theory, let us specialize to the situation where $(p_{ij})$ is a stochastic matrix and $\nu$ is a probability measure. Consider a discrete dam of infinite capacity with initial dam content $u$ (where $u$ is a positive integer) and satisfying the following conditions: (i) the inputs to the dam take place only during time intervals $(t-1, t)$, $t = 1, 2, \ldots$, such that, for each positive integer $n$, the inputs to the dam during the time intervals $(t-1, t)$, $t = 1, 2, \ldots, n$, are distributed with distribution as in (10) (with $\nu$ not depending on $n$); (ii) the releases from the dam can take place only at the points $t = 1, 2, \ldots$, and at each of these time points there is a release of unit quantity if the dam is not empty (and no release otherwise). Then $\{G_{u,n} : n = u, u+1, \ldots\}$, where $G_{u,n}$ is as given by (12), is the distribution of the waiting time for the dam to be empty (for the first time). (One can obviously modify the probability measure $\nu$ to take into account the past history of the dam content process in the case where this is possible.) Matthews (1972) and Shanbhag (1973) have dealt with a specialized version of this result when $\nu$ is degenerate; following these authors, we can also write $p^{(n)}_{u}$ in (12) as the coefficient of $\theta^{n-u}$ in the expansion of $\mathrm{tr}\{(P(\theta))^n\}$, $0 \leq \theta < 1$, where $P(\theta) = (p_{ij}\theta^j)$.

In the case when the $p_{ij}$'s are independent of $i$, the result in storage theory simplifies to the one with $\{p^{(n)}_j\}$ as the $n$-fold convolution of $\{p_j\}$; the specialized version of this result when $\nu$ is degenerate has appeared in the literature; see, for example, Mott (1963), Prabhu and Bhat (1963) and Prabhu (1965, pp. 117-118).

REMARK 2.14. Shanbhag (1973) has derived the distribution $\{G_{u,n} : n = u, u+1, \ldots\}$ for the dam model mentioned earlier, involving a certain probability distribution of cyclically exchangeable random variables; to have the distribution of cyclically exchangeable random variables well defined, he imposed the condition that $p_{00} > 0$. (Clearly there is no loss of generality in assuming $p_{00} > 0$ in the derivation of the result in question; also, the case of $p_{00} = 0$ is somewhat unnatural in a dam model.)
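For the i.i.d. specialization just mentioned (the $p_{ij}$'s independent of $i$, $\nu$ degenerate), the first-emptiness law reduces to the ballot-type formula $G_{u,n} = (u/n)P\{S_n = n - u\}$, with $S_n$ the total input over $n$ periods. The sketch below checks this against a direct dynamic-programming computation of the first passage time; the three-point input distribution is a hypothetical example:

```python
from fractions import Fraction

# per-period input distribution (hypothetical example)
INPUT = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

def emptiness_dp(u, nmax):
    # exact law of the first time the dam is empty, by forward recursion
    state = {u: Fraction(1)}            # content distribution, dam not yet empty
    out = {}
    for n in range(1, nmax + 1):
        nxt, hit = {}, Fraction(0)
        for content, p in state.items():
            for j, pj in INPUT.items():
                d = content + j - 1     # input j, then unit release
                if d == 0:
                    hit += p * pj       # dam first empties at time n
                else:
                    nxt[d] = nxt.get(d, Fraction(0)) + p * pj
        out[n] = hit
        state = nxt
    return out

def emptiness_formula(u, nmax):
    # ballot-type formula: G_{u,n} = (u/n) P{S_n = n - u}
    conv, out = {0: Fraction(1)}, {}
    for n in range(1, nmax + 1):
        nxt = {}
        for s, p in conv.items():
            for j, pj in INPUT.items():
                nxt[s + j] = nxt.get(s + j, Fraction(0)) + p * pj
        conv = nxt
        out[n] = Fraction(u, n) * conv.get(n - u, Fraction(0))
    return out

assert emptiness_dp(2, 10) == emptiness_formula(2, 10)
```

The two computations agree exactly, in line with (9)-(12) specialized to i.i.d. inputs.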
3. Functional equations with applications in probability and statistics

We begin our discussion of functional equations with the following extension of the celebrated Choquet-Deny (1960) theorem. The Choquet-Deny theorem, or indeed its specialized version relative to the real line, plays an important role in renewal theory and allied topics, and it has several other variations or generalizations; see Rao and Shanbhag (1994) for some applications, and Székely and Zeng (1990) for a brief review of the earlier literature, in this connection. The theorem given below, with a proof based on exchangeability, and its corollaries mentioned are due to Rao and Shanbhag (1991); a slightly weaker version of the theorem was established earlier via a martingale argument by Székely and Zeng (1990).

THEOREM 3.1. Let $S$ be an Abelian topological semigroup, $h$ a bounded non-negative Borel measurable function on $S$ and $\mu$ a subprobability measure on $S$. Then

$$h(x) = \int_S h(x + y)\,\mu(\mathrm{d}y), \quad x \in S \, , \quad (13)$$

if and only if either $h \equiv 0$ or

$$h(x) \neq 0 \text{ for some } x, \quad h(x + y) = h(x) \text{ for a.a. } [\mu]\; y \in S \text{ for each } x \in S \, , \quad (14)$$

and $\mu(S) = 1$. Moreover, if $h$ is continuous, (14) implies that $h(x + y) = h(x)$ for all $y \in \mathrm{supp}[\mu]$ for each $x \in S$.

The "if" part of the assertion is trivial. To have the "only if" part of the assertion, it is sufficient if we assume the existence of a point $x_0 \in S$ and a Borel set $B_0$ such that

$$h(x_0)\,\mu(B_0) < \int_{B_0} h(x_0 + y)\,\mu(\mathrm{d}y) \quad (15)$$

and arrive at a contradiction. Assume then the existence of $x_0$ and $B_0$ meeting the requirements; this implies that $h(x_0) > 0$ and $\mu(B_0) > 0$. In view of Fubini's

theorem and a trivial special case of the Kolmogorov consistency theorem, it follows from (13) that there exists an infinite sequence $\{Z_n : n = 1, 2, \ldots\}$ of 0-1-valued exchangeable random variables (defined on a probability space) such that, for each $n \geq 1$,

$$P\{Z_1 = 1, \ldots, Z_n = 1\} = (h(x_0))^{-1}\int_{B_0}\cdots\int_{B_0} h(x_0 + y_1 + \cdots + y_n)\,\mu(\mathrm{d}y_1)\cdots\mu(\mathrm{d}y_n) \, . \quad (16)$$

From Corollary 2.4 we have immediately that

$$P\{Z_1 = 1, \ldots, Z_{2^n} = 1\} \geq \big(P\{Z_1 = 1, \ldots, Z_{2^{n-1}} = 1\}\big)^2 \geq \cdots \geq \big(P\{Z_1 = 1\}\big)^{2^n}, \quad n \geq 1 \, ,$$

implying that

$$P\{Z_1 = 1, \ldots, Z_{2^n} = 1\}/(\mu(B_0))^{2^n} \geq \big(P\{Z_1 = 1\}/\mu(B_0)\big)^{2^n}, \quad n \geq 1 \, . \quad (17)$$

(17) leads to a contradiction since, due to (13) and the boundedness of $h$, the left hand side of the inequality is bounded relative to $n$, while, in view of (15), the right hand side of the inequality tends to $\infty$ as $n \to \infty$. Hence we have the required result. The second assertion of the theorem is now obvious.

COROLLARY 3.2. Let $S$ be as in the theorem, $g : S \to R$ be a bounded Borel measurable function and $\mu^{(1)}$ and $\mu^{(2)}$ be subprobability measures on $S$ such that $\mu^{(1)} + \mu^{(2)}$ is a probability measure on $S$. Then

$$g(x) = \int_S g(x + y)\,(\mu^{(1)} - \mu^{(2)})(\mathrm{d}y), \quad x \in S \, , \quad (18)$$

if and only if, for each $x \in S$,

$$g(x + y) = \begin{cases} g(x) & \text{for a.a. } [\mu^{(1)}]\; y \in S \\ -g(x) & \text{for a.a. } [\mu^{(2)}]\; y \in S \, ; \end{cases} \quad (19)$$

moreover, if $g$ is continuous, (18) implies that, for each $x \in S$, (19) is valid with "for a.a. $[\mu^{(i)}]$ $y \in S$" replaced by "for $y \in \mathrm{supp}[\mu^{(i)}]$" respectively. (Here "a.a." refers to "almost all".)

Corollary 3.2 follows from Theorem 3.1 on noting that if we define, for each $n \geq 0$,

$$h(n, x) = c + (-1)^n g(x), \quad x \in S \, ,$$

where $\infty > c \geq |g(x)|$ for all $x \in S$, then (18) implies that

$$h(n, x) = \int_S h(n, x + y)\,\mu^{(1)}(\mathrm{d}y) + \int_S h(n + 1, x + y)\,\mu^{(2)}(\mathrm{d}y), \quad n = 0, 1, \ldots, \; x \in S \, ,$$

which obviously is of the form of (13) with $N_0 \times S$ in place of $S$ (where $N_0 = \{0, 1, 2, \ldots\}$) and $\mu$ as a probability measure.
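A minimal finite-group illustration of Theorem 3.1: when $S$ is the cyclic group $Z_7$ and $\mathrm{supp}[\mu]$ generates it, the bounded solutions of (13) are exactly the constants, and iterating the averaging operator $h \mapsto \int_S h(\cdot + y)\,\mu(\mathrm{d}y)$ flattens any starting function towards them (the group and the measure below are arbitrary choices made for this sketch):

```python
# Choquet-Deny flavour on the cyclic group Z_7: bounded solutions of
# h(x) = sum_y h(x + y) mu({y}) must be constant when supp[mu] generates Z_7
N = 7
MU = {1: 0.5, 3: 0.5}                       # probability measure; supp generates Z_7

h = [float((i * i) % 5) for i in range(N)]  # arbitrary bounded starting function
for _ in range(500):
    h = [sum(p * h[(x + y) % N] for y, p in MU.items()) for x in range(N)]

# the iterates flatten, so any fixed point (a solution of (13)) is constant
assert max(h) - min(h) < 1e-9
```

Since the averaging operator is doubly stochastic here, the limiting constant is the mean of the starting function.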
REMARKS 3.3.

(i) The argument given to show the validity of Theorem 3.1 simplifies slightly if just the version of de Finetti's theorem referred to in Remark 2.3 is used, without any reference to Corollary 2.4. The version in question implies immediately that

$$P\{Z_1 = 1, \ldots, Z_n = 1\} \geq \big(P\{Z_1 = 1\}\big)^n \quad \text{for } n = 1, 2, \ldots \, ,$$

and hence that (17) holds with "$2^n$" replaced by "$n$".

(ii) As the arguments used to obtain Theorem 3.1 and Corollary 3.2 for general $S$ are no more complicated than those relative to the specialized versions when $S = R$ or $[0, \infty)$, we have dealt here with the results in the general case.

(iii) Apart from Corollary 3.2, one could arrive at several other results as corollaries to Theorem 3.1; see, for example, Rao and Shanbhag (1991) or (1994) for some possibilities in this connection.

As hinted earlier, among the various important results that can be proved via Theorem 3.1 or its specialized version relative to the real line, we have the following version of the renewal theorem (see, for example, Chapter 9 of Feller (1966) or pages 209 and 210 of Rao and Shanbhag (1994)):

THEOREM 3.4. Let $F$ be a probability distribution function concentrated on $[0, \infty)$ such that $F(0) = 0$, and let $\mu$ be the corresponding mean (possibly infinite). Denote by $U$ the renewal function relative to $F$. Then, if $F$ is non-arithmetic, we have that, as $y \to \infty$,

$$U(x + y) - U(y) \to \frac{x}{\mu}$$

for every fixed $x \in (0, \infty)$. Furthermore, if $F$ is arithmetic with span $\lambda > 0$, the same is true with $x$ and $y$ as integral multiples of $\lambda$.

One can also appeal to Theorem 3.1 or its variations to arrive at certain characterization results relative to stable distributions or their extended versions. Kagan et al. (1973), Ramachandran and Lau (1991) and Rao and Shanbhag (1994) have dealt with some such applications. However, we shall not deal with these applications here, except for the following two theorems, appearing as Theorems 6.4.1 and 6.4.6 respectively in Rao and Shanbhag (1994); these results


can be proved using a slightly different version of Theorem 3.1, given in Remark 3.8(iii) (and a certain result in Hardy (1967, p. 37)).

THEOREM 3.5. Let $\phi$ be the characteristic function of a non-degenerate probability distribution on $R^p$. Then the following are equivalent:

(i) For some positive constants $c_1, c_2, c_3$ with $c_2 \neq 1$, and $a_1, a_2 \in R^p$,

$$(\phi(t))^2 = \exp\{i\langle a_1, t\rangle\}\,\phi(c_1 t), \quad t \in R^p \, ,$$

and

$$\phi(t)\,\phi(c_2 t) = \exp\{i\langle a_2, t\rangle\}\,\phi(c_3 t), \quad t \in R^p \, .$$

(The condition implies that $c_1, c_3 \neq 1$, as $\phi$ is non-degenerate.)

(ii) For some positive constants $c_1, c_2, c$ and $a \in R^p$, such that $c_1/c$ and $c_2/c$ are non-commensurable (i.e. such that there are no non-zero integers $m$ and $n$ for which $(c_1/c)^m = (c_2/c)^n$),

$$\phi(c_1 t)\,\phi(c_2 t) = \exp\{i\langle a, t\rangle\}\,\phi(c t), \quad t \in R^p \, .$$
(iii) $\phi$ is stable.

THEOREM 3.6. Let $P$ be the generating function of a probability distribution on $N_0$ such that $0 < P(0) < 1$. Then the following are equivalent:

(i) For some constants $c_1, c_2, c_3 \in (0, \infty)$,

$$(P(z))^2 = P(1 - c_1 + c_1 z), \quad |z| \leq 1 \, ,$$

and

$$P(z)\,P(1 - c_2 + c_2 z) = P(1 - c_3 + c_3 z), \quad |z| \leq 1 \, .$$

(ii) For some constants $c_1, c_2, c \in (0, 1]$ such that $c_1/c$ and $c_2/c$ are non-commensurable,

$$P(1 - c_1 + c_1 z)\,P(1 - c_2 + c_2 z) = P(1 - c + cz), \quad |z| \leq 1 \, .$$
(iii) $P$ is discrete stable.

(The theorem also holds if we replace, in the conditions, $|z| \leq 1$ by $z \in (a, b)$, where $a$ and $b$ with $a < b$ are given numbers in $(0, 1]$; also, in the identities involving $P(z)$ with $z < -1$, $P$ should be viewed as the extension to $(-\infty, 1]$ of the probability generating function in question.)

Rao and Shanbhag (1994) have proved Theorem 3.5, and hence implicitly Theorem 3.6, using the following Theorem 3.7, which is a theorem due to Lau and Rao (1982) (see also Zeng (1992) for a proof of the equivalence of (ii) and (iii) of Theorem 3.5 in the same spirit); a simple proof, based on exchangeability, of the Lau-Rao theorem appears in Alzaid et al. (1987). A closer scrutiny of the argument shows that it remains valid even when, in place of Theorem 3.7, its specialized version with $\mu$ as a probability measure, i.e. the result in Remark 3.8(iii), is used.
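The equivalences in Theorem 3.6 can be checked directly on a discrete stable generating function $P(z) = \exp\{-\lambda(1-z)^{\gamma}\}$ (the standard discrete stable form; the parameter values below are arbitrary illustrative choices). Condition (i) then holds with $c_1 = 2^{1/\gamma}$ and $c_3 = (1 + c_2^{\gamma})^{1/\gamma}$:

```python
import math

lam, gam = 1.3, 0.7          # hypothetical parameters, with 0 < gam <= 1

def P(z):
    # discrete stable probability generating function
    return math.exp(-lam * (1.0 - z) ** gam)

c2 = 0.6
c1 = 2.0 ** (1.0 / gam)                 # makes (P(z))^2 = P(1 - c1 + c1 z)
c3 = (1.0 + c2 ** gam) ** (1.0 / gam)   # makes P(z) P(1-c2+c2 z) = P(1-c3+c3 z)

for z in [0.0, 0.25, 0.5, 0.9, 1.0]:
    assert abs(P(z) ** 2 - P(1 - c1 + c1 * z)) < 1e-12
    assert abs(P(z) * P(1 - c2 + c2 * z) - P(1 - c3 + c3 * z)) < 1e-12
```

The identities follow from $P(1 - c + cz) = \exp\{-\lambda c^{\gamma}(1-z)^{\gamma}\}$, so that multiplying generating functions simply adds the coefficients $c^{\gamma}$.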


THEOREM 3.7. Let $H$ be a non-negative real locally integrable Borel measurable function on $R_+$, other than a function which is identically 0 almost everywhere $[L]$, such that it satisfies

$$H(x) = \int_{R_+} H(x + y)\,\mu(\mathrm{d}y) \quad \text{for a.a. } [L]\; x \in R_+ \quad (20)$$

for some $\sigma$-finite measure $\mu$ on (the Borel $\sigma$-field of) $R_+$ with $\mu(\{0\}) < 1$ (yielding trivially that $\mu(\{0\}^c) > 0$), where $L$ corresponds to Lebesgue measure and "a.a." refers to "almost all". Then, either $\mu$ is arithmetic with some span $\lambda$ and

$$H(x + n\lambda) = H(x)\,b^n, \quad n = 0, 1, 2, \ldots, \quad \text{for a.a. } [L]\; x \in R_+ \, ,$$

with $b$ such that

$$\sum_{n=0}^{\infty} b^n\,\mu(\{n\lambda\}) = 1 \, ,$$

or $\mu$ is non-arithmetic and

$$H(x) \propto \exp\{\eta x\} \quad \text{for a.a. } [L]\; x \in R_+ \, ,$$

with $\eta$ such that

$$\int_{R_+}\exp\{\eta x\}\,\mu(\mathrm{d}x) = 1 \, .$$
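In the arithmetic case of Theorem 3.7, (20) restricted to the lattice $\{0, \lambda, 2\lambda, \ldots\}$ becomes the recursion $H(m\lambda) = \sum_n H((m+n)\lambda)\,\mu(\{n\lambda\})$, and the geometric solution $H(m\lambda) = b^m$ is pinned down by $\sum_n b^n\mu(\{n\lambda\}) = 1$. A numerical sketch with span $\lambda = 1$ and arbitrarily chosen masses of total mass less than 1:

```python
# arithmetic case of Theorem 3.7 on the lattice {0, 1, 2, ...} (span 1)
w = {1: 0.3, 2: 0.3}        # mu({n}) for n = 1, 2 (hypothetical masses)

# solve sum_n w_n b^n = 1 for b by bisection (the left side increases in b)
lo, hi = 0.0, 10.0
for _ in range(200):
    mid = (lo + hi) / 2.0
    if sum(wn * mid ** n for n, wn in w.items()) < 1.0:
        lo = mid
    else:
        hi = mid
b = (lo + hi) / 2.0

# H(m) = b^m then satisfies the recursion H(m) = sum_n H(m + n) w_n
for m in range(10):
    assert abs(b ** m - sum(b ** (m + n) * wn for n, wn in w.items())) < 1e-9
```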

The exchangeability argument, referred to above, to prove Theorem 3.7 is based essentially on observing initially that there is no loss of generality in taking the $H$ of (20) to be a positive decreasing continuous function on $R_+$ satisfying (20) with "almost all $[L]$" deleted, and then noting mainly that a specialized version of Corollary 2.11 implies

$$\frac{H(x + ns)}{H(x + (n-1)s)} \leq \frac{H(x + (n+1)s)}{H(x + ns)}, \quad n = 1, 2, \ldots, \; s \in \mathrm{supp}[\mu], \; x \in R_+ \, .$$

This assures, for each $s \in \mathrm{supp}[\mu]\setminus\{0\}$, the existence of $x_s \in [0, s]$ such that

$$\inf_{x \in R_+}\frac{H(x + s)}{H(x)} = \frac{H(x_s + s)}{H(x_s)} = \left(\frac{H(x_s + ns)}{H(x_s)}\right)^{1/n} \geq \left(\frac{H(y)}{H(x_s)}\right)^{1/n}\left(\frac{H(y + (n + m_y)s)}{H(y)}\right)^{1/n}, \quad n = 1, 2, \ldots, \; y \in R_+ \, , \quad (21)$$

where, for each $y$, $m_y$ is a non-negative integer such that $y + m_y s \geq x_s$. Incidentally, it is a simple exercise to see inductively that the second identity in (21) or, equivalently, $H(x_s + ns)/H(x_s + (n-1)s) = H(x_s + s)/H(x_s)$ holds for all $n$, in view of

$$\frac{H(x_s + ns)}{H(x_s + (n-1)s)} = \int_{R_+}\left(\frac{H(x_s + ns + y)}{H(x_s + (n-1)s + y)}\right)\left(\frac{H(x_s + (n-1)s + y)}{H(x_s + (n-1)s)}\right)\mu(\mathrm{d}y), \quad n = 1, 2, \ldots \, .$$

As (21) implies

$$\frac{H(y + s)}{H(y)} = \frac{H(x_s + s)}{H(x_s)}, \quad y \in R_+ \, ,$$

for an arbitrary $s \in \mathrm{supp}[\mu]\setminus\{0\}$, the assertion of the theorem follows on appealing to Theorem 1 of Marsaglia and Tubilla (1975).
REMARKS 3.8.

(i) In the assertion (i) of Theorem 6.4.6 of Rao and Shanbhag (1994), "(0, 1)" should read "$(0, \infty)$".

(ii) As observed by Rao and Shanbhag (1996), both Theorems 3.5 and 3.6 hold even when the "2" appearing in these results is replaced by any integer $p'$ greater than or equal to 2.

(iii) Theorem 3.7, in the case when $\mu$ is a probability measure, provides us with a somewhat different version of Theorem 3.1 for $S = R_+$, and follows via a slightly simpler argument than that sketched above. In this case, it is easily verified that, for any $\delta > 0$, $H_\delta(x) = e^{\delta x}\int_{[x,\infty)} e^{-\delta y} H(y)\,\mathrm{d}y$, $x \in R_+$, is a positive real continuous function such that (13), with $H_\delta$ in place of $h$ (and $\mu$ with $\mu(\{0\}) < 1$), is met. If we assume that $H_\delta(x + s) = H_\delta(x)$, $x \in R_+$, $s \in \mathrm{supp}[\mu]$, does not hold, then it follows that there exist $x_0 \in R_+$ and $s_0 \in \mathrm{supp}[\mu]$ such that $\liminf_{y \to \infty} H_\delta(x_0 + y) < H_\delta(x_0)$ and $H_\delta(x_0 + s_0) > H_\delta(x_0)$. This implies, in view of the specialized version of Corollary 2.11, that there exist $y \in R_+$ and $n \in N_0$ such that

$$1 > \frac{H_\delta(x_0 + y)}{H_\delta(x_0)} \geq e^{-\delta s_0}\,\frac{H_\delta(x_0 + n s_0)}{H_\delta(x_0)} \geq e^{-\delta s_0}\left(\frac{H_\delta(x_0 + s_0)}{H_\delta(x_0)}\right)^n > 1 \, ,$$

leading us to a contradiction and hence easily to the result sought. This argument compares well with that used to prove Theorem 3.1.

(iv) The argument referred to earlier to prove Theorem 3.5 via Theorem 3.7 with $\mu$ as a probability measure (i.e. the result in Remark 3.8(iii)) involves a key observation that each of (i) and (ii) of Theorem 3.5 implies that $\phi$ is infinitely


divisible, and that, without loss of generality, the Lévy measure $\nu$ corresponding to $\phi$ is concentrated on $R^p_+$, satisfying, for (i),

$$\nu([x, \bar{\infty})) = \tfrac{1}{2}\,\nu([c_1^{-1}x, \bar{\infty})), \quad x \in R^p_+ \, ,$$

and, for (ii),

$$\nu([c_1^{-1}x, \bar{\infty})) + \nu([c_2^{-1}x, \bar{\infty})) = \nu([c^{-1}x, \bar{\infty})), \quad x \in R^p_+ \, ,$$

where $\bar{\infty}$ is a $p$-component vector with all its components equal to $\infty$. Appealing to a result in Hardy (1967, p. 37) (i.e. the result that if $p$ and $m$ are integers greater than 1 such that none of $p^{1/m}, \ldots, p^{(m-1)/m}$ is an integer, then an equation of the form $a_0 + a_1 p^{1/m} + \cdots + a_{m-1}p^{(m-1)/m} = 0$, with $a_0, \ldots, a_{m-1}$ as rational numbers, holds only if $a_0 = \cdots = a_{m-1} = 0$) in the first case, and directly in the second case, one can hence conclude, applying the specialized version of Theorem 3.7, that each of (i) and (ii) of Theorem 3.5 implies (iii) of Theorem 3.5. (For further details, see Rao and Shanbhag (1994).) That (iii) of Theorem 3.5 implies (i) and (ii) of the theorem follows easily.

(v) Fosam and Shanbhag (1997a) have recently established a variant of Theorem 3.1 with the measure $\mu$ depending on $x$; this latter result gives certain characterizations based on relevation type equations and those met in topics such as queueing theory and point processes.

The following corollary of Theorem 3.7, which is a variation of Shanbhag's (1977) lemma, is useful in solving certain characterization problems for discrete distributions, such as Adapted Generalized Pólya-Eggenberger distributions (described in Fosam and Shanbhag (1997b)), and Heine and Euler distributions. (Incidentally, the Adapted Generalized Pólya-Eggenberger distributions have Poisson, Lagrangian Poisson, and negative binomial distributions as special cases.)

COROLLARY 3.9. Let $\{(v_n, w_n) : n = 0, 1, \ldots\}$ be a sequence of vectors with non-negative real components such that $v_n \neq 0$ for at least one $n$, $w_0 < 1$, and the largest common divisor of the set $\{n : w_n > 0\}$ is unity. Then

$$v_m = \sum_{n=0}^{\infty} v_{m+n}\,w_n, \quad m = 0, 1, \ldots$$

if and only if
$$v_n = v_0\,b^n, \quad n = 0, 1, 2, \ldots, \quad \text{and} \quad \sum_{n=0}^{\infty} w_n\,b^n = 1$$

for some $b > 0$.

Among various characterizations that follow via Corollary 3.9, there is an important one relative to a damage model. Suppose $(X, Y)$ is a two-component random vector with components that are non-negative integer-valued such that, for each $x$ with $P\{X = x\} > 0$,

$$P\{Y = y \,|\, X = x\} = \frac{a_y\,b_{x-y}}{c_x}, \quad y = 0, 1, \ldots, x \, ,$$

where $\{a_n : n = 0, 1, \ldots\}$ is a sequence of positive real numbers, $\{b_n : n = 0, 1, \ldots\}$ is a sequence of non-negative real numbers such that $b_0 > 0$ and the largest common divisor of $\{n : b_n > 0\}$ is 1, and $\{c_n : n = 0, 1, \ldots\}$ is the convolution of $\{a_n\}$ and $\{b_n\}$. Then, provided $P\{X = Y\} < 1$, the corollary implies that

$$P\{Y = y\} = P\{Y = y \,|\, X = Y\}, \quad y = 0, 1, \ldots$$

if and only if

$$P\{X = x\} \propto c_x\,\lambda^x, \quad x = 0, 1, \ldots$$

for some $\lambda > 0$. In particular, this result provides us with an approach to arrive at a characterization for any discrete infinitely divisible distribution $\{g_x : x = 0, 1, \ldots\}$ with $g_0 > 0$ (or decomposable distribution on $\{0, 1, \ldots\}$ of the given form $\{c_x\lambda^x\}$ with $\lambda > 0$ and $\{c_x\}$ having the stated properties), based on a damage model with an appropriate survival distribution (i.e., in the notation used above, an appropriate conditional distribution of $Y$ given $X$). Theorem 3.7 itself enables us to obtain numerous characterizations of probability distributions and stochastic processes; see, for example, Rao and Shanbhag (1986, 1994, 1996a, 1998). Included in these are certain characterizations of exponential and geometric distributions based either on order statistics or on record values, as well as the strong memoryless characterizations of the distributions referred to. More recently, Asadi et al. (1998) have used the theorem to arrive at generalized versions of these results, providing us with characterizations of generalized Pareto distributions and their discrete versions. One of the theorems of this latter paper identifies via Theorem 3.7, under some mild conditions, the distribution of $X$ for which

$$E\Big(\psi\Big(\frac{cX + 1}{cY + 1}\Big)\,\Big|\,X > Y,\,Y\Big) = \text{constant a.s.} \, , \quad (22)$$

where $X$ and $Y$ are independent non-negative random variables such that $cX + 1 > 0$ and $cY + 1 > 0$ a.s., and $\psi$ is monotonic (satisfying certain other conditions). This has led to some interesting characterizations based on successive record values as well as those based on successive order statistics. Dembińska and Wesołowski (1998, 1999) have identified, under the assumption that $X$ has a continuous distribution, the cases of the generalized Pareto distributions up to scale and location changes as those for which, for fixed $i, j$ with $1 \leq i < j$ (in obvious notation), $E(R_j - R_i \,|\, R_i)$ relative to record values is linear in $R_i$ almost surely, as well as those for which, for fixed $i, j$ with $1 \leq i < j \leq n$, $E(X_{j:n} - X_{i:n} \,|\, X_{i:n})$ relative to order statistics is linear in $X_{i:n}$ almost surely. They have also extended a certain result of Nagaraja (1988), on a characterization of the limit distributions relative to extremes in extreme value theory, to the case of non-adjacent record values. These results all follow from the next corollary of Theorem 3.7; see Dembińska and Wesołowski (1999) for the relevant information. Note in particular that each of

750

C. R. Rao and D. N. Shanbhag

the two conditions that we have mentioned explicitly above is equivalent to that F^{−1}(1 − e^{−x}), x ∈ R_+, where F is the distribution function of X, satisfies (23) for some constant c and measure μ that is absolutely continuous with respect to Lebesgue measure. The corollary in question was used earlier to characterize a Yule process by Rao and Shanbhag (1989, 1994) (see also Rao and Shanbhag (1998)); for essentially a direct proof of the corollary see Rao and Shanbhag (1986), and for a somewhat different version of the proof, involving a property of measures that are absolutely continuous with respect to Lebesgue measure, see Rao and Shanbhag (1994; Chapter 2).

COROLLARY 3.10. Let H : R_+ → R be a Borel measurable function satisfying

∫_{R_+} H(x + y) μ(dy) = H(x) + c   for a.a. [L] x ∈ R_+ ,   (23)

where μ is a σ-finite measure with μ({0}) < 1 and c is a constant, and where H is not a function which is identically equal to a constant a.e. [L]. Assume that H is either increasing a.e. [L] or decreasing a.e. [L]. Then, either μ is nonarithmetic and H is of the form for which

H(x) = γ + α(1 − exp{ηx})   for a.a. [L] x, if η ≠ 0 ,
H(x) = γ + βx              for a.a. [L] x, if η = 0 ,

or μ is arithmetic with span λ for some λ and H is of the form for which, for each n = 1, 2, ...,

H(x + nλ) = H(x) exp{nλη} + α′(1 − exp{nλη})   for a.a. [L] x, if η ≠ 0 ,
H(x + nλ) = H(x) + β′n                         for a.a. [L] x, if η = 0 ,

where α, β, α′, β′, γ are all constants and η is such that ∫_{R_+} exp{ηx} μ(dx) = 1 .
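The non-arithmetic, η ≠ 0 branch of Corollary 3.10 lends itself to a numerical sanity check. The sketch below is ours, not part of the chapter: it assumes a concrete sub-probability measure μ with density θ e^{−λ'y} on R_+ (so that η = λ' − θ makes ∫ e^{ηy} μ(dy) = 1) and verifies that H(x) = γ + α(1 − exp{ηx}) turns ∫_{R_+} H(x + y) μ(dy) − H(x) into a constant c.

```python
import numpy as np

# Assumed illustrative choice of mu: density theta*exp(-lam*y) on R_+,
# total mass theta/lam < 1.  Then eta = lam - theta gives
#     int exp(eta*y) mu(dy) = theta/(lam - eta) = 1,
# and H(x) = gamma + alpha*(1 - exp(eta*x)) should satisfy
#     int H(x + y) mu(dy) = H(x) + c   with c not depending on x.

lam, theta = 2.0, 1.5
eta = lam - theta
gamma, alpha = 0.7, 1.3

def H(x):
    return gamma + alpha * (1.0 - np.exp(eta * x))

y = np.linspace(0.0, 60.0, 400_001)        # fine grid truncating mu's tail
mu_density = theta * np.exp(-lam * y)

def integral(f_vals):
    # trapezoidal rule on the fixed grid
    return float(np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(y)))

c_vals = np.array([integral(H(x + y) * mu_density) - H(x)
                   for x in np.linspace(0.0, 5.0, 11)])
c_theory = (gamma + alpha) * (theta / lam - 1.0)   # constant worked out by hand: -0.5
print(c_vals.mean(), c_theory)
```

The spread of `c_vals` across the tested x values is at the level of quadrature error, consistent with the x-independence asserted by the corollary.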

Alzaid et al. (1988) have obtained an extended version of Theorem 3.7 by introducing a certain argument based on ladder height measures relative to the measure μ appearing in the following functional equation:

H(x) = ∫_R H(x + y) μ(dy)   for a.a. [L] x ∈ R_+ ,   (24)

where μ is σ-finite with μ({0}) < 1 and H : R → R_+ is a locally integrable (w.r.t. Lebesgue measure) Borel measurable function such that it is different from a function that is equal to zero a.e. [L] on (α, ∞) with α = inf(supp[μ]). In the argument in question there is also an application of Theorem 3.7. (A version of the argument appears in Rao and Shanbhag (1994; pp. 36-37).) The extension that is referred to is essentially as follows:

Exchangeability, functional equations, and characterizations

751

THEOREM 3.11. The weak ascending ladder height measure τ and the descending ladder height measure ρ (in the notation of Alzaid et al. (1988)) relative to the measure μ of (24) are Lebesgue-Stieltjes measures. Moreover, (24) implies

H(x) = ∫_{(−∞,0)} H(x + y) ρ(dy) + ξ(x) exp{ηx}   for a.a. [L] x ∈ R_+ ,   (25)

where ξ is a non-negative periodic function with ξ(·) = ξ(· + s) for every s ∈ supp[τ], and η is a real number such that

∫_{R_+} exp{ηx} τ(dx) = 1 .   (26)

(If η satisfying (26) does not exist, we take ξ in (25) to be identically equal to zero and η to be an arbitrary real number; also, we may choose ξ to be equal to a constant everywhere if μ is non-arithmetic.) Each of the arguments used respectively by Alzaid et al. (1988) and Rao and Shanbhag (1994) to prove Theorem 3.11 involves mainly three stages. The first of these is essentially seeing that there is no loss of generality in assuming that μ((−∞, 0)) > 0 and μ((0, ∞)) > 0, that H is a positive decreasing continuous function, and that H̃ defined by

H̃(x) = H(x) − ∫_{(−∞,0)} H(x + y) ρ(dy) ,   x ∈ R_+

is a non-negative decreasing continuous function. The next stage is that of showing that ρ and τ are Lebesgue-Stieltjes measures and that H̃ satisfies

H̃(x) = ∫_{R_+} H̃(x + y) τ(dy) ,   x ∈ R_+ ,

with (τ({0}) < 1 and) τ such that it is non-arithmetic if μ is non-arithmetic and arithmetic with span λ if μ is arithmetic with span λ. The remainder is just to apply Theorem 3.7 to this last equation to arrive at (25). (For the sake of completeness, we may recall that for the μ of (24), we define

ρ(·) = Σ_{n=1}^∞ μ_{1n}((−∞, 0) ∩ ·)

and

τ(·) = Σ_{n=1}^∞ μ_{2n}(R_+ ∩ ·) .

Here μ_{1n} and μ_{2n} are measures on R such that, for every Borel set B of R and every integer n ≥ 1,


μ_{1n}(B) = μ^n({(x_1, ..., x_n) ∈ R^n : s_m ∈ R_+, m = 1, ..., n − 1, s_n ∈ B})

and

μ_{2n}(B) = μ^n({(x_1, ..., x_n) ∈ R^n : s_m ∈ (−∞, 0), m = 1, ..., n − 1, s_n ∈ B})
( = μ^n({(x_1, ..., x_n) ∈ R^n : s_m < s_n, m = 1, ..., n − 1, s_n ∈ B}) ) ,

where s_m = x_1 + ⋯ + x_m, m ≥ 1, and μ^n is the product measure Π_{i=1}^n μ_i with μ_i = μ.)
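For intuition, the ladder height measures can be estimated by Monte Carlo. The sketch below is ours, for an assumed two-point step distribution μ = p δ_{+1} + (1 − p) δ_{−1} with p > 1/2; in that case the descending ladder height measure ρ is concentrated at −1, with total mass (1 − p)/p by the gambler's-ruin formula, and that mass is just the probability that the walk ever becomes negative.

```python
import numpy as np

# Simulate random walks S_n with i.i.d. steps +1 (prob p) / -1 (prob 1-p)
# and record whether each path ever enters (-inf, 0); that event is exactly
# "a descending ladder epoch occurs", and its probability is rho's total mass.
rng = np.random.default_rng(0)
p, n_paths, max_steps = 0.7, 20_000, 500   # with upward drift, a late first descent is very unlikely

steps = np.where(rng.random((n_paths, max_steps)) < p, 1, -1).astype(np.int16)
walks = np.cumsum(steps, axis=1)
est = float((walks < 0).any(axis=1).mean())

print(est, (1 - p) / p)    # Monte Carlo estimate vs. exact mass 3/7
```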

REMARK 3.12. As there exist solutions to (24) such that ∫_{R_+} H(x + y) τ(dy) = ∞, x ∈ R_+ (see Alzaid et al. (1988)), one cannot give a proof of Theorem 3.11 directly based on the Wiener-Hopf factorization of μ. In the case when (24) holds with "a.a. [L] x ∈ R" in place of "a.a. [L] x ∈ R_+", one can indeed use techniques involving the factorization mentioned to prove the theorem; this restricted version of the theorem gives, as a simple consequence, the following Corollary 3.13, which may also be viewed as a corollary to Deny's (1961) theorem. Alzaid et al. (1988) and Rao and Shanbhag (1994; p. 39), among others, provide us with the relevant details in this connection.

COROLLARY 3.13. Let

H(x) = ∫_R H(x + y) μ(dy)   for a.a. [L] x ∈ R ,

where μ is a σ-finite measure on R satisfying μ({0}) < 1 and H : R → R_+ is a Borel measurable function that is locally integrable w.r.t. Lebesgue measure and different from a function that is equal to zero a.e. [L]. Then, either μ is non-arithmetic and
H(x) = c_1 exp{η_1 x} + c_2 exp{η_2 x}   for a.a. [L] x ∈ R ,

or μ is arithmetic with some span λ and

H(x) = ξ_1(x) exp{η_1 x} + ξ_2(x) exp{η_2 x}   for a.a. [L] x ∈ R ,

with c_1 and c_2 as non-negative real numbers, ξ_1 and ξ_2 as periodic non-negative Borel measurable functions having period λ, and η_i, i = 1, 2, as real numbers such that ∫_R exp{η_i x} μ(dx) = 1. Theorem 3.11 and Corollary 3.13, or their minor variants, have applications in areas such as queueing theory, epidemiology, and branching processes; also, Corollary 3.13 subsumes the specialized version of the Choquet-Deny theorem, which plays a crucial role in renewal theory. Chapter 8 of Rao and Shanbhag (1994) provides us with some illustrative examples in this respect.
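The arithmetic branch of Corollary 3.13 can be illustrated with a two-point measure. In the sketch below (ours; the measure and constants are assumptions for illustration), μ = p δ_{−1} + q δ_{+1}, the corollary's equation reduces to the recursion H(x) = p H(x − 1) + q H(x + 1), and any combination c_1 e^{η_1 x} + c_2 e^{η_2 x} built from the two real roots of p e^{−η} + q e^{η} = 1 solves it.

```python
import math

p, q = 0.6, 0.3                       # mu = p*delta_{-1} + q*delta_{+1}
# With t = exp(eta), the condition p*exp(-eta) + q*exp(eta) = 1 reads
#     q*t**2 - t + p = 0.
disc = math.sqrt(1.0 - 4.0 * p * q)
t1, t2 = (1.0 + disc) / (2.0 * q), (1.0 - disc) / (2.0 * q)
eta1, eta2 = math.log(t1), math.log(t2)

c1, c2 = 0.4, 1.1                     # arbitrary non-negative coefficients

def H(x):
    return c1 * math.exp(eta1 * x) + c2 * math.exp(eta2 * x)

residuals = [abs(H(x) - (p * H(x - 1.0) + q * H(x + 1.0)))
             for x in (-2.0, -0.5, 0.0, 1.3, 4.0)]
print(max(residuals))                 # floating-point noise only
```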


REMARKS 3.14. (i) Theorem 3.11, in conjunction with the Perron-Frobenius theorem relative to non-negative matrices, plays an important role in studies of damage models. (See, for example, Rao and Shanbhag (1994; Chapter 7) for more details.)
(ii) Rao and Shanbhag (1996b) have shown, essentially using de Finetti's theorem, that if the S of Remark 2.12 is taken so as to have a certain additional structural condition met, then H has an integral representation as a weighted average of μ-harmonic exponential functions. This result subsumes Deny's theorem relative to a countable group.
(iii) As observed by Rao and Shanbhag (1996b), as a by-product of the results under (ii), it follows that if μ is concentrated on a countable set, then each of Theorem 3.7 and Corollary 3.13 holds even when H is not assumed a priori to be locally integrable. (These modified results hold even without the a priori restriction that μ be σ-finite.) This extends Theorem 1 of Laczkovich (1986); for other relevant comments on Laczkovich (1986) and related results, see Rao and Shanbhag (1996b).

As corollaries to Theorems 3.1 and 3.7, one can arrive at certain results on functional equations involving k functions that have applications in studies of stable distributions. In particular, if S is an Abelian topological semigroup, k is an integer ≥ 2, h_i, i = 1, 2, ..., 2k − 1, are bounded non-negative Borel measurable functions satisfying h_{k+1} = h_1, ..., h_{2k−1} = h_{k−1}, and μ_j are measures on S such that Σ_{j=1}^k μ_j is a subprobability measure, then it follows that

h_i(x) = Σ_{j=1}^k ∫_S h_{i+j−1}(x + y) μ_j(dy) ,   x ∈ S, i = 1, 2, ..., k ,   (27)

if and only if either h_i ≡ 0 for i = 1, 2, ..., k, or h_i(x) ≠ 0 for some i and x, Σ_{j=1}^k μ_j(S) = 1, and

h_{i+j−1}(x + y) = h_i(x)   for a.a. [μ_j] y ∈ S, for each x ∈ S and i, j = 1, ..., k .   (28)

This result is immediate from Theorem 3.1 on writing (27) in terms of the function h on N_0 × S such that h(n, x) = h_i(x) for each x ∈ S, n of the form rk + i − 1 with r a non-negative integer, and i = 1, 2, ..., k. In the case when S = R_+ and Σ_{j=1}^k μ_j(S) = 1, with the assumption that the h_i's are bounded replaced by the assumption that the h_i's are [L]-locally integrable or that Σ_{j=1}^k μ_j is concentrated on a countable set, one gets that (27) with "x ∈ S" replaced by "for a.a. [L] x ∈ S" is equivalent to (28) with "each x ∈ S" replaced by "a.a. [L] x ∈ S". This follows on noting, among other things, the functional equation satisfied by (Σ_{i=1}^k h_i)(·), in view of the functional equation in the assertion, and identifying its solution appealing to Theorem 3.7 (or the specialized version referred to in Remark 3.14, (iii)). Some corollaries to the latter result and their applications in characterization theory relative to stable


distributions, with k = 2, have appeared in the literature; see, for example, Kagan et al. (1973) (especially Theorem 5.4.1, a key result in it) and Ramachandran and Lau (1991), while the former result for general k is given in Rao and Shanbhag (1991). Applying the results referred to in Remark 3.14, (ii), one can obtain certain theorems on the Hausdorff moment problem, and a multivariate extension of Shanbhag's (1977) theorem on a damage model. In particular, to spell out the result in the latter case, suppose that (X, Y) is a random vector with X and Y as p-component vectors having non-negative integer-valued components such that P{Y ≤ X} = 1 (where the inequality is to be understood component-wise) and a version of the conditional distribution of Y given X is given by (in obvious notation)

P{Y = y | X = x} = a_y b_{x−y} / c_x ,   y_i = 0, 1, ..., x_i ;  x_i = 0, 1, ... ;  i = 1, ..., p ,

where {a_x} is a sequence of positive real numbers, {b_x} is a sequence of non-negative real numbers with b_0 > 0, and {c_x} is the convolution of {a_x} and {b_x}. Then, provided 0 < P{Y = X} < 1, under some mild assumption (which is met if we have, for example, P{X = 0} < 1 and b_{(1,0,...,0)}, b_{(0,1,0,...,0)}, ..., b_{(0,0,...,1)} > 0), it follows that

P{Y = y} = P{Y = y | X = Y} ,   y ∈ {0, 1, ...}^p ,

if and only if {P{X = x}/c_x} is a constant multiple of a certain moment sequence; see Rao and Shanbhag (1994; Chapter 7) for a more precise statement of the result and also for its corollaries.

Rao and Shanbhag (2000) have shown that many of the general results on solutions to integral equations appearing in Chapter 3 of Rao and Shanbhag (1994), providing variations to Deny's (1961) theorem, can be arrived at by applying simple versions of de Finetti's theorem, where the exchangeable random variables take only a finite number of values. Solutions or partial solutions can be obtained to the integral equations relative to Abelian topological semigroups, under some mild conditions, with this approach. One can use the approach even when de Finetti's theorem itself is not valid for the semigroup. We will not discuss this in the present contribution. However, we note that among the various results that are corollaries to the general results which can be arrived at via the approach mentioned are Bernstein's theorem, Bochner's theorem, and their discrete versions relative to the Hausdorff moment problem. One can also obtain, as corollaries to the general results, an extended version of Theorem 1 of Laczkovich (1986) and the well known result on the spectral representation for the Laplace transform of an infinitely divisible distribution concentrated on R_+^p, as well as several characterizations for multivariate probability distributions. (The result on the spectral representation referred to here is that a probability distribution F on R_+^p is infinitely divisible if and only if (in obvious notation)

∫_{R_+^p} exp{−⟨θ, x⟩} F(dx) = exp{ −⟨c, θ⟩ − ∫_{R_+^p} (1 − exp{−⟨θ, x⟩}) μ(dx) } ,   θ ≥ 0 ,

where c ≥ 0, the inequalities are co-ordinate-wise, and μ is a measure on R_+^p with μ({0}) = 0 and ∫_{R_+^p} (‖x‖/(1 + ‖x‖)) μ(dx) < ∞.)
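The univariate (p = 1) special case of the damage-model result above is the classical Poisson/binomial-survival setup of Rao and Rubin. A quick numerical illustration (ours, with assumed parameter values) checks the condition P{Y = y} = P{Y = y | X = Y} when X is Poisson(λ) and, given X = x, Y is binomial (x, π); both sides then equal the Poisson(λπ) probabilities.

```python
from math import exp, factorial

lam, pi_ = 3.0, 0.4
N = 80                                    # truncation of the Poisson support

def pois(mu, k):
    return exp(-mu) * mu**k / factorial(k)

def binom_pmf(x, k, pr):
    if k > x:
        return 0.0
    comb = factorial(x) // (factorial(k) * factorial(x - k))
    return comb * pr**k * (1.0 - pr)**(x - k)

# Marginal of the surviving part: P{Y = y} = sum_x P{X = x} P{Y = y | X = x}
p_y = [sum(pois(lam, x) * binom_pmf(x, y, pi_) for x in range(N)) for y in range(11)]

# P{Y = y | X = Y}: conditioning on "no damage occurred"
p_undamaged = sum(pois(lam, x) * pi_**x for x in range(N))   # = exp(-lam*(1 - pi))
p_cond = [pois(lam, y) * pi_**y / p_undamaged for y in range(11)]

err = max(abs(a - b) for a, b in zip(p_y, p_cond))
print(err)    # agreement up to truncation/rounding error
```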

4. Characterizations of mixtures of probability distributions

de Finetti's theorem or its variations enable us to obtain characterizations of mixtures of certain well known probability distributions, with applications in areas such as multivariate analysis and Bayesian inference. In this section, we briefly touch upon some recent contributions dealing with problems of the type referred to. Suppose X_1, X_2, ..., X_n are n (≥ 3) positive random variables defined on a probability space and r is an integer satisfying 2 ≤ r ≤ n. Denote by X_{1:n}, ..., X_{n:n} the order statistics (in ascending order) relative to X_1, ..., X_n, set X_{0:n} = 0, and define

D_{i,n} = (n − i + 1)(X_{i:n} − X_{i−1:n}) ,   i = 1, 2, ..., n ,

S_{i,n} = Σ_{j=1}^i D_{j,n} ,   i = 1, 2, ..., n ,

and

W_{r,n} = (S_{1,n}/S_{r,n}, S_{2,n}/S_{r,n}, ..., S_{r−1,n}/S_{r,n}) .
Using essentially a version of an argument based on de Finetti's theorem to arrive at corollaries or variations of Deny's (1961) theorem, met or implied in Sections 2 and 3 of the present paper, Rao and Shanbhag (1997) have established the following theorem:

THEOREM 4.1. Let r and n be integers such that 5 ≤ r ≤ n and let (Ω, ℰ, P) be a probability space. Let X_1, X_2, ..., X_n be positive random variables defined on the space (Ω, ℰ, P) and let ℱ be a sub-σ-field of ℰ such that, conditional upon ℱ, the random variables X_1, X_2, ..., X_n are i.i.d. Then (in the notation appearing above), W_{r,n} is distributed as the vector of order statistics relative to a random sample of size r − 1 from the uniform distribution on (0, 1) if and only if there exists a positive ℱ-measurable random variable Λ such that, conditional upon it, X_1 is exponentially distributed with mean Λ^{−1}, where the exponential distribution is a version of the conditional distribution of X_1 given ℱ.

The "if" part of Theorem 4.1 follows easily, while to prove the "only if" part of the theorem, Rao and Shanbhag (1997) observe that the distribution of W_{r,n}


determines that of (X_{1:n}/X_{5:n}, X_{2:n}/X_{5:n}, X_{3:n}/X_{5:n}, X_{4:n}/X_{5:n}) and show, essentially using an argument of the type based on de Finetti's theorem for solving an integrated functional equation, that this implies that there exists a version of the conditional distribution of X_1 given ℱ such that it is a continuous distribution concentrated on (0, ∞) with survivor function satisfying the Cauchy functional equation on [0, ∞). That the "only if" part of the theorem holds is then clear, since the survivor function satisfying the functional equation exists only if (in obvious notation)

F̄(y | ℱ) = (F̄(1 | ℱ))^y ,   y ∈ (0, ∞) ,

with F̄(1 | ℱ) > 0. In the case when the X_i's are i.i.d., Theorem 4.1 reduces to a characterization of an exponential distribution. This partially addresses the problem raised by a conjecture of Dufour. For the details and the literature relevant to the conjecture, see Leslie and van Eeden (1993) and Rao and Shanbhag (1997).

REMARKS 4.2. (i) In the case when X_1, X_2, ..., X_n are the first n members of an infinite sequence {X_m : m = 1, 2, ...} of exchangeable random variables, then, in view of Theorem 2.1, the assertion of Theorem 4.1 holds with ℱ as the tail or invariant σ-field relative to {X_m}.
(ii) From the proof of Theorem 4.1, as observed by Rao and Shanbhag (1997), it follows that the theorem holds if the portion "W_{r,n} is distributed as ... the uniform distribution on (0, 1)" is replaced by "(X_{1:n}/X_{r:n}, ..., X_{r−1:n}/X_{r:n}) has the distribution as in the case when the X_{j:n}, j = 1, 2, ..., are order statistics relative to a random sample of size n from an exponential distribution".
(iii) There are several equivalent ways of stating the assertion of Theorem 4.1. In particular, the result can be restated with "where the exponential ... given ℱ" replaced by "and the X_i's are i.i.d.", or by "independently of X_2", or by "and so also is 2 min{X_1, X_2}". (Incidentally, the original statement of the result in Rao and Shanbhag (1997) requires a slight amendment of the type that has appeared in the statement of Theorem 4.1.)

From the work of Schoenberg (1938) and subsequent authors it is known that an n-component random vector X has an elliptically symmetric distribution or, for short, an e.s.d., if and only if it has the representation

X =_d TUC ,   (29)

where C is an n × n real matrix such that C′C = Σ, U is an n-component random vector that is uniformly distributed on {x ∈ R^n : ‖x‖ = 1}, and T is a non-negative real random variable independent of U. It is implicit here that the distribution of X in question depends on C only through Σ. An e.s.d. with Σ = I is referred to as a spherically symmetric distribution or, for short, an s.s.d. Various characterizations of normal distributions in the class of e.s.d.'s or its variants have appeared in Letac (1981), Alzaid et al. (1990), Rao and Shanbhag (1989,


1994) and other places. In particular, Rao and Shanbhag have given arguments based on Deny's (1961) theorem to arrive at a result of Letac (1981) and an improved version of a related result of Zinger (1956). Letac's (1981) result referred to is that if X_1, X_2 and X_3 are three independent real random variables such that P{X_j = 0} = 0, j = 1, 2, 3, then (R^{−1}X_1, R^{−1}X_2, R^{−1}X_3), where R = (X_1^2 + X_2^2 + X_3^2)^{1/2}, is uniformly distributed on the sphere {x ∈ R^3 : ‖x‖ = 1} if and only if the X_j's are all distributed as N(0, σ^2) for some σ^2. (Indeed, this is a slightly simplified version of the relevant result in Letac (1981).) As implied above, this result gives as a corollary a result of Zinger stated below. Suppose X_1, ..., X_n are independent identically distributed continuous real random variables. Define X̄ = (1/n) Σ_{i=1}^n X_i and S^2 = Σ_{i=1}^n (X_i − X̄)^2. Then the result of Zinger is that, for n ≥ 6, the vector
Y = ( (X_1 − X̄)/S, ..., (X_n − X̄)/S )

is uniformly distributed on the (n − 2)-dimensional sphere {x ∈ R^n : Σ_{i=1}^n x_i = 0, ‖x‖ = 1} if and only if the X_i's are normal. Zinger's result clearly does not hold for n = 2, and it has not been settled for n = 3 and 4. Rao and Shanbhag (1994, Chapter 6) have shown that the result holds for n = 5. (The validity of the result for n = 5 clearly implies its validity for n ≥ 5, giving an improved version of the Zinger result; also note that the X_i's are assumed to be continuous so that Y is well defined almost surely.) For the details of the arguments of Rao and Shanbhag used to arrive at the results of Letac and Zinger, and of the literature addressing certain other problems on e.s.d.'s, we refer the reader to Rao and Shanbhag (1994, Chapter 6). We do not intend to revisit these in the present contribution.

Freedman (1963), Kingman (1972), Smith (1981), Ressel (1985) and Alzaid et al. (1990), among others, have characterized mixtures of some well known distributions via certain properties involving conditional distributions or expectations based on (X_1, ..., X_m), where X_1, ..., X_m are exchangeable (meeting, possibly, some further restrictions). In particular, Alzaid et al. (1990) have established characterizations based on regression properties; we briefly discuss these or their variations in what follows. Let {(ψ_γ^{(1)}, φ_γ^{(1)}, ψ_γ^{(2)}, φ_γ^{(2)}) : γ ∈ Γ} be a countable family of vectors of real-valued Borel measurable functions on R^∞ such that, for every probability measure on the Borel σ-field of R^∞ for which the projection mappings are i.i.d. and for every γ ∈ Γ, (ψ_γ^{(1)}, φ_γ^{(1)}) and (ψ_γ^{(2)}, φ_γ^{(2)}) are i.i.d., and let C be a class of probability distributions on the real line. In the notation introduced, one can state the following theorem:

THEOREM 4.3. Suppose any infinite sequence {X_n : n = 1, 2, ...} of i.i.d. random variables satisfies the condition that

E{φ_γ^{(1)}(X) φ_γ^{(2)}(X) | ψ_γ^{(1)}(X) − ψ_γ^{(2)}(X)} = 0 ,   γ ∈ Γ, a.s. ,   (30)


where X = (X_1, X_2, ...), only if the distribution of X_1 is a member of C. Then, if {Y_n : n = 1, 2, ...} is an exchangeable sequence of random variables, the equation

E{φ_γ^{(1)}(Y) φ_γ^{(2)}(Y) | ψ_γ^{(1)}(Y) − ψ_γ^{(2)}(Y)} = 0 ,   γ ∈ Γ, a.s. ,   (31)

where Y = (Y_1, Y_2, ...), is valid only if there exists a version of the conditional distribution of Y_1 given 𝒯, the tail σ-field relative to {Y_n : n = 1, 2, ...}, such that each of the distributions in the version is a member of C.

To see the validity of Theorem 4.3, one may use an argument involving the key step that, in view of Corollary 1.1.3 in Rao and Shanbhag (1994), (31) implies

E{ e^{it(ψ_γ^{(1)}(Y) − ψ_γ^{(2)}(Y))} φ_γ^{(1)}(Y) φ_γ^{(2)}(Y) } = 0 ,   γ ∈ Γ, −∞ < t < ∞ ,   (32)

where the left hand side of the identity is to be understood as one that is well defined and equal to the right hand side of the identity. (Note that (32) assumes implicitly that E{|φ_γ^{(1)}(Y) φ_γ^{(2)}(Y)|} < ∞ for each γ ∈ Γ.) On appealing to Theorem 2.1 in conjunction with (31), it follows that the φ_γ^{(1)}(Y) are integrable for all γ ∈ Γ and

|E{ e^{it ψ_γ^{(1)}(Y)} φ_γ^{(1)}(Y) | 𝒯 }|^2 = 0 ,   γ ∈ Γ, −∞ < t < ∞ ,   (33)

and hence that there exists a regular conditional probability corresponding to Y given 𝒯 such that, relative to it, the co-ordinate mappings are i.i.d. and

E{ e^{it ψ_γ^{(1)}(Y)} φ_γ^{(1)}(Y) | 𝒯 } = 0 ,   γ ∈ Γ, −∞ < t < ∞ ,   (34)

where it is implicit that E{|φ_γ^{(1)}(Y)| | 𝒯} < ∞, with the expectation computed through the regular conditional probability. In view of the corollary from Rao and Shanbhag (1994) referred to above, we have that the present theorem holds.
REMARKS 4.4. (i) As a corollary to Theorem 4.3, we have the version of the theorem given by Alzaid, Rao and Shanbhag (1990). Also, in view of the version of de Finetti's theorem appearing in Olshen (1973, 1974), which we have referred to in Section 2, under the assumption (and the notation) as in Theorem 4.3, (31) is valid if there exists a 𝒯-measurable random variable V such that, conditional upon V, the Y_i's are i.i.d. random variables with (common) distribution a member of C.
(ii) Suppose C is such that a certain collection of characteristics, such as moments, is defined for each of its members and identifies the respective member. For example, if C is a class of normal distributions that are symmetric about the origin, then each of its members is determined by the corresponding variance. Similarly, if C is the class of all normal distributions, then each of its members is determined by its mean and variance. In this case, the result of


Remark 4.4, (i), holds with a collection of random variables in place of a random variable V.
(iii) If ψ_γ^{(1)}, ψ_γ^{(2)} for each γ ∈ Γ in the above theorem are both non-negative or both non-positive, then the theorem is also valid with ψ_γ^{(1)}(Y) − ψ_γ^{(2)}(Y) replaced by ψ_γ^{(1)}(Y) + ψ_γ^{(2)}(Y) for some or all γ ∈ Γ.
(iv) If ψ_γ^{(1)}, ψ_γ^{(2)} are independent of γ and Γ is finite, then the theorem remains valid if (31) is replaced by

Σ_{γ∈Γ} E{φ_γ^{(1)}(Y) φ_γ^{(2)}(Y) | ψ^{(1)}(Y) − ψ^{(2)}(Y)} = 0   a.s. ,

with φ_γ^{(1)}(Y) φ_γ^{(2)}(Y) integrable for all γ, where ψ^{(1)} and ψ^{(2)} denote the functions ψ_γ^{(1)} and ψ_γ^{(2)} that are independent of γ. Additionally, if we have ψ^{(1)} and ψ^{(2)} both non-negative or both non-positive, then the result in question remains valid when ψ^{(1)}(Y) − ψ^{(2)}(Y) in (31) is replaced by ψ^{(1)}(Y) + ψ^{(2)}(Y).
(v) If the sequence {Y_n : n = 1, 2, ...} of exchangeable random variables is taken such that, for two distinct real values of θ,

E{ e^{θ(ψ_γ^{(1)}(Y) + ψ_γ^{(2)}(Y))} |φ_γ^{(1)}(Y) φ_γ^{(2)}(Y)| } < ∞ ,   γ ∈ Γ ,

then the above theorem remains valid with the left hand side of (30) replaced by the means of the conditional distributions of φ_γ^{(1)}(X) φ_γ^{(2)}(X) given ψ_γ^{(1)}(X) + ψ_γ^{(2)}(X) for γ ∈ Γ, and simultaneously the left-hand side of (31) replaced by the means of the conditional distributions of φ_γ^{(1)}(Y) φ_γ^{(2)}(Y) given ψ_γ^{(1)}(Y) + ψ_γ^{(2)}(Y) for γ ∈ Γ. (If, additionally, Γ is finite and ψ_γ^{(1)} and ψ_γ^{(2)} are independent of γ, then in place of the modified (31) we can even take the condition that the sum of the expected values involved in the condition be equal to zero a.s.)
(vi) If it is assumed that E{|φ_γ^{(1)}(Y) φ_γ^{(2)}(Y)|} < ∞ for all γ ∈ Γ, then the result of Theorem 4.3 holds with "only if" in two places replaced by "if and only if"; Remarks 4.4 (i), (ii) and (iii) apply to this version of the theorem as well. That the relevant versions of the results in Remarks 4.4 (iv) and (v) (obtained replacing "only if" by "if and only if" in the implicit statement) hold is obvious.
(vii) In the light of what appears in Alzaid et al. (1990) and Section 9.4 of Rao and Shanbhag (1994), one can obtain further generalizations and variations of Theorem 4.3.

Kingman (1972) showed that if {Y_n : n = 1, 2, ...} is a sequence of exchangeable random variables such that (Y_1, ..., Y_n) has a spherically symmetric distribution for each n, then there exists a non-negative random variable V such that, conditional on V, the Y_n's are independent N(0, V) random variables. Smith (1981) extended Kingman's ideas to show essentially that if {Y_n : n = 1, 2, ...} is a sequence of exchangeable variables such that (Y_1, ..., Y_n) has a centred spherically symmetric distribution for each n, then there exist random variables M and V, with V > 0, such that conditional on (M, V), the Y_n's are independent N(M, V) random variables. Alzaid et al. (1990) have extended these results, using essentially a result in Remark 4.4 (vi), and have arrived at certain characterizations of mixtures of


distributions of i.i.d. normal variables, based on moments of certain conditional distributions. For the details of these latter results, we refer the reader to Alzaid et al. (1990); for further literature that is linked with Kingman's (1972) result, consult either Kingman (1978) or Aldous (1985). Motivated by Theorem 9.2.1 of Rao and Shanbhag (1994), which is a version of the collection of results in Laha and Lukacs, we can state a further corollary to Theorem 4.3 (with the modification mentioned in Remark 4.4, (vi)). To do this, let {Y_n : n = 1, 2, ...} be an exchangeable sequence of random variables with E(Y_1^2 Y_2^2) < ∞. The corollary identifies the cases for which the following equation holds:

E{ (Y_1^2 − aY_1Y_2 − bY_1 − c)(Y_3^2 − aY_3Y_4 − bY_3 − c) | Y_1 + Y_2 − Y_3 − Y_4 } = 0   a.s. ,   (35)

with a, b, c as real numbers. We can now state the corollary; the result is obvious in view of Theorem 4.3.

COROLLARY 4.5. {Y_n} referred to above satisfies (35) if and only if there exist two tail-σ-field-measurable real random variables U and V, with U taking values only in {−1, 1}, such that, conditionally upon (U, V), the Y_i's are i.i.d. and, at each point of (U, V) at which Y_1 is not degenerate satisfying (1 − a)Y_1^2 − bY_1 − c = 0, one of the following is valid:
(ii) a = 1, b ≠ 0 and (1/b)(Y_1 + (c/b)) has the Poisson(V) distribution, with V > 0.
(iii) a > 1, 4c(a − 1) = b^2 and U(Y_1 + (b/2(a − 1))) is gamma with index (a − 1)^{−1} and scale V, with V > 0 (i.e., with mean (a − 1)^{−1} V^{−1} and variance (a − 1)^{−1} V^{−2}).
(iv) a < 1, (1 − a)^{−1} is an integer and there exists a number δ > 0 such that 4c(a − 1) = b^2 − δ^2, and δ^{−1}(Y_1 + ((b − δ)/2(a − 1))) has the binomial ((1 − a)^{−1}, V) distribution, with V ∈ (0, 1).
(v) a > 1 and there exists a number δ > 0 such that 4c(a − 1) = b^2 − δ^2, and Uδ^{−1}(Y_1 + ((b − Uδ)/2(a − 1))) has the negative binomial ((a − 1)^{−1}, V) distribution, with V ∈ (0, 1).
(vi) a > 1 and there exists a number δ > 0 such that 4c(a − 1) = b^2 + δ^2, and 2δ^{−1}(Y_1 + (b/2(a − 1))) has the distribution that is absolutely continuous with respect to Lebesgue measure, with density

f(x) = ((2 cos V)^p / (4π Γ(p))) |Γ((p + ix)/2)|^2 exp{Vx} ,   x ∈ R ,

where p = (a − 1)^{−1}, with V ∈ (−π/2, π/2).

REMARK 4.6. In the light of Theorem 2 in Fosam and Shanbhag (1997b) and several results in Rao and Shanbhag (1994, 1998) and Asadi et al. (1998) on characterizations based on regression properties involving order statistics, one can arrive at further characterizations of the type in Corollary 4.5 or its variations. In particular, in view of the theorem in Fosam and Shanbhag (1997b), one


can obtain a variation of Corollary 4.5 involving, among others, certain mixtures of wet period distributions in storage theory. We do not intend to discuss these results here.
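Returning to Corollary 4.5, case (ii) can be probed by simulation. The sketch below is ours, with assumed values b = 1, c = 0 and a two-point mixing variable V: it draws Y_1, ..., Y_4 conditionally i.i.d. Poisson(V) and tests necessary consequences of (35) with a = 1, namely E{Q_1 Q_2 g(Z)} = 0 for Q_1 = Y_1^2 − Y_1Y_2 − Y_1, Q_2 = Y_3^2 − Y_3Y_4 − Y_3, Z = Y_1 + Y_2 − Y_3 − Y_4, and a few test functions g, via t-statistics.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 400_000
v = np.where(rng.random(N) < 0.5, 1.0, 3.0)        # mixing variable V in {1, 3}
y = rng.poisson(lam=v[:, None], size=(N, 4)).astype(float)

# a = 1, b = 1, c = 0 in (35)
q1 = y[:, 0]**2 - y[:, 0] * y[:, 1] - y[:, 0]
q2 = y[:, 2]**2 - y[:, 2] * y[:, 3] - y[:, 2]
z = y[:, 0] + y[:, 1] - y[:, 2] - y[:, 3]

t_stats = []
for g in (np.ones(N), z, z**2):                    # E{q1*q2*g(z)} should be 0
    u = q1 * q2 * g
    t_stats.append(float(u.mean() / (u.std(ddof=1) / np.sqrt(N))))

print([round(t, 2) for t in t_stats])              # each consistent with mean zero
```

Only a handful of test functions are checked here; a full verification of the conditional-expectation statement (35) would require the orthogonality for all bounded g.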

Acknowledgement
This work is supported by the US Army Research Grant DAAH04-96-1-0082.

References
Aldous, D. J. (1985). Exchangeability and related topics. Lecture Notes in Math. 1117, Springer, Berlin, pp. 1-198.
Alzaid, A. A., K. S. Lau, C. R. Rao and D. N. Shanbhag (1988). Solution of Deny's convolution equation restricted to a half line via a random walk approach. J. Multivar. Anal. 24, 309-329.
Alzaid, A. A., C. R. Rao and D. N. Shanbhag (1987). Solution of the integrated Cauchy equation using exchangeability. Sankhyā, Ser. A 49, 189-194.
Alzaid, A. A., C. R. Rao and D. N. Shanbhag (1990). Elliptical symmetry and exchangeability with characterizations. J. Multivar. Anal. 33, 1-16.
Asadi, M., C. R. Rao and D. N. Shanbhag (1998). Some unified characterization results on the generalized Pareto distribution. Center for Multivariate Analysis, Pennsylvania State University, USA, Technical Report.
Ash, R. B. (1972). Real Analysis and Probability. Academic Press, New York.
Choquet, G. and J. Deny (1960). Sur l'équation de convolution μ = μ*σ. C. R. Acad. Sci. Paris 250, 799-801.
Chow, Y. S. and H. Teicher (1979). Probability Theory: Independence, Interchangeability, Martingales. Springer, New York.
Davies, P. L. and D. N. Shanbhag (1987). A generalization of a theorem of Deny with applications in characterization theory. Quart. J. Math. Oxford 38(2), 13-34.
Debinska, A. and J. Wesolowski (1998). Linearity of regression for non-adjacent order statistics. Metrika 45, 215-222.
Debinska, A. and J. Wesolowski (1999). Linearity of regression for non-adjacent record values. J. Stat. Plan. Inf. (submitted for publication).
Deny, J. (1961). Sur l'équation de convolution μ = μ*σ. Sémin. Théor. Potent. (Ed., M. Brelot), Fac. Sci. Paris, 1959-1960, 4e année.
Diaconis, P. and D. Freedman (1980a). Finite exchangeable sequences. Ann. Prob. 8, 745-764.
Diaconis, P. and D. Freedman (1980b). de Finetti's theorem for Markov chains. Ann. Prob. 8, 115-130.
Dubins, L. E. and D. Freedman (1979). Exchangeable sequences need not be mixtures of independent identically distributed random variables. Zeitschrift für Wahrscheinlichkeitstheorie Verw. Gebiete 48, 115-132.
Dynkin, E. B. (1978). Sufficient statistics and extreme points. Ann. Prob. 6, 705-730.
Feller, W. (1966). An Introduction to Probability Theory and its Applications, Vol. 2. J. Wiley and Sons, New York.
Fosam, E. B. and D. N. Shanbhag (1997a). Variants of the Choquet-Deny theorem with applications. J. Appl. Prob. 34, 101-106.
Fosam, E. B. and D. N. Shanbhag (1997b). An extended Laha-Lukacs characterization result based on a regression property. J. Stat. Plan. Inf. 63, 173-186.
Freedman, D. A. (1963). Invariants under mixing which generalize de Finetti's theorem. Ann. Math. Statist. 34, 1194-1216.
Hardy, G. H. (1967). A Course of Pure Mathematics. Cambridge University Press.


Hewitt, E. and L. J. Savage (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80, 470-501.
Kagan, A. M., Yu. V. Linnik and C. R. Rao (1973). Characterization Problems in Mathematical Statistics. Wiley, New York.
Kingman, J. F. C. (1972). On random sequences with spherical symmetry. Biometrika 59, 492-494.
Kingman, J. F. C. (1978). Uses of exchangeability. Ann. Prob. 6, 183-197.
Laczkovich, M. (1986). Non-negative measurable solutions of difference equations. J. London Math. Soc. 34(2), 139-147.
Lau, K. S. and C. R. Rao (1982). Integrated Cauchy functional equation and characterizations of the exponential law. Sankhyā, Ser. A 44, 72-90.
Lau, K. S. and W. B. Zeng (1990). The convolution equation of Choquet and Deny on semigroups. Studia Math. 97, 115-135.
Leslie, J. R. and C. van Eeden (1993). On a characterization of the exponential distribution based on a type 2 right censored sample. Ann. Stat. 21, 1640-1647.
Letac, G. (1981). Isotropy and sphericity: some characterizations of the normal distribution. Ann. Stat. 9, 408-417.
Loève, M. (1963). Probability Theory. 3rd edn., Van Nostrand, New York.
Marsaglia, G. and A. Tubilla (1975). A note on the lack of memory property of the exponential distribution. Ann. Prob. 3, 352-354.
Matthews, J. P. (1963). A combinatorial proof of the distribution of the time to first emptiness of a dam with Markovian inputs. J. R. Statist. Soc. Ser. B 34, 263-267.
Mott, J. L. (1963). The distribution of the time-to-emptiness of a discrete dam under steady demand. J. R. Statist. Soc. Ser. B 25, 137-139.
Nagaraja, H. N. (1988). Some characterizations of continuous distributions based on regressions of adjacent order statistics and record values. Sankhyā, Ser. A 50, 70-73.
Olshen, R. A. (1973, 1974). A note on exchangeable sequences. Zeitschrift für Wahrscheinlichkeitstheorie Verw. Gebiete 28, 317-321.
Phelps, R. R. (1966). Lecture Notes on Choquet's Theorem. Van Nostrand, Princeton.
Prabhu, N. U. and N. U. Bhat (1963). Some first passage problems and their applications to queues. Sankhyā, Ser. A 25, 281-292.
Prabhu, N. U. (1965). Queues and Inventories: A Study of their Basic Stochastic Processes. J. Wiley and Sons, Inc., New York.
Ramachandran, B. (1984). Renewal-type equations on Z. Sankhyā, Ser. A 46, 319-325.
Ramachandran, B. and K. S. Lau (1991). Functional Equations in Probability Theory. Academic Press, New York.
Rao, C. R. and D. N. Shanbhag (1986). Recent results on characterization of probability distributions: a unified approach through extensions of Deny's theorem. Adv. Appl. Prob. 18, 660-678.
Rao, C. R. and D. N. Shanbhag (1989). Recent advances on the integrated Cauchy functional equation and related results in applied probability. In Probability and Statistics (Papers in honour of S. Karlin) (Eds., T. W. Anderson et al.), Academic Press, New York.
Rao, C. R. and D. N. Shanbhag (1991). An elementary proof for an extended version of the Choquet-Deny theorem. J. Multivar. Anal. 38, 141-148.
Rao, C. R. and D. N. Shanbhag (1994). Choquet-Deny Type Functional Equations with Applications to Stochastic Models. J. Wiley and Sons, Chichester.
Rao, C. R. and D. N. Shanbhag (1996a). A note on a characteristic property based on order statistics. Proc. Amer. Math. Soc. 124(1), 299-302.
Rao, C. R. and D. N. Shanbhag (1996b). Further versions of the convolution equation. Pennsylvania State University (Center for Multivariate Analysis) Technical Report. (To appear in V. Sukhatme memorial volume.)
Rao, C. R. and D. N. Shanbhag (1997). Extensions of a characterization of an exponential distribution based on a censored ordered sample. In Advances in the Theory and Practice of Statistics (Volume in honor of S. Kotz) (Eds., N. L. Johnson and N. Balakrishnan), pp. 431-440, Wiley, New York.

Exchangeability, functional equations, and characterizations

763

Rao, C. R. and D. N. Shanbhag (1998). Recent approaches to characterizations based on order statistics and record values. In Handbook of Statistics, ol. 16 (Eds., N. Balakrishnan and C. R. Rao), Chapter 8, pp. 231-256, Elsevier, Amsterdam. Rao, C. R. and D. N. Shanbhag (2000). Further contributions to the integrated Cauchy functional equation. (Center for Multivariate Analysis) Tech. Report. Ressel, P. (1985). de Finetti type theorems: an analytical approach. Ann. Prob. 13, 898 922. Schoenberg, I. J. (1938). Metric spaces and positive definite functions. Trans. Amer. Math. Soc. 44, 522-536. Shanbhag, D. N. (1973). On the first emptiness of dams with Markovian inputs. J. R. Stat. Soc. Ser B 35, 501-506. Shanbhag, D. N. (1977). An extension of the Rao-Rubin characterization of the Poisson distribution. J. Appl. Prob. 14(3), 640-646. Shanbhag, D. N. (1991). Extended versions of Deny's theorem via de Finetti's theorem. Comput. Statist. Data Anal. 12, 115-126. Smith, A. F. M. (1981). On random sequences with censored spherical symmetry. J. R. Stat. Soc. Ser. B 43, 208-209. Sz6kely, G. J. and W. Zeng (1990). The Choquet-Deny convolution equation/~ = ~ * cr for probability measures on Abelian semigroups. J. Theor. Prob. 3, 361 365. Zeng, W. (1992). A note on stability of multivariate distributions. J. Math. Res. Expan. 2, 171 175. Zinger, A. A. (1956). On a problem of A. N. Kolmogorov. Vestnik Leningrad University 1, 53-56.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19 © 2001 Elsevier Science B.V. All rights reserved.


Martingales and Some Applications

M. M. Rao

1. Introduction

As in the case of many other parts of Probability Theory, martingales also have their origins in certain games of chance. A type of game with the byline "double the bets after a loss and drop out after a win" is termed a martingale. Such games seem to have been quite familiar to the general public in France, as the following item illustrates. Depicting a somewhat non-cordial relationship in the early 1960's between Prince Rainier of Monaco and General de Gaulle, the French President at the time, there was a cartoon in a leading French newspaper (reprinted by the New York Times) with the subtitle, "the prince plays a martingale with the general". To motivate the subject and to explain the game of doubling strategy (and to appreciate the above anecdote and the cartoon), one may consider the following streamlined mathematical version. Suppose X_n denotes the outcome of the nth game, where X_n = 1 if it is a win and X_n = −1 if it is a loss, with probabilities 0 < P[X_n = 1] = p = 1 − P[X_n = −1] = 1 − q < 1. The game structure indicates that the X_n are also independent, so that the nth toss (or play) does not depend on the previous ones. Suppose the player bets a_n dollars on the nth game, so that the total win S_n at that play is given by
S_n = a_1 X_1 + ... + a_n X_n = S_{n-1} + a_n X_n,   (1)

where X_0 = 0. Instead of the a_n being constants, suppose they are also random variables, determined only by the preceding trials. For instance, in the "doubling the bets" game, let a_1 = 1 and, for n > 1,

a_n = { 2^{n-1},  if X_1 = ... = X_{n-1} = −1;
        0,        otherwise },

so that the player's total loss after n losing plays is ∑_{k=1}^{n} 2^{k-1} = 2^n − 1. Now if the (n+1)th play is a win, then the total gain is S_{n+1} = S_n + a_{n+1}·1 = −(2^n − 1) + 2^n = 1, since X_{n+1} = 1.


Suppose now that τ denotes the first time the player wins, i.e., τ = inf{n ≥ 1 : S_n = 1}. Thus

P[τ = n] = P[X_1 = −1, ..., X_{n-1} = −1, X_n = 1]
         = p(1 − p)^{n-1},   (2)

and

∑_{n=1}^{∞} P[τ = n] = ∑_{n=1}^{∞} p(1 − p)^{n-1} = 1,

so that the win results in a finite (but not necessarily bounded) time with probability 1. Note that the expected value E(X_n) = p − q, n ≥ 1, and the game is called fair if p = q (= 1/2); in this case E(S_n) = E(X_n) = 0, n ≥ 0. It is favorable (to the player) if p > 1/2 and unfavorable (to the player, favorable to the gambling house!) if p < 1/2. Also, in a fair game not only is P[S_τ = 1] = 1 and hence E(S_τ) = 1, but the player, starting with the initial capital X_0 = 0, can increase the expected wealth to 1 although E(S_n) = 0 = E(S_0) = E(X_0). This system of doubling the bets (fairly) is the popular martingale. Since the time to win, namely τ, is finite but not bounded, one needs a large (possibly unbounded) capital as well as time. Thus it is physically unrealizable. Let us note parenthetically that the S_n of (1) is not a sum of independent Bernoulli random variables, and hence the classical Wald identity E(S_τ) = E(τ)E(X_1), established after Theorem 3.6 below, does not hold. Also, although {S_n, n ≥ 1} is a martingale (to be defined precisely below), the "optionally stopped" new 'sequence' {S_0, S_τ} need not be a martingale, a situation that is covered in the ensuing discussion. This illustration raises a host of problems of both practical and theoretical interest and importance. To begin with, one should define not only a martingale, or a fair game, but also present its close relatives, the favorable and unfavorable ones. Further, the processes should not be restricted to sequences or discrete times. In fact, the present day studies in Probability can be broadly (but not exclusively) divided into four categories: (i) processes (or sequences) consisting of independent random variables, (ii) stationary (harmonizable and related second order) processes, (iii) Markovian types, and (iv) martingales. Evidently, a given process can belong to more than one category, and the methods of one part are often used in others for solving such problems.
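The doubling strategy just described is easy to simulate. The sketch below is illustrative only and not from the text (the function name and parameters are assumptions): it plays the game to the first win and records the terminal fortune S_τ and the winning stake 2^{τ−1}, confirming numerically that every completed game nets exactly 1 while the stakes required are unbounded in principle.

```python
# Simulation of the doubling ("martingale") betting strategy of Section 1.
# Illustrative sketch only; names and parameters are assumptions, not from the text.
import random

random.seed(7)

def play_doubling(p):
    """Play until the first win; return (tau, S_tau, final stake 2**(tau-1))."""
    total, stake, n = 0, 1, 0
    while True:
        n += 1
        win = random.random() < p
        total += stake if win else -stake
        if win:
            return n, total, stake
        stake *= 2  # double the bet after every loss

results = [play_doubling(0.5) for _ in range(10_000)]
# Every completed game ends with a net gain of exactly 1 dollar ...
assert all(total == 1 for _, total, _ in results)
# ... but the stake needed before the first win is unbounded in principle:
print("largest stake required:", max(stake for _, _, stake in results))
```

The run illustrates the point made above: the win is almost sure, but the capital the strategy demands has no a priori bound.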
The purpose of the following account is to present a general view of martingales and some of their extensions, as well as applications. Thus the next section is devoted to a precise description of martingale concepts, the fundamental inequalities and some of their extensions to be used in applications later. Then Section 3 contains a discussion of the basic convergence theorems. In both these accounts, the discrete as well as continuous parameter versions are considered, including optional stopping (or skipping) for the process, as indicated in the above illustration. Most of Section 4 is devoted to continuous parameter (semi)martingales, which will be used in various applications in the rest of the article. These consist of an extended (semi)martingale calculus, including (semi)martingale integrals. Section 5 discusses some applications to likelihood ratios, used in stochastic inference, and Section 6 contains an account of exponential (semi)martingales, which arise typically as solutions of linear stochastic equations. The next set of applications consists of an extended discussion of financial mathematical models, where (exponential) martingale methods play an effective role. These applications also show the importance of Girsanov's theorem in this analysis. Finally, in the last section, some remarks on multivariate and multiparameter versions of some of these results are indicated. To help the reader appreciate the subject, proofs (an outline or sometimes full ones) are given for several of the assertions, with explanatory remarks.

2. Martingale concepts and inequalities


Let {X_n, n ≥ 1} be a sequence of independent mean zero random variables (relative to the underlying probability space (Ω, Σ, P)), and set S_n = ∑_{i=1}^{n} X_i. Then one has, using the properties of conditional expectations and the standard notation,

E(S_{n+1} | X_1, ..., X_n) = E(S_n + X_{n+1} | X_1, ..., X_n)
  = E(S_n + X_{n+1} | S_1, ..., S_n)
  = S_n + E(X_{n+1} | X_1, ..., X_n)
  = S_n + E(X_{n+1})
  = S_n + 0,   (3)

with probability one, since X_{n+1} is unaffected by conditioning on the previous variables. This serves as a good motivation, and the notion is generalized to a not necessarily independent system by taking (3) as a definition for the sequence {S_n, n ≥ 1}. Before stating the general concept, let us also motivate it with a gambling situation. Thus suppose that Z_1, Z_2, ... are the fortunes of a gambler at plays 1, 2, ..., where only the knowledge of the present and the past games is available (and no clairvoyance). Assuming that the Z_n have finite means, the game is said to be fair if

E(Z_{n+1} | Z_1, ..., Z_n) = Z_n,   (4)

with probability one, and it is said to be favorable (to the player) if '=' is replaced by '≥' in (4) [it is unfavorable (to the player but favorable to the house) if '=' is replaced by '≤']. A complete knowledge of X_1, ..., X_n is the statement that one has control of (or knowledge about) all the values which these random variables can take. This means mathematically that the σ-algebra G_n = σ(X_1, ..., X_n), generated by these variables, is known. With such a notion, the concept can be stated quite generally as follows. Let {X_t, t ∈ I} be an indexed set of random variables on the basic probability space (Ω, Σ, P), modeling the experiment, where the index set I is (perhaps partially) ordered, i.e., for some pairs a, b ∈ I, a ≤ b is defined and '≤' is a partial order on I; it is directed if, in addition, for each finite subset of I there is an element (in I) dominating this subset in the above order. If I ⊂ ℝ, the reals, then I has the usual order, which is linear. Let F_t = σ(X_s, s ≤ t). The family {X_t, F_t, t ∈ I} is called adapted, meaning each X_t is F_t-measurable and t ≤ t' implies F_t ⊂ F_{t'}. With these notions the desired concept of a martingale process can be presented as follows:

1. DEFINITION. Let {X_t, F_t, t ∈ I} be an adapted integrable family of (real) random variables on the basic triple (Ω, Σ, P). Then it is called a (sub or super) martingale if for each t_1, t_2 ∈ I, t_1 ≤ t_2, one has:

E(X_{t_2} | F_{t_1}) = (≥ or ≤) X_{t_1},   (5)

with probability 1 (or w.p. 1). It should be noted that for (sub or super) martingales it is necessary to have an ordering notion to describe the past, present and future. Moreover, for sub and super martingales the range should have an order relation as well. Thus for multivariate X_t one needs an order in the range (or value or state) space, since otherwise only martingales (but not the sub and super concepts) can be defined. Consequently, for most of the following, only real valued processes will be considered, so that sub and super martingales can be included. Also it should be noted that in general E(X_n | F_{n-1}) = E(X_n | X_1, ..., X_{n-1}) is a function of the conditioned variables, so that it equals g(X_1, ..., X_{n-1}) for a Borel function g, and the (sub)martingale hypothesis restricts this dependence to the given immediate past. This is the crucial part of the hypothesis of the abstract notion. Hereafter E^{F_n}(·) and E(·|F_n) are synonymous. After a brief discussion of the discrete parameter processes, most of what follows is directed towards the continuous case, since one can often "embed" the discrete in the continuous parameter theory by the following device. If {X_n, F_n, n ≥ 0} is an adapted sequence, let F_t = F_n and X_t = X_n for n ≤ t < n + 1, n ≥ 0. Then F_t = σ(X_s, s ≤ t), {X_t, F_t, t ≥ 0} is an adapted process, with moreover F_{t+} = ⋂_{s>t} F_s = F_t and F_{t−} = σ(⋃_{s<t} F_s). Such a "right continuous" subfamily {F_t, t ≥ 0} of σ-algebras contained in Σ is usually called a standard filtration, so that F_t ⊂ F_{t'} for t < t' and F_t = F_{t+}. For technical convenience, one augments the family to include all P-null sets for each t. This general concept will be of importance in applications later. Let us start with an elementary but basic characterization of a sub martingale in the discrete case, due to Doob (1953). It has far reaching implications in the subject and also shows why the continuous parameter case presents a really new challenge.
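The defining relation can be checked exactly on a small finite model. The following sketch (an illustration under assumed notation, not code from the text) enumerates all ±1 prefixes of a fair coin and verifies E(S_{n+1} | X_1, ..., X_n) = S_n by direct computation, conditioning on a prefix being the same as grouping paths by it:

```python
# Exact finite check of the martingale property E(S_{n+1} | X_1,...,X_n) = S_n
# for a fair +/-1 walk, by enumerating all coin prefixes (illustrative sketch,
# not code from the text).
from itertools import product
from fractions import Fraction

n = 4
half = Fraction(1, 2)
for prefix in product((-1, 1), repeat=n):
    s_n = sum(prefix)
    # conditional expectation of S_{n+1} given this prefix of outcomes
    cond = sum(half * (s_n + x) for x in (-1, 1))
    assert cond == s_n
checked = 2 ** n
print("martingale property verified on all", checked, "prefixes")
```

Exact rational arithmetic (`Fraction`) makes the check an identity rather than a floating-point approximation.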
(All the following statements should be taken to be true a.e. (or with probability 1) unless the contrary is stated.)

2. PROPOSITION. Let {X_n, F_n, n ≥ 1} be an adapted integrable sequence on a probability space (Ω, Σ, P). Then it can be uniquely decomposed as:

X_n = X'_n + A_n,   n ≥ 1,   (6)


where {X'_n, F_n, n ≥ 1} is a martingale and {A_n, F_{n-1}, n ≥ 1} is an adapted integrable sequence with A_1 = 0. Moreover, the given process is a sub martingale iff A_n ≤ A_{n+1}, n ≥ 1.

Proof. The following argument is simple and constructive. Thus, for the representation (6), define A_1 = 0 and recursively

A_n = E^{F_{n-1}}(X_n) − X_{n-1} + A_{n-1},   n > 1.   (7)

Then, writing Y_n = X_n − A_n, one gets, since clearly A_n is F_{n-1}-adapted,

E^{F_n}(Y_{n+1}) = E^{F_n}(X_{n+1} − A_{n+1})
  = E^{F_n}(X_{n+1}) − A_{n+1}
  = X_n − A_n, by (7),
  = Y_n (= X'_n).

Hence {Y_n, F_n, n ≥ 1} is a martingale, so that (6) holds. If X_n = X'_n + A_n = X''_n + A'_n are two such representations, then X'_n − X''_n = A'_n − A_n. The right side is F_{n-1}-adapted, and the left side is a martingale. Hence {A'_n − A_n, F_n, n ≥ 1} must be a martingale, so that by the defining relation (4) one has, on iteration, E^{F_{n-1}}(A'_n − A_n) = A'_1 − A_1 = 0 a.e., and since each A_n, A'_n is F_{n-1}-adapted it follows that A_n = A'_n, n ≥ 1. But then X'_n = X''_n a.e., so that the representation is unique. In case {X_n, F_n, n ≥ 1} is a sub martingale, then E^{F_n}(X_{n+1}) ≥ X_n a.e., and hence by definition A_1 ≥ 0, A_2 ≥ A_1, ..., implying 0 ≤ A_n ↑. Conversely, if 0 ≤ A_n ↑, then by the representation (6) one has

E^{F_n}(X_{n+1}) = X'_n + A_{n+1} ≥ X'_n + A_n = X_n.

Consequently the sequence is a sub martingale. []
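As a concrete instance of the decomposition, the following sketch (illustrative only, not code from the text) applies the recursion defining A_n in the proof above to the sub martingale X_n = S_n², where S_n is a fair ±1 random walk, for which E^{F_{n-1}}(X_n) = S_{n-1}² + 1 exactly; the compensator turns out to be the deterministic process A_n = n − 1.

```python
# The Doob decomposition of Proposition 2 computed via its recursion for the
# sub martingale X_n = S_n**2, S_n a fair +/-1 walk (illustrative sketch,
# not code from the text): here E^{F_{n-1}}(X_n) = S_{n-1}**2 + 1 exactly.
from itertools import product
from fractions import Fraction

N = 5
for path in product((-1, 1), repeat=N):
    A = [Fraction(0)]      # A_1 = 0
    S = [path[0]]          # S_1
    for n in range(2, N + 1):
        S.append(S[-1] + path[n - 1])
        s_prev = S[-2]     # S_{n-1}
        # recursion: A_n = E^{F_{n-1}}(X_n) - X_{n-1} + A_{n-1}
        A.append((s_prev**2 + 1) - s_prev**2 + A[-1])
    # the compensator is deterministic: A_n = n - 1 on every path
    assert A == [Fraction(k) for k in range(N)]
print("compensator of S_n^2 is A_n = n - 1; martingale part is S_n^2 - (n - 1)")
```

The martingale part S_n² − (n − 1) is the discrete analogue of the familiar compensated square of a Brownian motion.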

It is to be noted that in this construction A_n is F_{n-1}-adapted, so that at the nth stage it is (based on the past) essentially known, or predictable. Also, substituting from (7) for n = 1, 2, ..., one finds

A_n = ∑_{k=1}^{n} E^{F_{k-1}}(X_k − X_{k-1}),   n ≥ 1,   (8)

which is a sum of certain random variables. If the given process is of continuous parameter, these two facts do not easily generalize, and the corresponding problem remained open for nearly a decade until it was finally resolved by Meyer (1962-63). Subsequently, alternative methods were found to simplify the work, and the result, termed the Doob-Meyer decomposition, will be given later. First some inequalities of importance for applications are established. Recall that the conditional expectation E^B(X) of an integrable random variable X, for each σ-algebra B ⊂ Σ of (Ω, Σ, P), satisfies the identity E(X) = E(E^B(X)), which follows from the defining equation ∫_A X dP = ∫_A E^B(X) dP, A ∈ B, by taking A = Ω. Also, if φ : ℝ → ℝ is a convex function and E(φ(X)) exists, then for any σ-algebra B ⊂ Σ one has the conditional Jensen inequality:

E^B(φ(X)) ≥ φ(E^B(X)),   (9)

with a strict inequality (on a set of positive probability) if φ is strictly convex and X is not B-measurable. This result can be verified from a classical characterization of a (continuous) convex function as the upper envelope of a countable collection of affine lines f_n : x ↦ a_n x + b_n, x ∈ ℝ. Thus φ(x) ≥ f_n(x), n ≥ 1. Consequently

φ(X) ≥ f_n(X) = a_n X + b_n.   (10)

Applying E^B, which preserves inequalities, to both sides and taking the supremum on the right of (10), one gets

E^B(φ(X)) ≥ sup_n {a_n E^B(X) + b_n}
  = sup_n f_n(E^B(X)) = φ(E^B(X)), a.e.,

which is (9). The strict inequality statement follows from the fact that the line on the right side of (10) touches φ in at most a single point if φ is strictly convex (but if X is B-measurable there is always equality in (9)). An interesting consequence of the conditional Jensen inequality is given by the following result, which links martingales and sub martingales by convex transforms.

3. PROPOSITION. Let {X_n, F_n, n ≥ 1} be a (sub)martingale and φ : ℝ → ℝ⁺ be a convex (increasing) function. Then {φ(X_n), F_n, n ≥ 1} is always a sub martingale whenever E(φ(X_n)) < ∞, n ≥ 1. In particular, {X_n⁺, F_n, n ≥ 1} is a sub martingale.
Proof. Noting that {φ(X_n), F_n, n ≥ 1} is an adapted integrable sequence, one has, from the fact that E^{F_n}(X_{n+1}) = (≥) X_n a.e.,

E^{F_n}(φ(X_{n+1})) ≥ φ(E^{F_n}(X_{n+1})) ≥ φ(X_n), a.e.,

by the conditional Jensen inequality (9) (and, in the sub martingale case, because φ(·) is increasing). Since φ(x) = max(x, 0) = x⁺ is increasing and convex, the last statement follows at once from the preceding one. []

An interesting application of (9) and this proposition is the following curious result. As in (3), one writes E(X|Y) for E^{σ(Y)}(X) for convenience.

4. PROPOSITION. If X, Y are integrable random variables such that E(X|Y) = Y and E(Y|X) = X a.e., then X = Y a.e.
Proof. If X, Y have two moments then the result is very easy. Indeed, using the basic identity (recalled prior to (9)):

E(X − Y)² = E[E((X − Y)² | X)]
  = E[E(X² | X)] − 2E[E(XY | X)] + E[E(Y² | X)]
  = E[X² − 2X² + E(Y² | X)]
  = E(Y²) − E(X²).


Interchanging X and Y, and remembering the symmetry in the hypothesis, one gets E(X − Y)² = 0, so that X = Y a.e. However, if X, Y have only one moment, then the above argument fails. An alternative proof, involving the conditional Jensen inequality, is as follows. First note that if φ(x) = |x|, then φ₁(x) = ∫₀^{|x|} (β − e^{−t}) dt, β ≥ 2, is strictly convex and satisfies φ(x) ≤ φ₁(x) ≤ βφ(x) for all real x, so that E(φ₁(Z)) ≤ βE(|Z|) < ∞, where Z = X or Z = Y. Hence the inequality (9) implies

E(φ₁(X) | Y) ≥ φ₁(E(X | Y)) = φ₁(Y), a.e.


Interchanging X, Y here one finds E(φ₁(Y) | X) ≥ φ₁(X) a.e. But taking expectations on both sides, one gets E(φ₁(X)) ≥ E(φ₁(Y)) ≥ E(φ₁(X)). Since φ₁ is strictly convex, these two inequalities cannot both hold unless X = Y a.e., as asserted. []

REMARK. A direct (but somewhat slick) proof of this proposition is also in Doob (1953), p. 314. A useful consequence of the result is that a sequence and its reverse are both martingales iff the sequence is just a single random variable repeating itself, explaining an inherent restriction involved in the martingale concept. Next, some maximal inequalities of considerable importance will be presented. These are mostly from Doob (1953).
5. THEOREM. (a) Let {X_k, F_k, 1 ≤ k ≤ n} be a sub martingale. Then for each λ ∈ ℝ⁺ one has

λ P[max_{1≤k≤n} X_k ≥ λ] ≤ ∫_{[max_{1≤k≤n} X_k ≥ λ]} X_n dP ≤ E(|X_n|),   (11)

and

λ P[min_{1≤k≤n} X_k ≤ −λ] ≤ E(X_n − X_1) − ∫_{[min_{1≤k≤n} X_k ≤ −λ]} X_n dP.   (12)

(b) If in the above each X_k ∈ L^p(P), 1 ≤ p < ∞, then

E(max_{1≤k≤n} |X_k|^p) ≤ q^p E(|X_n|^p),   p > 1, q = p(p − 1)^{−1},

and

E(max_{1≤k≤n} |X_k|) ≤ (e/(e − 1)) {1 + E(|X_n| log⁺ |X_n|)},   p = 1.   (13)

Proof. (a) The result (11), when the X_k are squares of a partial sum sequence of independent centered random variables with finite variances, is the celebrated Kolmogorov inequality, and the classical proof extends to the present case. The basic idea is to express the 'max' ('min') event as a disjoint union of sets on each of which the integral can be estimated. The details are as follows. Let A_1 = [X_1 ≥ λ], and for k > 1, A_k = [X_i < λ, 1 ≤ i ≤ k − 1, X_k ≥ λ], so that A_k is the set on which X_k exceeds λ for the first time. Thus A = ⋃_{k=1}^{n} A_k = [max_{k≤n} X_k ≥ λ], and each A_k is determined by the X_i, i ≤ k, whence A_k ∈ F_k and the A_k are disjoint. Consequently

λ P(A) = λ ∑_{k=1}^{n} P(A_k) ≤ ∑_{k=1}^{n} ∫_{A_k} X_k dP
  ≤ ∑_{k=1}^{n} ∫_{A_k} X_n dP, by the sub martingale condition,
  = ∫_A X_n dP ≤ E(|X_n|).

This is (11). The proof of (12) is similar.

(b) If X* = max_{k≤n} |X_k|, then since {|X_k|^p, F_k, k ≥ 1} is a positive sub martingale, one has by (11), for p ≥ 1 and λ > 0,

λ^p P[X* ≥ λ] ≤ ∫_{[X* ≥ λ]} |X_n|^p dP.   (14)

Now consider the distribution F of X*, i.e., F(B) = P ∘ (X*)^{−1}(B) for B a Borel subset of ℝ, and use the image law to get:

∫_Ω (X*)^p dP = ∫_{ℝ⁺} x^p dF(x)
  = p ∫_{ℝ⁺} (1 − F(x)) x^{p−1} dx, integrating by parts,
  ≤ p ∫_{ℝ⁺} x^{p−2} (∫_{[X* ≥ x]} |X_n| dP) dx, by (14) with p = 1,
  = p ∫_Ω |X_n| (∫_0^{X*} x^{p−2} dx) dP, interchanging the order of integration,
  = (p/(p − 1)) ∫_Ω |X_n| (X*)^{p−1} dP, if p > 1,
  ≤ q ||X_n||_p ||(X*)^{p−1}||_q, by Hölder's inequality,
  = q ||X_n||_p (||X*||_p)^{p/q}, q > 1.   (15)

Excluding the (true and) trivial case ||X*||_p = 0, (15) gives

||X*||_p ≤ q ||X_n||_p,   (16)

and (16) implies (13) if p > 1. If p = 1, the same estimates may be expressed as

∫_{[X* > 1]} X* dP − 1 ≤ ∫_{[1,∞)} (1/x) (∫_{[X* ≥ x]} |X_n| dP) dx
  = ∫_Ω |X_n| log⁺ X* dP.   (17)

Since a log b ≤ a log⁺ a + b/e for a ≥ 0, b > 0 [since a log(b/a), as a function of a, attains its maximum b/e at a = b/e], (17) becomes

∫_{[X* > 1]} X* dP − 1 ≤ ∫_Ω |X_n| log⁺ |X_n| dP + (1/e) ∫_Ω X* dP.   (18)

Hence (18) gives the second part of (13), as desired. []
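A small exact computation illustrating (11) (an illustrative sketch, not from the text): for the positive sub martingale |S_k| generated by a fair ±1 walk, enumerate all paths and compare the three quantities in the inequality.

```python
# Exact check of the maximal inequality (11) for the positive sub martingale
# |S_k|, S_k a fair +/-1 walk: enumerate all 2**n paths (illustrative sketch,
# not code from the text).
from itertools import product
from fractions import Fraction

n, lam = 6, 3
prob = Fraction(1, 2**n)
lhs = mid = rhs = Fraction(0)
for path in product((-1, 1), repeat=n):
    S, s = [], 0
    for x in path:
        s += x
        S.append(s)
    peak = max(abs(v) for v in S)
    rhs += prob * abs(S[-1])            # E(|S_n|)
    if peak >= lam:
        lhs += lam * prob               # lam * P[max_k |S_k| >= lam]
        mid += prob * abs(S[-1])        # integral of |S_n| over that event
assert lhs <= mid <= rhs                # the three terms of (11)
print(lhs, "<=", mid, "<=", rhs)
```

Because the walk is finite and the arithmetic exact, the chain of inequalities is verified as an identity of rationals rather than approximately.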

The above inequalities are used in mathematical analysis and in many applications. This will now be indicated by an interesting result. Let X = {X_n, F_n, n ≥ 1} be a square integrable martingale, so that E(X_n²) < ∞ for all n. Then Y_n = X_n − X_{n−1} (X_0 = 0), n ≥ 1, is termed a martingale difference sequence (an increment sequence in the continuous parameter case). Observe that {Y_n, n ≥ 1} is an orthogonal set, since

E(Y_n Y_{n+1}) = E(E^{F_n}(Y_n Y_{n+1}))
  = E(Y_n E^{F_n}(X_{n+1} − X_n))
  = E(Y_n (X_n − X_n)) = 0, by the martingale property.


Now let s_n(X) = (∑_{k=1}^{n} Y_k²)^{1/2}, and let s(X) = lim_n s_n(X), which exists since s_n(X) is monotone. The s_n(X) corresponds to the classical "Luzin square function" of trigonometric series. We shall see in the next section that, for a martingale satisfying sup_n E(|X_n|) < ∞, X_n → X_∞ a.e. holds. Assuming this result for the moment, and also the fact that s(X) < ∞ a.e., one may define the following spaces of martingales for each given filtration {F_n, n ≥ 1} from (Ω, Σ, P). Namely, H¹ = {X : E(s(X)) < ∞} and BMO = {X : sup_n ||E^{F_n}(|X_∞ − X_{n−1}|²)||_∞ < ∞}. The set H¹ is the analog of the classical Hardy space, with norm ||X||₁ = E(s(X)), and BMO is that of the bounded mean oscillation space of considerable importance in mathematical analysis. The norm in the latter space is given by

||X||_* = sup_n || [E^{F_n}((X_∞ − X_{n−1})²)]^{1/2} ||_∞.

It is seen that (H¹, ||·||₁) is a normed linear space, and one can verify that it is complete. On the other hand, (BMO, ||·||_*) is also a normed linear space (and complete, which is much harder to establish), but there is a deep relation (duality) between these two spaces, which was an open problem for some years. Then Fefferman (1971) showed that BMO is the dual of H¹. His work was for functions but admits an adaptation for martingales. This was discussed in detail in Garsia (1973), and a streamlined treatment in the context of the general theory of processes is given in Rao (1995, Section 4.5). A number of other interesting inequalities, with extensions, were obtained in Burkholder (1973). Since the convergence aspect of martingale theory is needed in this discussion, let us turn to it now.

3. Convergence theorems

First we consider the special case of positive martingales, and later extend it to the general case by a (Jordan type) decomposition for the whole class of (sub, super, quasi or semi-)martingales, to be discussed shortly. Thus the starting key result is:

1. THEOREM. Let {X_n, F_n, n ≥ 1} be a nonnegative martingale. Then it converges a.e. In symbols, X_n → X_∞ a.e., and also E(X_∞) ≤ lim inf_n E(X_n).
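Before turning to the proof, a finite illustration (a sketch, not from the text; the names used are assumptions) that the last inequality can be strict: the "double-or-nothing" martingale M_n = 2^n on the event that the first n tosses of a fair coin are all heads, and M_n = 0 otherwise, has E(M_n) = 1 for every n while M_n → 0 a.e., so E(M_∞) = 0 < 1 = lim_n E(M_n).

```python
# The "double-or-nothing" positive martingale: M_n = 2**n if the first n fair
# tosses are all heads, and M_n = 0 otherwise. E(M_n) = 1 for all n, yet
# P[M_n = 0] -> 1, so M_infty = 0 a.e. and Fatou's inequality is strict.
# Illustrative sketch (names are assumptions), not code from the text.
from fractions import Fraction

def law_of_Mn(n):
    """Exact law of M_n as a {value: probability} dictionary."""
    p_alive = Fraction(1, 2) ** n
    return {Fraction(2) ** n: p_alive, Fraction(0): 1 - p_alive}

for n in range(1, 12):
    law = law_of_Mn(n)
    assert sum(v * p for v, p in law.items()) == 1          # E(M_n) = 1
    assert law[Fraction(0)] == 1 - Fraction(1, 2) ** n      # mass escaping to 0
print("E(M_n) = 1 for every n, while P[M_n = 0] -> 1")
```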


Proof. Since t ↦ φ(t) = e^{−t} is a convex function (φ''(t) > 0), by Proposition 2.3, Y_n = φ(X_n) = e^{−X_n}, n ≥ 1, is a bounded positive sub martingale for the same filtration {F_n, n ≥ 1}. Since φ(·) is one-to-one, X_n → X_∞ iff Y_n → Y_∞ a.e., and the result follows if it is shown that each positive bounded sub martingale converges a.e. We show, more generally, that a positive L²(P)-bounded sub martingale {Z_n, F_n, n ≥ 1} (i.e., E(Z_n²) ≤ K₀ < ∞, n ≥ 1) converges a.e. and in L²(P)-mean. The mean convergence is easy; it is proved first and is then used in the pointwise conclusion. Indeed, it follows from the definition of a sub martingale that the expectations are monotone nondecreasing. The same is true of the sub martingale {Z_n², F_n, n ≥ 1}. Thus E(Z_n²) ≤ E(Z_{n+1}²) ≤ K₀ < ∞, so that the sequence {E(Z_n²), n ≥ 1} converges. Then

0 ≤ E(Z_n²) − E(Z_m²) = E(Z_n − Z_m)² + 2E(Z_m(Z_n − Z_m)),   m < n,   (19)

and E(Z_m(Z_n − Z_m)) = E(Z_m E^{F_m}(Z_n − Z_m)) ≥ 0 by the sub martingale property. Since the left side of (19) goes to zero (by the convergence of the above bounded monotone sequence) and each of the right side terms is nonnegative, each must tend to zero as m, n → ∞. In particular E(Z_n − Z_m)² → 0, so that {Z_n, n ≥ 1} is Cauchy and hence converges to a limit Z_∞ ∈ L²(P), by the completeness of the latter space. It will now be shown that Z_n → Z_∞ pointwise a.e., and this is the hard part, for which one needs Theorem 2.5(a). Here are the details. For an arbitrarily fixed integer m ≥ 1, the sequence {Z_k − Z_m, F_k, m ≤ k ≤ n} is evidently a sub martingale. So for each ε > 0 one has, by Theorem 2.5(a),

P[max_{m≤k≤n} (Z_k − Z_m) ≥ ε] ≤ (1/ε) E(|Z_n − Z_m|)   (20)

and

P[min_{m≤k≤n} (Z_k − Z_m) ≤ −ε] ≤ (1/ε) [E(|Z_n − Z_m|) − E(Z_{m+1} − Z_m)].   (21)

Adding (20) and (21), letting n → ∞, and using the L²(P)-convergence established above, it follows that

P[sup_{k≥m} |Z_k − Z_m| ≥ ε] ≤ (1/ε) [2E(|Z_m − Z_∞|) + E(|Z_{m+1} − Z_m|)] → 0 as m → ∞,   (22)

since L²(P)-convergence implies L¹(P)-convergence. This means

lim_{n→∞} P[sup_{k>n} |Z_k − Z_n| ≥ ε] = 0.   (23)


If Z* = lim sup_n Z_n and Z_* = lim inf_n Z_n, then (23) implies

P[Z* − Z_* ≥ 2ε] ≤ 2 lim_n P[sup_{k≥n} |Z_k − Z_n| ≥ ε] = 0.

Thus Z* = Z_*, so that Z_n → Z_∞ (= Z* = Z_*) a.e., and hence X_n → X_∞ a.e. This is the desired assertion. The last part is now a consequence of Fatou's lemma. []

The above argument also contains the following additional information regarding sub (super) martingales.

2. COROLLARY. If {X_n, F_n, n ≥ 1} is a (not necessarily positive) sub martingale (or {−X_n, F_n, n ≥ 1} a super martingale) such that E(X_n²) ≤ K₀ < ∞, then X_n → X_∞ a.e. and also in L²(P)-mean as n → ∞.

The assertion of Theorem 1 is immediately extended to a martingale {X_n, F_n, n ≥ 1} satisfying the condition sup_n E(|X_n|) < ∞. We deduce this from a well-known consequence (due to Nikodým) of the classical Vitali(-Hahn-Saks) theorem of Real Analysis, as follows. For the martingale {X_n, F_n, n ≥ 1} as above, let μ_n : A ↦ ∫_A X_n dP, A ∈ F_n, so that |μ_n|(A) = ∫_A |X_n| dP, the variation, is a σ-additive function on F_n. But {|X_n|, F_n, n ≥ 1} is a sub martingale (cf., Proposition 2.3), so that

|μ_n|(A) = ∫_A |X_n| dP ≤ ∫_A |X_{n+1}| dP = |μ_{n+1}|(A),   A ∈ F_n,

and |μ_n|(Ω) = E(|X_n|) ≤ sup_n E(|X_n|) < ∞ by hypothesis, and lim_{n→∞} |μ_n|(A) = ν_k(A) exists for all A ∈ F_k. What is more, ν_k(·) is σ-additive by the Vitali-Nikodým consequence noted above [cf., e.g., Rao (1987), p. 181]. Further, it is evident that ν_k = ν_{k+1}|F_k and ν_k ≪ P. If Y_k = dν_k/dP, then the last equation implies that {Y_k, F_k, k ≥ 1} is a positive martingale and hence converges a.e. by Theorem 1. Moreover,

∫_A Y_k dP = ν_k(A) ≥ |μ_k|(A) ≥ μ_k(A) = ∫_A X_k dP

for all A ∈ F_k, and the extreme integrands are F_k-measurable. This implies that Z_k = Y_k − X_k ≥ 0 and that {Z_k, F_k, k ≥ 1} is a positive martingale, being the difference of two martingales on the same filtration. Hence X_n = Y_n − Z_n, n ≥ 1, is a difference of two positive martingales, and by Theorem 1 both Y_n → Y_∞ and Z_n → Z_∞ a.e., so that X_n → Y_∞ − Z_∞ = X_∞ (say) a.e. as n → ∞. Thus we have the general assertion (the last part being a consequence of Fatou's lemma):

3. COROLLARY (Doob's martingale convergence theorem). If {X_n, F_n, n ≥ 1} is a martingale such that sup_n E(|X_n|) < ∞, then X_n → X_∞ a.e., and E(|X_∞|) ≤ lim inf_n E(|X_n|). Moreover (Jordan decomposition), an L¹(P)-bounded martingale is expressible as a difference of a pair of positive martingales, where all three processes are based on the same filtration.

The above result allows an improvement of Corollary 2 above:


4. COROLLARY. An L¹(P)-bounded sub (or super) martingale {X_n, F_n, n ≥ 1} (i.e., sup_n E(|X_n|) < ∞) converges a.e. to X_∞, and E(|X_∞|) ≤ lim inf_n E(|X_n|).

Proof. We consider the sub martingale case. By Proposition 2.2 the X_n-process can be expressed as

X_n = X'_n + A_n,   n ≥ 1,

where {X'_n, F_n, n ≥ 1} is a martingale and A_1 = 0, {A_n, F_{n-1}, n ≥ 2} is a nonnegative increasing process. Also 0 ≤ E(A_n) = E(X_n) − E(X'_n) ≤ sup_n E(|X_n|) + E(|X'_1|) = K₀ < ∞. Hence A_n ↑ A_∞ a.e. and E(A_∞) ≤ K₀. Thus E(|X'_n|) ≤ E(|X_n|) + E(A_n) ≤ sup_n E(|X_n|) + K₀ < ∞. Therefore, by Corollary 3, X'_n → X'_∞ a.e., and then X_n → X'_∞ + A_∞ = X_∞ (say) a.e. The last inequality is again a consequence of Fatou's lemma. []

The improvement gained is on the integrability condition. Note that the trivial process X_n = n a.e. shows that the L¹(P)-boundedness is essentially the best condition for the above result to give convergence with a.e. finite limits. The proof of Theorem 5(a) of Section 2 uses the idea of an event occurring for the "first time". This is an interesting special case of the concept of a stopping time, which plays a fundamental role in Sequential Analysis and is also of importance in martingale theory. If (Ω, Σ, P) is the basic probability space, as in Section 2, and F_n ⊂ F_{n+1} ⊂ Σ is a sequence of σ-algebras, then a mapping T : Ω → ℕ ∪ {∞} is called a stopping time of the filtration {F_n, n ≥ 1} if the event [T = n] ∈ F_n, n ≥ 1. More generally, if {F_t, t ≥ 0} is an increasing family of σ-subalgebras of Σ such that F_t = ⋂_{s>t} F_s, t ≥ 0, so that it is a standard filtration as defined in Section 2, each completed for the P-null sets for convenience, then T : Ω → ℝ̄⁺ is a stopping time of such a filtration if {ω : T(ω) ≤ t} ∈ F_t, t ≥ 0. Thus T is a stopping time iff its values are determined by the past and present (i.e., ≤ t) but not by the (unforeseen) future. Thinking of a (sub) martingale {X_t, F_t, t ≥ 0} as a fair (favorable) game, suppose the player skipped the game at certain random times T_1 ≤ T_2 ≤ .... Then it is legitimate to ask whether the process {X_{T_n}, n ≥ 1} is again a (sub) martingale, i.e., whether the values X_{T_n}, observed at times T_n, constitute a new (sub) martingale, and, if so, whether the convergence results hold.
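These questions can be made concrete on the doubling example of Section 1. With T the first time a fair ±1 walk S_n reaches 1, the truncated times T ∧ n are bounded stopping times and E(S_{T∧n}) = 0 for every n, although E(S_T) = 1. A small exact-enumeration sketch (illustrative only, not code from the text):

```python
# Optional stopping at the bounded times T ^ n (T = first passage of S_n to 1,
# fair +/-1 walk): E(S_{T ^ n}) = E(S_0) = 0 for every fixed n, even though
# E(S_T) = 1. Exact enumeration sketch (illustrative, not code from the text).
from itertools import product
from fractions import Fraction

N = 8
prob = Fraction(1, 2**N)
for n in range(1, N + 1):
    mean_stopped = Fraction(0)
    for path in product((-1, 1), repeat=N):
        s, frozen = 0, None
        for k, x in enumerate(path, start=1):
            s += x
            if frozen is None and s == 1:
                frozen = s                     # the walk stops at T
            if k == n:
                mean_stopped += prob * (frozen if frozen is not None else s)
                break
    assert mean_stopped == 0                   # E(S_{T ^ n}) = 0
print("E(S_{T ^ n}) = 0 for n = 1, ...,", N)
```

The gap between E(S_{T∧n}) = 0 and E(S_T) = 1 is exactly the failure of uniform integrability that the optional sampling theorem below rules out.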
Conditions for affirmative solutions to these questions will be outlined now as they have considerable theoretical interest in the subject and are also of importance in applications. A simple example of stopping times is given by: TA inf{t _> 0 :Xt E A} with inf{0} = ~ . Thus TA is thefirst time that the process enters the set A. This was used in the proof of the above recalled theorem with A = (2, ec) c N. As is clear from this discussion, there will be new technical problems in employing this powerful stopping time tool. In fact, for each stopping time T~ of the family {@t,t > 0} the symbol Xr,, is the composition of the functions X~-{Xt, Yt, t> 0} and T, (=Xr,,(co)=XT,,(o)(CO) or simply X(T,(co),co)) and hence is a random variable for each n. However it is adapted to a new family (not necessarily Yt, t _> 0). The new class is defined as: f f ( T , ) = {An [T~ _< t] c Yt, V t _> 0}, and is called the family of events "prior" to T~. One verifies that it is

Martingales and some applications

777

a σ-algebra, and for each n, T_n ≤ T_{n+1} ⇒ F(T_n) ⊂ F(T_{n+1}), and T_n is F(T_n)-adapted. The following basic assertions on a calculus of such real stopping times will be needed in the applications below. There are fewer mathematical technicalities if each stopping time T_n takes values in {1, 2, …, ∞} rather than in ℝ̄⁺, but both cases occur in the problems usually studied.

5. PROPOSITION. Let {F_t, t ≥ 0} be a standard filtration from a probability space (Ω, Σ, P) and {T_k, k ≥ 1} be a collection of stopping times of the filtration. Then inf(T_{k1}, T_{k2}), sup(T_{k1}, T_{k2}), T_{k1} + T_{k2}, liminf_k T_k, limsup_k T_k (but not necessarily T_{k1} − T_{k2} or αT_k for 0 < α < 1) are stopping times. If T_k ≥ T_{k+1}, then F(T_k) ⊃ F(T_{k+1}) and T = lim_k T_k is a stopping time satisfying F(T) = ∩_{k=1}^∞ F(T_k).

In many practical examples it is usually easy to verify that the compositions of random functions with the relevant stopping times are again random variables, especially with discrete values, although in some cases (e.g., for continuous parameter processes) this can be an annoying technical problem. [For a discussion of these questions, see Dellacherie (1972) and Dellacherie–Meyer (1980); for a quick account of the results that are used below, one may also refer to Rao (1995), Chapter IV.] Recall that a process (or random function) t ↦ X_t(ω) is left [right] continuous at t_0 if lim_{s↑t_0} X_s(ω) [lim_{s↓t_0} X_s(ω)] exists for almost all ω ∈ Ω. It is verified quickly, using Corollary 3, that a (sub)martingale {X_t, F_t, t ≥ 0} relative to a standard filtration has left and right limits at each t > 0, for almost all ω, and these limits can be different for at most a countable set of time points. The discontinuity points {t_n, n ≥ 1} can be either jumps, or they may be "fixed" or "moving".
[A point t_0 is a fixed discontinuity of the process if P[ω : lim_{s→t_0} X_s(ω) = X_{t_0}(ω)] < 1, and a discontinuity which is not fixed is called a moving one; the latter are thus not (jump or) of the first kind, but are called 'second kind' in real analysis.] We now have the following form of the (sub)martingale property under optional sampling (or skipping) times.

6. THEOREM (Doob). Let {X_t, F_t, t ≥ 0} be a (sub)martingale relative to a standard filtration. Let T_n ≤ T_{n+1}, n = 1, 2, …, be a sequence of stopping times of the filtration. Then the process {X_{T_n}, F(T_n), n ≥ 1} is again a (sub)martingale whenever there is an integrable random variable Y ≥ 0 such that |X_t| ≤ Y a.e., t ≥ 0. More generally, the conclusion holds if {X_t, t ≥ 0} is uniformly integrable, i.e., lim_{k→∞} E(χ_{[|X_t|>k]} |X_t|) = 0 uniformly in t. This is always satisfied if sup_t E(|X_t|^p) ≤ K < ∞ for some 1 < p < ∞ (by the Hölder inequality).

We shall use this result in the applications below. It can be proved using the preceding properties and Corollaries 3 and 4, but with many more details. The complete proofs and improvements can be found in the references given preceding the statement of the theorem. Before proceeding further, it will be instructive to present an application of this result to establish the fundamental identity of Sequential Analysis due to Wald. There are a number of proofs, but we include one based on martingale theory.
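These notions can be made concrete numerically. The following sketch (in Python with NumPy; the walk, the boundary set, and the sample sizes are our illustrative choices, not from the text) simulates the first-entry time T_A for a symmetric random walk, a discrete-parameter martingale, and checks the optional sampling conclusion E(S_{T_A}) = S_0 = 0, together with the classical mean exit time E(T_A) = |a|·b = 25 for this example:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_passage(a=-5, b=5, max_steps=100_000):
    # First time T_A that the symmetric random walk S_n leaves (a, b),
    # i.e. T_A = inf{n >= 1 : S_n <= a or S_n >= b}.
    s, n = 0, 0
    while a < s < b and n < max_steps:
        s += rng.choice((-1, 1))
        n += 1
    return n, s

samples = [first_passage() for _ in range(4000)]
mean_T = np.mean([n for n, _ in samples])
mean_ST = np.mean([s for _, s in samples])

# Optional sampling: the stopped walk is a uniformly integrable martingale,
# so E(S_T) = S_0 = 0; for these boundaries E(T) = |a| * b = 25.
print(round(mean_ST, 2), round(mean_T, 1))
```

With 4000 runs the sample mean of S_T should sit near 0 (standard error about 0.08) and the mean exit time near 25, in line with the theorem.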

778

M. M. Rao

The problem is as follows: Let X_1, X_2, …, be a sequence of independent random variables with a common distribution having mean α. If S_n = Σ_{k=1}^n X_k, one observes the sequence as long as a ≤ S_n ≤ b and stops when either S_n < a or S_n > b, for a given pair of reals a < b, so that T = inf{n ≥ 1 : S_n < a or S_n > b} = inf{n ≥ 1 : S_n ∈ ℝ − [a, b]}. Then T is a stopping time of the filtration {F_n, n ≥ 1}, F_n = σ(X_k, 1 ≤ k ≤ n). Suppose that E(T) < ∞. Then the Wald identity states that E(S_T) = αE(T).

Indeed, consider Y_k = X_k − α and S_n' = Σ_{k=1}^n Y_k. Then E(|S_n'|) ≤ E(|S_n|) + n|α| < ∞, and since the Y_k are independent with means zero, {S_n', F_n, n ≥ 1} is a martingale. If T_1 = 1, T_2 = T ≥ 1, then T_1, T_2 are stopping times of the filtration {F_n, n ≥ 1} and F(T_1) = F_1 ⊂ F(T_2) = F(T). Since a finite set of integrable random variables is trivially uniformly integrable and E(T) < ∞ by hypothesis, it follows that E(|S_T'|) < ∞ and (by Theorem 6) that {S_{T_i}', F(T_i)}_{i=1}^2 is a two-element martingale. But a martingale has constant expectations. Hence E(S_{T_2}') = E(S_{T_1}') = E(S_1 − α) = E(X_1) − α = 0. However, S_{T_2}' = S_T − Tα, so that 0 = E(S_T') = E(S_T) − αE(T), as desired.

There is also a beautiful proof of this identity due to Blackwell (1946), based on Kolmogorov's strong law of large numbers. His proof admits an extension to certain uncorrelated (but not necessarily independent) random variables, as follows.

7. COROLLARY. Let X_1, X_2, …, be a sequence of uncorrelated random variables with a common mean and uniformly bounded variances. Suppose that the sequence of observations is stopped at a (random) time T, based on the past and present, such that either S_T < a or S_T > b, where S_n = Σ_{k=1}^n X_k as before. If E(T²) < ∞, then E(S_T) = E(X_1)E(T) holds.

The proof is an extension of Blackwell's method, but now it uses Rajchman's (instead of Kolmogorov's) form of the strong law of large numbers. (For details see Rao (2000), III.6.6, p. 129.)
It should be observed that {S_n, F_n, n ≥ 1} need no longer be a martingale in this case; this is partly compensated by the stronger assumptions of the existence of second moments of the X_n and of the stopping time, and of their uniform boundedness.
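Wald's identity lends itself to a direct Monte Carlo check. In the sketch below (our own illustration: normally distributed steps with mean α = 0.3, the band [−4, 6], and the sample size are arbitrary choices, not from the text), the two sides of E(S_T) = αE(T) are estimated from the same simulated runs:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.3          # common mean of the X_k (illustrative choice)
a, b = -4.0, 6.0     # continuation band: observe while a <= S_n <= b

def run_once():
    # Run until the first exit time T; return (S_T, T).
    s, n = 0.0, 0
    while a <= s <= b:
        s += rng.normal(alpha, 1.0)
        n += 1
    return s, n

out = np.array([run_once() for _ in range(5000)])
S_T, T = out[:, 0], out[:, 1]
# The two sides of Wald's identity E(S_T) = alpha * E(T):
print(S_T.mean(), alpha * T.mean())
```

The two printed estimates agree up to Monte Carlo error (a few hundredths here), as the identity predicts.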

4. Elements of (semi)martingale calculus


Although most of the limit theorems appearing in applications use only the discrete parameter martingale convergence theory and the fundamental inequalities of the preceding sections, some important parts, such as the financial mathematics to be detailed in a later section, depend crucially on the corresponding continuous parameter processes and their extensions. To facilitate this account, some (differential and integral) calculus results, in a form general enough to include both the sub and super martingale cases, will be discussed in this section; they play a key role in the rest of the following analysis.


The starting point is a search for a continuous parameter analog of the last half of Proposition 2.2 on the Doob decomposition of a sub martingale. The difficulty starts already in finding a suitable substitute for the integrable increasing process {A_n, F_{n−1}, n ≥ 1} when the discrete index n is replaced by a continuous one, denoted by t. The appropriate concept and the resulting decomposition were discovered by Meyer (1962), almost a decade after the problem was raised by Doob. It took another decade to find simpler proofs of the result, one of them extending Doob's original method for the discrete case using a weak compactness argument of abstract analysis, thereby (justifiably) calling the end result the Doob–Meyer decomposition theorem. The statement will be presented after introducing the relevant new concepts. It is then possible to give unified and general versions of these propositions to use in integration and subsequent applications.

The fact that A_n is F_{n−1}-adapted may be understood as saying that A_n is "predictable" from the knowledge of the past, i.e., that denoted by F_{n−1}. In the continuous parameter case this has to be made precise, and it is done as follows.

1. DEFINITION. Let {F_t, t ≥ 0} be a standard filtration from the basic probability space (Ω, Σ, P), i.e., F_t = F_{t+} = ∩_{s>t} F_s, t ≥ 0, and (for convenience) all P-null sets be included in each F_t. Then the σ-algebra P generated by the sets {(s, t] × F : s ≤ t, F ∈ F_s} ∪ {{0} × F : F ∈ F_0} is called the predictable σ-algebra. An adapted process X = {X_t, F_t, t ≥ 0} is termed predictable if, regarded as a function X : ℝ⁺ × Ω → ℝ, it is measurable relative to P ⊂ B(ℝ⁺) ⊗ Σ (and if it is just measurable relative to B(ℝ⁺) ⊗ Σ, it is simply termed a measurable process).

Although this definition looks somewhat involved on the surface, it can be verified that P is the same as the σ-algebra generated by all (or even only the left) continuous adapted processes {X_t, F_t, t ≥ 0}.
This σ-algebra P plays a key role in both the theory and applications. For instance, it can be shown that if T denotes the set of all stopping times of the filtration {F_t, t ≥ 0}, then P is also generated by all the "stochastic intervals" ⟦0, T⟧, which are sets of the form ⟦0, T⟧ = {(t, ω) : 0 ≤ t ≤ T(ω)}, for all T ∈ T. This connection between stopping times and predictable σ-algebras is useful. Many results on stopping times and the corresponding σ-algebras, as well as the related classifications, are discussed in detail in the literature. See, e.g., Dellacherie (1972), Dellacherie and Meyer (1980), Métivier (1982), Rao (1995), and others, where proofs of the above statements, together with a rather detailed treatment of their calculus as well as the classification, may be found.

The next concept is an essential ingredient of the desired decomposition.

2. DEFINITION. Let {F_t, t ≥ 0} be a standard filtration from (Ω, Σ, P) and {A_t, F_t, t ≥ 0} be an integrable right continuous increasing process with A_0 = 0 a.e. and sup_t E(A_t) < ∞. Then it is called a predictable increasing process if, for each right continuous positive bounded martingale {X_t, F_t, t ≥ 0} (so lim_{t→∞} X_t = X_∞ exists a.e., and lim_{s↑t} X_s = X_{t−} exists a.e.),


E(∫_{ℝ⁺} X_{t−} dA_t) = E(X_∞ A_∞),   (24)

where the integral relative to A_t is a pointwise Stieltjes integral.

Since some of the limit operations here and later involve continuous parameters (and hence more than countably many indices), there will be technical problems of a measure-theoretical nature. However, the hypothesis of right (or left) continuity of the process allows one to invoke "separability" of the families, so that effectively countable operations accomplish the desired task. This point will not be noted at every turn when it appears; the legality will be used implicitly. Here Eq. (24) incorporates the condition that A_t is F_{t−}-measurable, and this technical requirement is the continuous parameter version of that given in Eq. (8) of Section 2. It is shown in a routine fashion that {A_t, F_t, t ≥ 0} is then measurable relative to the predictable σ-algebra P determined by the standard filtration {F_t, t ≥ 0}, given above. Roughly stated, this is equivalent to the requirement that for any bounded random variable X, if F_∞ = σ(∪_{t>0} F_t), X_∞ = E^{F_∞}(X) and F_{t−} = σ(∪_{0≤s<t} F_s), then X_{t−} = E^{F_{t−}}(X) defines a left continuous bounded martingale, and moreover one has

E(∫_{ℝ⁺} X_{t−} dA_t) = E(∫_{ℝ⁺} X dE^{F_{t−}}(A_t)), (this is legitimate),
= E(E^{F_∞}(X) A_∞) = E(X A_∞).

Also, a process {X_t, F_t, t ∈ ℝ⁺} is said to be of class (D) if, for each collection of stopping times {T_j, j ∈ J} of the filtration {F_t, t ∈ ℝ⁺}, the set {X ∘ T_j, j ∈ J} is uniformly integrable. If this condition holds on each compact interval of ℝ⁺, the process is locally of class (D), or termed of class (DL). This technical condition was introduced by Doob, motivated by the solutions of Dirichlet's problem in potential theory. With these concepts at hand, it is possible to present the continuous parameter version of Proposition 2.2, for super martingales, due to Meyer (1962–63):

3. THEOREM. Let {X_t, F_t, t ≥ 0} be a right continuous super martingale of class (DL), relative to a filtration satisfying the standard conditions. Then there exist an increasing integrable predictable process {A_t, F_t, t ≥ 0}, A_0 = 0, and a (right continuous) martingale {Y_t, F_t, t ≥ 0} such that

X_t = Y_t − A_t,  t ≥ 0,   (25)

and the decomposition is unique.


Several proofs (and extensions) of this result exist, but none is simple enough to present here; the reader is referred to any one of the references given above.


In general, for a (sub or super) martingale X = {X_t, F_t, t ≥ 0} which satisfies sup_t E(|X_t|) < ∞ (i.e., is L¹(P)-bounded), one has the following assertion: for each partition π : 0 ≤ t_0 < t_1 < ⋯ < t_n < ∞, since E^{F_{t_{i−1}}}(X_{t_i}) = (≥, ≤) X_{t_{i−1}}, it is true that

Σ_{i=1}^n E(|E^{F_{t_{i−1}}}(X_{t_i}) − X_{t_{i−1}}|) ≤ sup_t E(|X_t|) = K_X < ∞,

for any partition π. The left side may also be written (since for martingales the quantity inside is zero, and for sub or super martingales it is either nonnegative or nonpositive) as:

sup_π Σ_{i=1}^n E(|E^{F_{t_{i−1}}}(X_{t_i} − X_{t_{i−1}})|) ≤ K_X < ∞.   (26)

Any adapted process X for which (26) holds is called a quasimartingale, a concept originally introduced and analyzed by Fisk (1965) for continuous processes, and generalized further by Orey (1967) to right continuous ones. Thus this class includes all the L¹(P)-bounded (sub and super) martingales, but is larger. In fact, it is evident that the class of quasimartingales on the same filtration {F_t, t ≥ 0} is a vector space, so that linear combinations of (L¹(P)-bounded) sub or super martingales are quasimartingales, but not necessarily sub or super martingales. Now if {X_t = Σ_{i=1}^n a_i X_t^i, F_t, t ≥ 0} is such a combination of class (DL) right continuous integrable processes with the same standard filtration, then using Theorem 3 one finds immediately that there is a decomposition X_t = Y_t − A_t, where Y_t = Σ_{i=1}^n a_i Y_t^i and A_t = Σ_{i=1}^n a_i A_t^i with X_t^i = Y_t^i − A_t^i, so that {Y_t, F_t, t ≥ 0} is a right continuous martingale and A_t (as a linear combination of increasing integrable predictable processes) is an integrable, predictable process of bounded variation. Hence it may be expressed as a difference of suitable increasing processes. The interesting fact is that the converse of this statement is true, and thus an intrinsic characterization of quasimartingales, of considerable interest especially for stochastic integration, can be given. The appearance of a process of bounded variation along with a martingale component indicates that this study must be related to some standard results in real analysis. This nontrivial fact is the content of the following theorem due to Doléans-Dade and Meyer (1970). The idea here is to associate a (real) set function on the predictable σ-algebra determined by the given standard filtration, find a suitable (additional) condition for its σ-additivity, and then analyze the structure of the process. [There are alternative procedures, but this is somewhat shorter and reveals the nature of the problem better.]
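In the discrete case the quantity bounded in (26) is easy to compute. For X_n = M_n + c_n, with M_n a martingale and c_n a deterministic sequence of bounded variation, E^{F_{n−1}}(X_n) − X_{n−1} = c_n − c_{n−1}, so the quasimartingale variation is exactly Σ_n |c_n − c_{n−1}|. A simulation sketch (a toy example of our own; the drift c and the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N, paths = 30, 50_000

M = np.cumsum(rng.choice((-1.0, 1.0), size=(paths, N)), axis=1)  # a martingale
c = np.sin(np.arange(N) / 3.0)       # deterministic bounded-variation drift
X = M + c                            # a quasimartingale (martingale + drift)

# Here E(X_n - X_{n-1} | F_{n-1}) = c_n - c_{n-1}, so the sum in (26) equals
# the total variation of c; estimate the drift by averaging the increments.
est_drift = np.diff(X, axis=1).mean(axis=0)
variation_est = np.abs(est_drift).sum()
variation_true = np.abs(np.diff(c)).sum()
print(variation_est, variation_true)
```

The estimated and exact variations agree up to Monte Carlo error; a finite value of this sum, uniformly over partitions, is precisely the discrete analogue of condition (26).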
Thus let P be the predictable σ-algebra of (ℝ⁺ × Ω, B(ℝ⁺) ⊗ Σ) introduced earlier for the filtration {F_t, t ≥ 0}, and let P_a be the corresponding class when ℝ⁺ is replaced by [0, a]. Then one can identify, for a < a', P_a ⊂ P_{a'} ⊂ P. For an integrable process X = {X_t, F_t, t ≥ 0}, define

μ_a^X((s, t] × A) = ∫_A (X_s − X_t) dP,  A ∈ F_s,  0 ≤ s < t ≤ a,   (27)


(with the above = 0 if s = a). It is not hard to verify that μ_a^X : P_a → ℝ is finitely additive. The desired signed measure representation for our process is given by:

4. THEOREM. Let X = {X_t, F_t, t ≥ 0} be a right continuous quasimartingale, with {F_t, t ≥ 0} a standard filtration of (Ω, Σ, P). Let μ_a^X be the set function associated with the process X on [0, a] by (27). Then there exists a unique signed measure μ^X : P → ℝ such that μ^X|P_a = μ_a^X for a ≥ 0, iff the process X is of class (DL).

For a proof of this useful result, which is not simple, one may refer to Dellacherie–Meyer [(1980), Chapter VII] or Rao [(1995), pp. 365–369]. An interesting consequence of this result will be detailed in the following (a special case of which was already discussed prior to Corollary 2.3):

5. THEOREM (Generalized Jordan Decomposition). Let X = {X_t, F_t, t ≥ 0} be as above, i.e., a right continuous quasimartingale of class (DL). Then it is the difference of two positive super martingales {X_t^i, F_t, t ≥ 0}, i = 1, 2:

X_t = X_t^1 − X_t^2,  t ≥ 0.

Every right continuous martingale X is of class (DL), so that X_t = X_t^1 − X_t^2 holds by the theorem, with {X_t^i, F_t, t ≥ 0}, i = 1, 2, being positive super martingales. But −X_t = (−X_t^1) − (−X_t^2) gives the (−X_t^i)-processes to be also super martingales, since −X is again a right continuous, class (DL), martingale. Hence the X_t^i-processes must be positive martingales. Consequently one has the following Jordan type decomposition for continuous parameter martingales.

6. COROLLARY. Every right continuous L¹(P)-bounded martingale admits a decomposition X_t = X_t^1 − X_t^2, where the {X_t^i, F_t, t ≥ 0}, i = 1, 2, are positive right continuous martingales. (Hence Corollary 2.3 holds for continuous parameters as well.)

We now turn to a sketch of the

PROOF OF THEOREM 5. Let μ^X : P → ℝ be the signed measure associated with the given quasimartingale, as assured by the preceding result. Then by the classical Jordan decomposition, μ^X = (μ^X)⁺ − (μ^X)⁻. This is in general not unique, but can be made unique by demanding that the positive measures (μ^X)^± be mutually singular (i.e., they have essentially disjoint supports). Since μ^X is bounded (being a signed measure), the (μ^X)^± are also bounded. Moreover, μ_t^±(A) = (μ^X)^±((t, ∞) × A), A ∈ F_t, is P-continuous, and so the same is true of the positive components. Let X_t^± = d(μ_t^±)/dP, by the Radon–Nikodým theorem. It then follows that {X_t^±, F_t, t ≥ 0} are positive super martingales, and one has X_t = X_t^+ − X_t^−, t ≥ 0, giving the desired decomposition. It can also be shown that, in this decomposition, one can take the X_t^±-processes to be right continuous and of class (DL), but this detail will be omitted here. □

The interest of this result is seen from a comparison with the Doob–Meyer decomposition given in Theorem 3. Indeed, one has the following important consequence.


7. COROLLARY. Let X = {X_t, F_t, t ≥ 0} be a right continuous quasimartingale of class (DL). Then it admits a unique decomposition

X_t = Y_t + V_t,  t ≥ 0,   (28)

where {Y_t, F_t, t ≥ 0} is a right continuous martingale and {V_t, F_t, t ≥ 0} is a right continuous predictable process of bounded variation on each compact t-interval, in the sense that V_t is the difference of two increasing predictable right continuous processes for the same filtration {F_t, t ≥ 0}.

This result implies that it is only necessary to define a stochastic integral relative to a martingale in order to admit quasimartingales as integrators in applications, since the latter are typically of unbounded variation and the Stieltjes definition of the integral does not apply. Thus the difficulty is relegated to the martingale component in such a study. For a unified treatment of stochastic calculus, we also introduce the following general concept.

8. DEFINITION. A right continuous process {X_t, F_t, t ≥ 0} on (Ω, Σ, P), relative to a standard filtration {F_t, t ≥ 0} from Σ, is called a semimartingale if X_t = Y_t + Z_t, where {Y_t, F_t, t ≥ 0} is a martingale and {Z_t, F_t, t ≥ 0} is a process which is the difference of two predictable increasing processes on the same filtration. If {Y_t, F_t, t ≥ 0} is a local martingale, then the given X_t-process is called a local semimartingale. [We recall that a right continuous process {Y_t, F_t, t ≥ 0} is called a local martingale if there is an increasing sequence of stopping times {T_n, n ≥ 1} of the filtration such that (i) P[T_n ≤ n] = 1, (ii) P[lim_n T_n = ∞] = 1, and (iii) if τ_t^n = T_n ∧ t and Y_t^n = Y_{τ_t^n}, then {Y_t^n, F_t, t ≥ 0} is a uniformly integrable martingale for each n.]

The local martingale concept is technical and may look unmotivated. It may be verified that each positive local martingale is a (positive) super martingale. Thus the local concept plays an important technical role in the Doob–Meyer decomposition as well as in the general integration theory, and it roughly occupies the position of local compactness in work on topological measure theory.
The relation between quasimartingales and semimartingales is very close, as one may expect. The precise statement is given by:

9. THEOREM. Let X = {X_t, F_t, t ≥ 0} be a right continuous process on a probability space (Ω, Σ, P) such that sup_t E(|X_t|) < ∞, i.e., X is L¹(P)-bounded. Then X is a semimartingale iff it is a quasimartingale of class (DL). In general, a quasimartingale is a local semimartingale.

The proof depends on the general theory of (continuous parameter) martingales, and may be found in any of the standard works on the subject cited above [e.g., cf. Rao (1995), p. 271]. The reason for discussing these concepts here is to introduce stochastic integrals and present some of their key properties for use in the following applications, starting in the next section. As indicated already, the


semimartingale concept is useful for concisely stating the general results, and the quasimartingale is the 'work horse' of the subject to lean on. For rapid progress into the integration theory, it is expedient to introduce a general boundedness principle, originally due to Bochner (1954), as it unifies the Itô integral and the Wiener–Kolmogorov–Cramér–Stratonovich integrals, as well as virtually all the other stochastic integrals studied so far in the literature. It can be specialized to the various other definitions to obtain their properties for a detailed study. Here is the basic principle:

10. DEFINITION. A process X = {X_t, t ≥ 0}, X_t ∈ L^p(P), p ≥ 1, ρ > 0, is said to be L^{ρ,p}-bounded if there is an absolute constant C (= C_{ρ,p} > 0) such that for each Borel simple function f : ℝ⁺ → ℝ one has

E(|∫_{ℝ⁺} f(t) dX_t|^p) ≤ C ∫_{ℝ⁺} |f(s)|^ρ ds,   (29)

where for f = Σ_{i=1}^n a_i χ_{[t_i, t_{i+1})} one has the (clearly unambiguously defined) symbol ∫_{ℝ⁺} f(t) dX_t = Σ_{i=1}^n a_i (X_{t_{i+1}} − X_{t_i}). If ρ = p = 2 in (29), then X is termed L^{2,2}-bounded, the simplest case, and one often used as a first key step.
This definition immediately applies to the Wiener or Brownian Motion (BM) process. Indeed, recall that a BM is a process {X_t, t ≥ 0} of independent increments such that for each s < t, X_t − X_s is normal with mean zero and variance σ²(t − s), written N(0, σ²|t − s|). The existence of such a process was first established by Wiener in 1923, and several simpler proofs of the result are now available; two such proofs may be found in McKean (1969). The process has continuous sample paths, i.e., t ↦ X_t(ω) is continuous for a.a. (ω). Then the left side of (29) becomes, with ρ = p = 2:

E(|∫_{ℝ⁺} f(t) dX_t|²) = E(Σ_{i=1}^n a_i (X_{t_{i+1}} − X_{t_i}))²
= Σ_{i=1}^n a_i² E(X_{t_{i+1}} − X_{t_i})² + 2 Σ_{1≤i<j≤n} a_i a_j E[(X_{t_{i+1}} − X_{t_i})(X_{t_{j+1}} − X_{t_j})]
= Σ_{i=1}^n a_i² σ²(t_{i+1} − t_i) + 0, because of the independent increments,
= σ² ∫_{ℝ⁺} |f(s)|² ds,   (30)

and (29) is valid with C = σ² > 0, and even with equality. Thus the BM is L^{2,2}-bounded. It may be shown, with a more detailed analysis, that all stable processes are L^{ρ,p}-bounded for some p > 0 and ρ, not both 2. [The necessary argument is in Bochner (1954).] Now if the X_t-process has orthogonal (but not independent) increments, as appears in the representation of (second order) weakly stationary


processes, then the above definition does not directly apply. However, stochastic integrals already exist for such integrators in the literature, due to Kolmogorov, Cramér and others. The Itô integral is a generalization of the one with the BM, in that the integrand f in (29) is also a stochastic function. To include all these cases, and most martingale integrals generally, Definition 10 has to be extended, and the following concept fulfills the desired search, as shown below:

11. DEFINITION. Let φ_i : ℝ → ℝ⁺ be an increasing symmetric function such that φ_i(x) = 0 iff x = 0, i = 1, 2, and let X = {X_t, t ∈ I ⊂ ℝ} be a right continuous process (or just X(·,·) : I × Ω → ℝ measurable relative to B(I) ⊗ Σ) on (Ω, Σ, P). Let O ⊂ B(I) ⊗ Σ be a σ-algebra, and α a σ-finite measure on O. Then X is called L^{φ₁,φ₂}-bounded relative to O and α if there exists an absolute constant K (= K_{φ₁,φ₂} > 0) such that

E(φ₂(τf)) ≤ K ∫_{I×Ω} φ₁(f) dα,   (31)

where τ : f ↦ ∫_I f dX_t is the mapping defined exactly as in (29) for each simple function f(t, ω) = Σ_{i=1}^n f_{t_i}(ω) χ_{[t_i, t_{i+1})}(t), with t_1 < t_2 < ⋯ < t_{n+1}, t_i ∈ I, and each f_{t_i}·χ_{[t_i, t_{i+1})} an O-measurable bounded function.

Note that if φ₁(x) = |x|^ρ, φ₂(x) = |x|^p, α = μ where μ is the Lebesgue measure, f_{t_i}(ω) ∈ ℝ, and O = B(I) ⊗ {∅, Ω}, then L^{φ₁,φ₂}-boundedness becomes L^{ρ,p}-boundedness. In this particular case α and O are usually not mentioned, as they are regarded as familiar objects. The definition above may appear too general, but the following examples show that it admits specializations and includes all the cases currently considered in the literature. The essential point of either of the above definitions is that τ is a linear mapping defined on all simple functions, and by (29) or (31) it is bounded. Consequently τ has a unique bound preserving (linear) extension onto the closure of the simple functions in the (linear) metric space L^ρ(ds) (or, in the general case, L^{φ₁}(α)), which is L^ρ(ds) itself (but in the general case it could be a proper subspace of L^{φ₁}(α), however). The thus obtained extended mapping, denoted by the same symbol, τ(f) = ∫_I f(t) dX_t, is the desired stochastic integral. After the following generic examples, the general statement of the theorem will be given.

EXAMPLE a. (The Kolmogorov–Cramér integral.) This is defined for integral representations of stationary processes. Thus let {Z_t, t ∈ I} be an orthogonally valued process, so that Z_t ∈ L²(P), E(Z_t) = 0 (for simplicity), and for t_1 < t_2 ≤ t_3 < t_4, t_i ∈ I, one has E[(Z_{t_4} − Z_{t_3})(Z_{t_2} − Z_{t_1})] = 0, or (Z_{t_2} − Z_{t_1}) ⊥ (Z_{t_4} − Z_{t_3}). Thus (30) becomes, in this case, with the same notation and a simple computation:

E(|τf|²) = Σ_{i=1}^n |a_i|² E(|Z_{t_{i+1}} − Z_{t_i}|²) + 2 Σ_{1≤i<j≤n} a_i a_j E[(Z_{t_{i+1}} − Z_{t_i})(Z_{t_{j+1}} − Z_{t_j})]
= Σ_{i=1}^n |a_i|² E[|Z_{t_{i+1}}|² − |Z_{t_i}|²] + 0, since i + 1 ≤ j and the increments are orthogonal,
= Σ_{i=1}^n |a_i|² (μ(t_{i+1}) − μ(t_i)),

where μ(t) = E(|Z_t|²) ≤ μ(t'), t ≤ t' in I, and we take I to have a least element a_0 with Z_{a_0} = 0 for computational facility (otherwise Z_t − Z_{a_0} will do). Thus μ defines a (bounded) Borel measure, denoted by the same symbol, and one gets

E(|τf|²) = ∫_I |f(t)|² dμ(t),   (32)

and here dμ(t) is not necessarily the Lebesgue measure. But (31) holds, by (32), so that for all f ∈ L²(μ), τf = ∫_I f dZ_t is a well defined stochastic integral and {Z_t, t ∈ I} is L^{2,2}-bounded. Taking f = χ_{A_n}, A_n ↓ ∅, this shows that Z(·) defines a vector measure, i.e., a σ-additive function into the vector space of random variables L²(P). An interesting and useful consequence of this fact, applied to the integral given by the above example, is the following: Let T : L²(P) → L²(P) be any bounded linear operator. Then one has T(τf) = T ∫_I f(t) dZ_t = ∫_I f(t) d(T ∘ Z_t). The last equation is a consequence of a classical result due to E. Hille, which says that a bounded linear operator and a vector integral, as here, commute. Now if Ẑ(·) = T ∘ Z(·), then T(τf) = ∫_I f(t) dẐ_t, and hence

‖∫_I f(t) dẐ_t‖₂² = ‖T(τf)‖₂² ≤ K² ‖τf‖₂² = K² ∫_I |f(t)|² dμ(t), by (32),   (33)

where K is the operator norm of T.

Thus one can conclude from (33) that Ẑ_t is also L^{2,2}-bounded, so that the stochastic integral τ̂f = ∫_I f dẐ_t, f ∈ L²(μ), is defined. Note that {Ẑ_t, t ∈ I} is not orthogonally valued, and if T ≠ id., then (31) holds with inequality only. This interesting consequence is recorded for reference as follows:

EXAMPLE b. Let {Z_t, t ∈ I} be a process with orthogonal increments, and T : L²(P) → L²(P) a bounded linear operator. Then Ẑ_t = T ∘ Z_t (∈ L²(P)) defines again an L^{2,2}-bounded process, and hence the stochastic integral τ̂f = ∫_I f dẐ_t is well-defined relative to the process {Ẑ_t, t ∈ I}. Taking T = Π, an orthogonal projection, one gets the corresponding measure as a (weakly) harmonizable spectral measure, and thus such harmonizable processes satisfy our extended boundedness principle. [It is known that every harmonizable spectral measure has an orthogonal dilation, and thus this case is covered by the general principle.] For an


account of harmonizable processes, see, e.g., Rao (1982); a more extensive treatment, including the multidimensional case, is in the recent monograph by Kakihara (1997). In the next example the integrand is also a stochastic process, and it is more general than the cases considered in (29), (32), or (33).

EXAMPLE c. Let X = {X_t, G_t, t ≥ 0} be a real right continuous square integrable martingale, where {G_t, t ≥ 0} is a standard filtration (i.e., G_t = ∩_{s>t} G_s, G_t increasing, and completed in (Ω, Σ, P)). Let {F_t, t ≥ 0} be another standard filtration with F_t ⊂ G_t, t ≥ 0. If P is the predictable σ-algebra of B(ℝ⁺) ⊗ F_∞ for the F_t-filtration, where F_∞ = σ(∪_{t>0} F_t), consider a simple function

f = Σ_{i=0}^n a_i χ_{A_i} χ_{(t_i, t_{i+1}]},  A_i ∈ F_{t_i},  0 < t_1 < ⋯ < t_{n+1} ≤ t < ∞.

Define, as usual, τf = ∫_{ℝ⁺} f dX. Using the commutativity of conditional expectations (E^{F_s} E^{F_t} = E^{F_t} E^{F_s} = E^{F_s}, s < t) and the identity E(E^{G}(h)) = E(h), h ∈ L¹(P), for any σ-algebra G ⊂ Σ, one can conclude that the martingale X is also L^{2,2}-bounded relative to P (= O of Definition 11) and a σ-finite measure μ^X to be exhibited below. Indeed, consider:

E(|τf|²) = Σ_{i=0}^n a_i² E[χ_{A_i}(X_{t_{i+1}} − X_{t_i})²] + 2 Σ_{0≤i<j≤n} a_i a_j E[χ_{A_i ∩ A_j}(X_{t_{i+1}} − X_{t_i})(X_{t_{j+1}} − X_{t_j})]
= Σ_{i=0}^n a_i² E[χ_{A_i}(X_{t_{i+1}}² − X_{t_i}²)] + 0, using the martingale property and F_t ⊂ G_t,
= Σ_{i=0}^n a_i² μ^X[(t_i, t_{i+1}] × A_i],

where μ^X[(t_i, t_{i+1}] × A_i] is σ-additive (actually a measure) on P, associated with the positive right continuous sub martingale {X_t², G_t, t ≥ 0} by Theorem 4. (This is called the Doléans-Dade measure.) Hence one has

E(|τf|²) = ∫_{ℝ⁺×Ω} |f(t, ω)|² dμ^X(t, ω).   (34)

Note that the dominating measure μ^X on P is not necessarily of product type. Thus τ is a bounded linear mapping on the simple functions of (and hence has a unique extension to all of) L²(μ^X), taking values in L²(P), and so by definition {X_t, G_t, t ≥ 0} is L^{2,2}-bounded relative to P and μ^X. Then (τf)_t = ∫_0^t f(s, ·) dX_s gives the desired integral, and it is easily verified that {(τf)_t, G_t, t ≥ 0} is again a martingale for f ∈ L²(μ^X). If the X_t-process is a BM, then this is the usual


Itô-integral. (In this case, it is both a martingale and a Markov process.) The general form of τf, for martingale integrators, is due to Kunita and Watanabe (1967), and to Meyer in a series of articles with a final account in his major course (1976). It is now appropriate to formulate stochastic integrals for semimartingales, extending and unifying the preceding examples, which again follow the boundedness principle (given in Definition 11).

EXAMPLE d. Let X = {X_t, G_t, t ≥ 0} be a right continuous L²(P)-bounded (i.e., the process lies in some ball of L²(P)) semimartingale, {G_t, t ≥ 0} being a standard filtration from (Ω, Σ, P), as above. If P is the predictable σ-algebra from B(ℝ⁺) ⊗ Σ for the filtration, then X is L^{2,2}-bounded relative to P and a σ-finite measure on P. This is verified by using the result of the preceding example, as follows. Since the X_t-process admits the decomposition X_t = Y_t + Z_t (cf. Definition 8), and X is L²(P)-bounded, it can be assumed for the present purposes that the martingale Y = {Y_t, G_t, t ≥ 0} is L²(P)-bounded and Z = {Z_t, G_{t−}, t ≥ 0} is a predictable (L²(P)-bounded) process of bounded variation, where G_{t−} = σ(∪_{s<t} G_s). Hence |Z_t| → |Z_∞| a.e., and also in L¹(P)-mean. Thus, for simple functions of the type considered in the last example, one has τf defined and

E(|τf|²) = E(|∫_{ℝ⁺} f(t) dY_t + ∫_{ℝ⁺} f(t) dZ_t|²)
≤ 2 E(|∫_{ℝ⁺} f(t) dY_t|²) + 2 E((∫_{ℝ⁺} |f(t)| d|Z_t|)²), since (a + b)² ≤ 2(a² + b²),
≤ 2 [∫_{ℝ⁺×Ω} |f|² dμ^Y + E(|Z_∞| ∫_{ℝ⁺} |f|² d|Z_t|)], by the last example and Jensen's inequality,
= 2 ∫_{ℝ⁺×Ω} |f|² dα,

where α(·) is the measure defined by the term in [·] above. This shows that X is L^{2,2}-bounded, and hence (τf)_t = ∫_0^t f(s) dX_s is uniquely defined on L²(α), where α(·) is the σ-finite measure constructed above on P. It also follows that {(τf)_t, G_t, t ≥ 0} is a semimartingale.

The general theory implies that once the initial work is accomplished for L^{2,2}-bounded processes, the integral can be extended, with standard tricks using stopping times and truncation, to local semimartingales and to locally integrable f relative to α. The details are found in the paper [Rao (1993)] and the book [Rao (1995), Chapter VI]. Here we state the main theorem, which unifies the above examples and more, thereby indicating that the generalized Bochner boundedness principle is in some sense the best one.

Martingales and some applications


13. THEOREM. Suppose that X = {X_t, 𝒢_t, t ≥ 0} is L^{φ₁,φ₂}-bounded (cf. Definition 11) relative to the predictable σ-algebra 𝒫 ⊂ ℬ(ℝ₊) ⊗ Σ with the standard filtration {𝒢_t, t ≥ 0} from the basic probability space (Ω, Σ, P), and a σ-finite measure α on 𝒫, where φ₂ satisfies the growth condition

φ₂(2x) ≤ K φ₂(x),  x ≥ 0 .

Then the stochastic integral

(τf)(t) = ∫_0^t f(s, ·) dX_s,  f ∈ L^{φ₁}(α),

is defined and the dominated convergence theorem holds for this integral. On the other hand, if X_t ∈ L^{φ₂}(P), t ≥ 0, where L^{φ₂}(P) is a separable Orlicz space [a Banach function space that reduces to the Lebesgue space L^p(P) if φ₂(x) = |x|^p, p > 0], and if the stochastic integral τf exists, is L^{φ₂}(P)-bounded for all simple functions f, and the dominated convergence assertion holds, then there exists a convex function φ₁ such that φ₁(x) = 0 iff x = 0, as well as φ₁(x) ↑ ∞ as x ↑ ∞ (not necessarily φ₁(x) = |x|^p), and a σ-finite measure α on the associated (to the given standard filtration) predictable σ-algebra, for which X is L^{φ₁,φ₂}-bounded.
This result implies that the concept of generalized boundedness of Definition 11 is essentially an optimal condition if the stochastic integral is to obey the dominated convergence criterion, which is clearly the most desirable property of the (stochastic, or any other for that matter) integral. The direct part, which is the most relevant, is simple and uses the same argument as in the above examples, but the converse direction uses results from functional analysis and is somewhat involved. However, it indicates the generality of the principle. Using this extended version, the corresponding Itô differential formula, a cornerstone of stochastic integration theory, will now be given; it is used often in the applications to financial mathematical models below. The presentation of this formula is facilitated if we recall the concept of quadratic (co)variation of an L^{2,2}-bounded process (i.e., φ₁(x) = φ₂(x) = x²). Thus for a process X = {X_t, 𝒢_t, t ≥ 0}, consider the dyadic partition of [0, t]: 0 = t_0 < t_1 < ⋯ < t_{2^n} = t, where t_i = it/2^n, i = 0, 1, …, 2^n (in applications X_0 is usually a constant). Then

[X]_t = p-lim_{n→∞} Σ_{i=0}^{2^n − 1} (X_{t_{i+1}} − X_{t_i})² ,

if this (in probability) limit exists, is called the quadratic variation of the process X. It can be verified that this limit exists if X is L^{2,2}-bounded relative to the predictable σ-algebra 𝒫 associated with the standard filtration {𝒢_t, t ≥ 0}. It may then be verified that t ↦ [X]_t is an increasing, predictable, locally bounded process. To see this, consider the dyadic partition of [0, t] given above, with I_i^n = (t_i, t_{i+1}], and the following identity:
X_t² − X_0² = Σ_{j=0}^{2^n − 1} (X_{t_{j+1}}² − X_{t_j}²)

  = Σ_{j=0}^{2^n − 1} (Δ_j^n X)² + 2 Σ_{j=0}^{2^n − 1} X_{t_j} Δ_j^n X ,  with Δ_j^n X = X_{t_{j+1}} − X_{t_j},

  = Σ_{j=0}^{2^n − 1} |Δ_j^n X|² + 2 ∫_{ℝ₊} f_n(s) dX_s ,   (35)


M. M. Rao

where f_n(t) = Σ_{j=0}^{2^n − 1} X_{t_j} 1_{I_j^n}(t) is a simple function. Since X is L^{2,2}-bounded and f_n is 𝒫-measurable, the last integral has a limit as n → ∞. The right side integral converges to ∫_0^t X_{s−} dX_s by the 'dominated convergence' for such integrals, so that the first sum converges in measure to [X]_t, the quadratic variation. Thus for the L^{2,2}-bounded (right continuous) processes the quadratic variation on [0, t] exists for each t < ∞. Next, if X^i = {X_t^i, 𝒢_t, t ≥ 0}, i = 1, 2, are two L^{2,2}-bounded processes, so that {X_t¹ + X_t², 𝒢_t, t ≥ 0} is also one, with [X¹]_t, [X²]_t, [X¹ + X²]_t as their respective quadratic variations, then their quadratic covariation is given, via the polarization identity, by:

[X¹, X²]_t = ½([X¹ + X²]_t − [X¹]_t − [X²]_t),  t ≥ 0 .   (36)
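The dyadic-partition definition above is easy to probe numerically. The following Python sketch (our own illustration — names, seed and grid sizes are not from the text) simulates one Brownian path and computes the sums Σ(X_{t_{i+1}} − X_{t_i})² over successively finer dyadic partitions; for BM the limit is [B]_t = t.

```python
import numpy as np

rng = np.random.default_rng(0)
t, n_max = 1.0, 16                 # horizon and finest dyadic level

# One Brownian path sampled on the finest dyadic grid of [0, t].
dB = rng.normal(0.0, np.sqrt(t / 2**n_max), size=2**n_max)
B = np.concatenate(([0.0], np.cumsum(dB)))

for n in (4, 8, 12, 16):
    step = 2 ** (n_max - n)        # restrict to t_i = i t / 2**n
    Bn = B[::step]
    qv = float(np.sum(np.diff(Bn) ** 2))
    print(n, qv)                   # approaches [B]_t = t = 1 as n grows
```

The convergence in probability is visible already for moderate n, in line with (35): the squared-increment sums stabilize near t while the martingale term averages out.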

This exists and defines a predictable process of bounded variation on each compact interval [0, t], and is a bilinear form. With these concepts at hand, the following generalized Itô formula can be presented:

14. THEOREM. Let X = {X_t, 𝒢_t, t ≥ 0} be an L^{2,2}-bounded right-continuous process with left limits, relative to 𝒫 and a σ-finite α on 𝒫, as before. If f: ℝ → ℝ is a twice continuously differentiable function, then for each 0 ≤ t < ∞,
f(X_t) − f(X_0) = ∫_0^t f′(X_{s−}) dX_s + ½ ∫_0^t f″(X_{s−}) d[X]_s
  + Σ_{0<s≤t} (f(X_s) − f(X_{s−}) − f′(X_{s−}) ΔX_s) − ½ Σ_{0<s≤t} f″(X_{s−})(ΔX_s)² ,   (37)

where the last two series converge a.e., ΔX_s = X_s − X_{s−} being the jump of the process at s. In particular, if t ↦ X_t is a.e. continuous (especially for the BM), then the last two terms of (37) vanish, and one has (the original Itô formula):

f(X_t) − f(X_0) = ∫_0^t f′(X_s) dX_s + ½ ∫_0^t f″(X_s) d[X]_s ,   (38)

where [X]_t = t for the BM. If X¹, X² are two L^{2,2}-bounded processes on the same standard filtration, with [X¹, X²] as their covariation process, and g: ℝ² → ℝ has two continuous (partial) derivatives, then (37) takes the form:

g(X_t¹, X_t²) − g(X_0¹, X_0²) = Σ_{i=1}^2 ∫_0^t (∂g/∂x_i)(X_{s−}¹, X_{s−}²) dX_s^i
  + ½ Σ_{i,j=1}^2 ∫_0^t (∂²g/∂x_i∂x_j)(X_{s−}¹, X_{s−}²) d[X^i, X^j]_s
  + Σ_{0<s≤t} (g(X_s¹, X_s²) − g(X_{s−}¹, X_{s−}²) − Σ_{i=1}^2 (∂g/∂x_i)(X_{s−}¹, X_{s−}²) ΔX_s^i)
  − ½ Σ_{0<s≤t} Σ_{i,j=1}^2 (∂²g/∂x_i∂x_j)(X_{s−}¹, X_{s−}²) ΔX_s^i ΔX_s^j ,   (39)


the series converging a.e. (and the last two terms drop out if the X^i are continuous).
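As a concrete check of (38), note that for f(x) = x² the discrete left-endpoint (Itô) sums satisfy B_{t_{i+1}}² − B_{t_i}² = 2B_{t_i}ΔB_i + (ΔB_i)² exactly, so along any simulated path the two sides of (38) agree to rounding error. A short Python sketch (the discretization choices are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N, t = 200_000, 1.0
dB = rng.normal(0.0, np.sqrt(t / N), size=N)
B = np.concatenate(([0.0], np.cumsum(dB)))

# Ito formula for f(x) = x**2: f' = 2x, f'' = 2, and [B]_t = t.
ito_integral = np.sum(2.0 * B[:-1] * dB)    # left-endpoint sum for ∫ f'(B) dB
qv_term = 0.5 * np.sum(2.0 * dB**2)         # (1/2) ∫ f''(B) d[B], close to t

lhs = B[-1] ** 2 - B[0] ** 2                # f(B_t) - f(B_0)
rhs = ito_integral + qv_term
print(lhs, rhs)   # equal up to rounding: the discrete sums telescope exactly
```

For general smooth f the agreement is only up to discretization error, but the quadratic-variation term ½∫f″ d[B] is always present — the hallmark of (38) versus the ordinary fundamental theorem of calculus.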

The original formula (38) for the BM process was first obtained by Itô in the early 1950s, the extension to L²(P)-martingales by Kunita–Watanabe (1967), and the result for semimartingales with formula (37) is due to Doléans-Dade and Meyer (1970). The proof may be found in either of the above references, and some multidimensional extensions are also available [cf. Métivier (1982)]. The stochastic integral and the (general) Itô formula are employed in the study of stochastic differential equations (SDEs), and this is how important applications to, for instance, financial mathematics, as well as the stochastic fundamental theorem of calculus to be discussed below, are formulated. Thus one considers equations of the form:

dX_t/dt = b(X_t, t) + σ(X_t, t) dZ_t/dt ,   (40)

where b, σ: ℝ × ℝ₊ → ℝ are (locally) bounded measurable functions and {Z_t, 𝒢_t, t ≥ 0} is a semimartingale or an L^{2,2}-bounded process. However, the Itô formula (38) (or more generally (37), (39)) implies that an L^{2,2}-bounded process does not have finite variation on nondegenerate intervals, so that dZ_t/dt is not defined in the ordinary sense of Lebesgue's theory. So (40) is formally written as
dXt = b(Xt, t)dt + a(Xt, t)dZt ,

(41)

and is understood in the weak sense, i.e., for any real continuous function φ on ℝ₊ with compact support one has:

∫_{ℝ₊} φ(s) dX_s = ∫_{ℝ₊} φ(s) b(X_s, s) ds + ∫_{ℝ₊} φ(s) σ(X_s, s) dZ_s ,   (42)

where the left side is, by definition, the right side, in which the first term is the standard (pointwise) Lebesgue [or more generally the Bochner] integral and the second one is the stochastic integral relative to the L^{2,2}-bounded process Z just defined above. The problem then is to find conditions on the coefficients b, σ in order that (41) or (42) has a unique solution {X_t, 𝒢_t, t ≥ 0}. Taking φ(s) = 1 on [0, t] and writing b̃(x, s) = φ(s)b(x, s) and σ̃(x, s) = φ(s)σ(x, s), one can simply express (42) as:
X_t − X_0 = ∫_0^t b̃(X_s, s) ds + ∫_0^t σ̃(X_s, s) dZ_s ,   (43)

and this is regarded as a first order (nonlinear) SDE, rigorously interpretable because of the stochastic calculus presented above. It is in this sense (and form) that one understands an SDE given by (41). Moreover, if a solution of (43) exists, then it will be (at least locally) an L^{2,2}-bounded process when b, σ satisfy certain integrability conditions. Familiar examples of (40) are the following: (i) The Langevin equation for the motion of a free (Brownian) particle:

du/dt = −βu + A(t) ,   (44)

where u is the velocity of the particle, −βu is the dynamical friction, and A(t) is the random fluctuation. The latter is the 'white noise', with A(t) dt = dB(t) giving the BM differential [cf. Chandrasekhar (1943), p. 20]. It is an example of (40). (ii) Pricing of contingent claims in a stock market:

dV_t = [αV_t − D₁(V_t, t)] dt + σ(V_t, t) dZ_t ,   (45)

where V_t denotes the price of a dividend-liability traded security at time t, σ²(V_t, t) is the instantaneous variance rate, D₁ is the dividend flow rate, and Z_t is the BM fluctuation [cf. Merton (1997)]. This is an example of (41). [Chandrasekhar and Merton are 1983 and 1997 Nobel laureates in Physics and Economics respectively.] The first is a linear and the second a slightly more general (first order) SDE, to be understood in the forms (42) and (43). Higher order SDEs are also of interest in some important applications as well as in theory, but this area is comparatively less developed. [Regarding the state of the art in the linear and nonlinear (higher order) cases, see, for instance, Rao (1997).] The above applications, especially in finance, mostly concern linear equations. For this case a quite general existence and uniqueness result for solutions can be presented, and it is included here to give a bird's-eye view of the subject.

15. THEOREM. Consider the (linear) SDE given by:
dX_t = (α(t)X_t + α₀(t)) dt + Σ_{i=1}^k (β_i(t)X_t + γ_i(t)) dB_t^i ,   (46)

where {B_t^i, 𝒢_t, t ≥ 0}, i = 1, …, k, are L^{2,2}-bounded processes with the same standard filtration, the coefficients α, α₀, β_i, γ_i, i = 1, …, k, being nonstochastic and continuous. Then (46) has a unique solution for any given constant initial value X₀, and in fact the solution X_t, t ≥ 0, is explicitly given by:

X_t = M_t^{−1} {X₀ + ∫_0^t M_s α₀(s) ds − Σ_{i=1}^k ∫_0^t M_s β_i(s) γ_i(s) d[B^i, B^i]_s + Σ_{i=1}^k ∫_0^t M_s γ_i(s) dB_s^i} ,   (47)

where the strictly positive (L^{2,2}-bounded) process {M_t, 𝒢_t, t ≥ 0} is defined by

M_t = exp{−∫_0^t α(s) ds − Σ_{i=1}^k ∫_0^t β_i(s) dB_s^i + ½ Σ_{i,j=1}^k ∫_0^t β_i(s) β_j(s) d[B^i, B^j]_s} .   (48)


Moreover, the solution is a Markov process if the B^i are also independent with independent increments (in particular if they are BM processes).
This result easily extends to the case where X is an n-vector process, α, β_i, γ_i, i = 1, …, k, are n × n matrices and α₀, B^i are n-vectors, in which case M_t will be an L^{2,2}-bounded matrix process (or all of them can be regarded as semimartingales). [Indeed it also holds for higher order (linear) SDEs.] Taking k = 1 and n = 1, this applies to the problems noted in the above examples. A proof of this result is given (when the B^i are BMs) in Wu [(1985), p. 80], and a simpler version (in the L^{2,2}-bounded case) in the last reference above. (See Section 3 there, where the [non-typist's] typographical errors should be corrected. The last part is a consequence of Theorem 4.2 there.) Simplifications obtain by converting the general Itô integrals into the Stratonovich form. This is of interest in some of these computations, and a definition will be included for understanding the distinction. If Y^i, i = 1, 2, are two L^{2,2}-bounded right continuous processes on the same standard filtration, then let

Y_t¹ ∘ dY_t² = Y_{t−}¹ dY_t² + ½ d[Y¹, Y²]_t ,

so that in integrated form one has

∫_0^t Y_s¹ ∘ dY_s² = ∫_0^t Y_{s−}¹ dY_s² + ½ ∫_0^t d[Y¹, Y²]_s .   (49)

The right side symbols are well-defined by the preceding work, and the left side thus defined is called the Stratonovich integral; it obeys the boundedness principle. This integral also follows the rules of the ordinary calculus; in particular the integration by parts formula holds for it, in contrast to (37). It is noted that if one of the processes Y^i is locally of bounded variation, then the two integrals coincide, and this is useful in the analysis. As an illustration, let us note the solution of Langevin's equation (44):

16. EXAMPLE. Let k = 1, α(t) = −β, a constant, α₀ = 0 = β₁ and γ₁ = 1 in (46), which reduces it to (44). Then (48) becomes M(t) = e^{βt} and (47) becomes X_t = e^{−βt}

(X₀ + ∫_0^t e^{βs} dB_s) .
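This closed form can be checked against a direct discretization of (44) (a sketch under our own choices of β, grid and seed, not from the text): drive the Euler–Maruyama scheme for du = −βu dt + dB and a left-endpoint sum for the Wiener integral above with the same Brownian increments; the two paths then agree up to O(dt).

```python
import numpy as np

rng = np.random.default_rng(2)
beta, t, N = 1.0, 1.0, 10_000
dt = t / N
times = np.linspace(0.0, t, N + 1)
dB = rng.normal(0.0, np.sqrt(dt), size=N)

# Euler-Maruyama for du = -beta * u dt + dB, started at u_0 = 1.
u = np.empty(N + 1)
u[0] = 1.0
for i in range(N):
    u[i + 1] = u[i] - beta * u[i] * dt + dB[i]

# Closed form X_t = e^{-beta t} (X_0 + int_0^t e^{beta s} dB_s), with the
# Wiener integral approximated by a left-endpoint sum over the same dB.
wiener = np.concatenate(([0.0], np.cumsum(np.exp(beta * times[:-1]) * dB)))
x = np.exp(-beta * times) * (1.0 + wiener)
print(float(np.max(np.abs(u - x))))   # small: both discretize the same path
```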

This solution was derived (around 1905) independently by Einstein and Smoluchowski, with B_t as BM, long before stochastic integration was rigorously established by Wiener in 1923. This description is enough for what follows.

5. An application to likelihood ratios

An immediate application of martingale theory is to find likelihood ratios of a process X = {X_t, t ∈ T = [a, b] ⊂ ℝ} governed by a pair of probability measures (corresponding to a simple hypothesis versus a simple alternative). Thus let the canonical (or function space) representation of the process be (Ω, Σ, P), where Ω = ℝ^T, Σ is the cylinder σ-algebra, and t ↦ X_t(ω) = ω(t), ω ∈ Ω, is the


coordinate function; Σ is then the smallest σ-algebra relative to which all the X_t are measurable. Suppose that t ↦ X_t is right continuous. The problem is to find the likelihood ratio dQ/dP, based on a realization of the process, when Q ≪ P. First consider the special case of a sequence of random variables X₁, X₂, … and let ℱ_n = σ(X₁, …, X_n) = (X₁, …, X_n)^{−1}(ℬ^n) ⊂ Σ, where ℬ^n is the Borel σ-algebra of the Euclidean space ℝ^n. If Q_n, P_n are the restrictions of Q, P to ℱ_n, and if Q ≪ P, then Q_n ≪ P_n; let f_n = dQ_n/dP_n. We observe that {f_n, ℱ_n, n ≥ 1} is a positive martingale. To see the martingale property, let A ∈ ℱ_n and consider

Q_{n+1}(A) = ∫_A f_{n+1} dP_{n+1} = ∫_A E^{ℱ_n}(f_{n+1}) dP_n,  by definition of conditioning,

  = ∫_A f_n dP_n = Q_n(A),  by definition of f_n .   (50)
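A minimal numerical illustration of (50), under assumptions not in the text: take X_i i.i.d. N(0, 1) under P and N(m, 1) under Q, so that f_n = exp(m Σ_{i≤n} X_i − nm²/2). The martingale property forces E_P(f_n) = E_P(f_1) = 1 for every n, which Monte Carlo sampling under P confirms (a Python sketch; m, n, sample sizes are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, paths = 0.3, 10, 100_000

# Observations under P: X_i i.i.d. N(0,1); under Q they would be N(m,1).
X = rng.normal(0.0, 1.0, size=(paths, n))

# f_n = dQ_n/dP_n = exp(m * (X_1 + ... + X_n) - n * m**2 / 2)
log_f = m * np.cumsum(X, axis=1) - (m**2 / 2.0) * np.arange(1, n + 1)
f = np.exp(log_f)

# The martingale property (50) forces E_P(f_n) = 1 for every n.
print(f.mean(axis=0))   # all entries close to 1
```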

This implies the martingale property. If Q ≪ P is dropped in the above, then considering the absolutely continuous parts Q_n^c relative to P_n, and noting that Q_n^c(A) ≥ Q_{n+1}^c(A), A ∈ ℱ_n, one sees from the same computation that the f_n sequence will be a positive supermartingale. Consequently, by Theorem 3.1 (or Corollary 3.4), f_n → f_∞ a.e. In the continuous parameter case one should replace the process by suitable countable sequences so that the above argument can be used. It is for this reason that we need some 'separability' conditions on the process, and the right continuity assumed above is convenient; then one of the two following methods can be employed. In many important cases one may replace the given process by a countable set of 'observable coordinates', proposed and used with great effectiveness by Grenander (1950), for which the martingale convergence theorem directly applies. The second method is to consider partitions π_n: a = t₁^n < t₂^n < ⋯ < t_{k_n}^n ≤ b and let ℱ_n = σ(X_{t_i}, t_i ∈ π_n). If σ(X_t, t ∈ T) = σ(∪_n ℱ_n) as the partitions are refined, i.e. |π_n| → 0, let Q_n, P_n be the restrictions of Q, P to ℱ_n, and f_n = dQ_n/dP_n; then {f_n, ℱ_n, n ≥ 1} forms a (super)martingale. Here the refinement order is in general only partial, not linear; an extension of Theorem 3.1 is available, but the convergence is unfortunately only in probability, and further restrictions are needed for pointwise convergence. For a comparison, the exact result will be stated as follows. Recall that a set I is directed if it is partially ordered by '<' and if α, β ∈ I then there is a γ ∈ I such that α, β < γ. Also a directed collection {f_α, α ∈ I} is terminally uniformly integrable if ‖f_α‖₁ ≤ K < ∞, ∀α ∈ I, and for each ε > 0 there are α₀ ∈ I, δ₀ > 0 such that ∫_A |f_α| dP ≤ ε, ∀A ∈ Σ with P(A) < δ₀, and all α > α₀. The precise result is as follows:

1. THEOREM.
Let {f_α, ℱ_α, α ∈ I} be a directed indexed martingale, so that E^{ℱ_α}(f_β) = f_α for α < β, which is terminally uniformly integrable. Then there exists an f ∈ L¹(P) such that ‖f_α − f‖₁ → 0 as α → ∞, and then f_α = E^{ℱ_α}(f) a.e., and hence f_α → f in probability.

[The proof can be found in, e.g., Rao (1981), Thm. IV.4.6 on p. 209. It may be noted that when the terminal uniform integrability holds, one has Q ≪ P as a consequence.] Both these methods will now be briefly illustrated. The first one is as follows. Let X = {X_t, a ≤ t ≤ b} be a real process on (Ω, Σ, P) with a continuous common covariance function r, and means 0, m(·) for P, Q respectively, with ∫_a^b |m(t)| dt < ∞. Suppose {λ_n, φ_n, n ≥ 1} are the eigenvalues and the corresponding eigenfunctions of the integral equation:

φ(t) = λ ∫_a^b r(s, t) φ(s) ds .   (51)

The classical theory of integral equations and Mercer's theorem imply that there exist numbers λ_n > 0 and functions φ_n satisfying (51), forming a complete orthonormal set in L²([a, b], dt), such that

r(s, t) = Σ_{n=1}^∞ φ_n(s) φ_n(t) / λ_n ,   (52)

the series converging uniformly and absolutely. Let Z_n = ∫_a^b X_t φ_n(t) dt and m_n = ∫_a^b m(t) φ_n(t) dt. Then E_P(Z_n) = 0, E_Q(Z_n) = m_n and Cov(Z_j, Z_k) = (1/λ_k) δ_{jk} under both P and Q. Moreover (E_P denotes expectation on (Ω, Σ, P)) one has

X_t = m(t) + Σ_{k=1}^∞ Z_k φ_k(t) / √λ_k ,   (53)

the series converging in mean under both measures P, Q (known as the Karhunen–Loève representation). Now let ℱ_n = σ(Z₁, …, Z_n) and ℱ_∞ = σ(∪_n ℱ_n). Then each X_t is ℱ_∞-adapted, and it follows that ℱ_∞ = σ(X_t, t ∈ [a, b]). Letting P_n = P|ℱ_n, Q_n = Q|ℱ_n be the measures governing (Z₁, …, Z_n), which are thus derived from the given measures, it follows by (50) that {f_n = dQ_n/dP_n, ℱ_n, n ≥ 1} is a positive supermartingale, and a martingale if Q_n ≪ P_n, n ≥ 1, and f_n → f_∞ a.e. (= dQ_∞/dP_∞, by a theorem of Andersen–Jessen and independently Grenander, cf., e.g., Rao (2000), Theorem V.1.1). If Q ≪ P is assumed, then Q_n^c = Q_n ≪ P_n, 1 ≤ n ≤ ∞, and f_∞ is the desired likelihood ratio of the process on ℱ_∞. Let us illustrate this by the following example, which specializes a result due to Grenander (1950).

2. EXAMPLE. Let {X_t, t ∈ [0, 1]} be a real Gaussian process with mean 0 (m(·)) under P (Q) and the same covariance under both, given by (∨ = max, ∧ = min)

r(s, t) = cosh(s ∧ t) cosh(1 − s ∨ t) / sinh 1,  0 ≤ s, t ≤ 1 .   (54)
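The eigenvalues asserted just below (λ_n = 1 + n²π²) can be checked numerically by a Nyström discretization of the integral operator in (51) with the kernel (54): its largest eigenvalues should approximate 1/λ_n = 1/(1 + n²π²). (A Python sketch; the grid size and tolerance are our choices, not from the text.)

```python
import numpy as np

M = 1000
h = 1.0 / M
s = (np.arange(M) + 0.5) * h       # midpoint grid on [0, 1]

S, T = np.meshgrid(s, s, indexing="ij")
r = np.cosh(np.minimum(S, T)) * np.cosh(1.0 - np.maximum(S, T)) / np.sinh(1.0)

# Nystrom discretization of the integral operator in (51); its largest
# eigenvalues should approximate 1 / lambda_n = 1 / (1 + n**2 pi**2).
K = r * h
eig = np.sort(np.linalg.eigvalsh(K))[::-1]

theory = [1.0 / (1.0 + (n * np.pi) ** 2) for n in range(5)]
print(eig[:5])
print(theory)
```

The top eigenvalue is exactly 1 (with constant eigenfunction, since ∫₀¹ r(s, t) ds = 1 for every t), matching λ₀ = 1, φ₀ ≡ 1 below.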

A computation (by converting (51) into a differential equation) shows that λ_n = 1 + n²π², φ_n(t) = √2 cos nπt, n = 1, 2, … (λ₀ = 1, φ₀(t) = 1); set a_n = √2 ∫₀¹ m(t) cos nπt dt and Z_n = √2 ∫₀¹ X_t cos nπt dt, n = 1, 2, … (with Z₀ = ∫₀¹ X_t dt, a₀ = ∫₀¹ m(t) dt, where m ∈ L¹([0, 1], dt)). It is seen that E_P(Z_n) = 0, Var_P(Z_n) = 1/λ_n = Var_Q(Z_n), and the Z_n are independent. So a standard computation shows that

(dQ_n/dP_n)(ω) = exp{Σ_{i=1}^n λ_i a_i Z_i(ω) − ½ Σ_{i=1}^n λ_i a_i²} .   (55)

Moreover |a_n| ≤ √2 ∫₀¹ |m(t)| dt = K < ∞ for all n, and Σ_{i=1}^∞ λ_i a_i² < ∞. If we define Y_i = λ_i (Z_i a_i − a_i²/2), then Σ_i Var(Y_i) < ∞, so that by the Kolmogorov two series theorem Y = Σ_{i=1}^∞ Y_i converges with probability one under both measures, and one gets

f_∞ = dQ_∞/dP_∞ = e^Y

as the desired density. A difficulty here is the ability to calculate λ_n, φ_n for a given covariance kernel r. A simple way of obtaining the observable sequences is crucial to implement this method; some useful techniques for the purpose are given in Grenander (1950). The second method, as already observed, is to find a suitable partition of the index set [a, b] such that the refinements become dense in it. Sometimes, in particular cases, partitions such as those based on dyadic rationals will give a linear ordering, and a.e. convergence can then be obtained. The following illustration from Pitcher (1959) explains this point as well as the method.

3. EXAMPLE. Let {X_t, t ∈ [a, b]} be a Gaussian process on (Ω, Σ, P), in the canonical form as above, with a common covariance function r, and means 0 and f respectively for P and Q. It is again desired to find the likelihood ratio for a general continuous r (not necessarily of the form (54)). Consider the random variables X_{k/2^n}, with (a ∨ (−n)) ≤ k/2^n ≤ b ∧ n, k an integer, and let ℱ_n be the σ-algebra generated by these functions, with ℱ_∞ = σ(∪_n ℱ_n). It is clear that ℱ_n ↑ and each X_t is ℱ_∞-adapted. Suppose that X = (X_{k₁/2^n}, …, X_{k_N/2^n}) is a linearly independent (finite) set. Thus it has a (nondegenerate) N-variate Gaussian distribution with means 0 or f(k_i) (under P or Q respectively) and a common covariance matrix R^N with inverse A^N = (A_{ij}^N) and determinant |R^N|. Then for any ℱ_n-measurable bounded function g: ℝ^N → ℝ, one has
E_P(g(X)) = (2π)^{−N/2} |R^N|^{−1/2} ∫_{ℝ^N} g(x) exp{−½ Σ_{i,j} A_{ij}^N x_i x_j} dx .   (56)

Let φ_n(X) = Σ_{i,j} A_{ij}^N X_{k_i} f(k_j) and C_n = ½ Σ_{i,j} A_{ij}^N f(k_i) f(k_j), so that by the positive definiteness of R^N, C_n > 0. By using a linear change of variables one gets:


E_P(g(X) exp{φ_n(X) − C_n})
  = (2π)^{−N/2} |R^N|^{−1/2} ∫_{ℝ^N} g(x) exp{−½ Σ_{i,j} A_{ij}^N (x_i − f(k_i))(x_j − f(k_j))} dx
  = E_P(g(X + f(k))) = E_Q(g(X)) .

Since g(X) is an arbitrary ℱ_n-measurable bounded function, it follows that

g_n(X) = dQ_n/dP_n = exp{φ_n(X) − C_n} ,   (57)

and hence {g_n(X), ℱ_n, n ≥ 1} is a positive martingale, so that g_n(X) → g(X) a.e. [P]. Now let ω be a point in a set of positive P-measure. Since the mean is zero for P, its finite dimensional distributions are invariant under the measure preserving mapping x → −x, and hence exp[φ_n(−X(ω)) − C_n] = exp[−φ_n(X(ω)) − C_n] → a finite limit = a(X) (say). Multiplying this and the result without the transformation gives the simplification e^{−2C_n} → a(X)g(X), from which one finds that C_n → C ≥ 0 (a constant). But then φ_n(X) → φ(X) a.e., and hence

dQ_∞/dP_∞ = exp{φ(X) − C} > 0 ,   (58)

so that Q_∞ ≪ P_∞ and (58) is the likelihood ratio. A further analysis shows that the same conclusion holds if f is replaced by af, a ∈ ℝ; if Q^a is the corresponding measure, then Q_∞^a ≪ P_∞, and moreover

dQ_∞^a/dP_∞ = exp{aφ(X) − a²C} ,

so that φ is a linear function. Its explicit form can also be given. Thus the martingale convergence theorems come into play at a deeper level in these applications in computing the likelihood ratios of processes. (See the monographs of Grenander (1981) and Rao (2000) for other aspects.) We shall now proceed to a different set of important and novel applications.

6. Exponential (semi)martingales

As seen in Theorem 15 of the preceding section, the solution process is generally an absolutely continuous process. As such one may ask whether the Fundamental

Theorem of (Stochastic) Calculus is valid for the integrals defined here. This is nontrivial since the integrator is typically of unbounded variation. It will be seen that such a statement has an important role to play in the study of linear SDEs, especially as concerns the financial applications to be discussed in the next section. Consider the particular case of the SDE given in Theorem 4.15 in which k = 1, α₀ = 0, γ₁ = 0, β₁ = σ > 0 and α = μ, a constant, so that one has:

dX_t = μX_t dt + σX_t dB_t .   (59)

An explicit solution can be written down immediately from that theorem. However, let us express (59) in the integrated form with the initial value X₀ = 1 and μ = 0, σ = 1. Thus it becomes:

X_t = 1 + ∫_0^t X_{s−} dB_s .   (60)

If Y_t = X_t − 1, then (60) can be expressed as:

Y_t = ∫_0^t X_{s−} dB_s .   (61)

If {B_t, t ≥ 0} is a BM, then (B_{t+h} − B_t)²/h is a chi-squared random variable with one degree of freedom, so it follows that

P[(B_{t+h} − B_t)² ≤ ah] = √(2/π) ∫_0^{√a} e^{−x²/2} dx = o(1),  as a → 0, uniformly in h > 0 .   (62)

More generally, suppose the distribution of the increments of B_t satisfies the order of growth condition (62); it holds, in particular, if the process is stochastically continuous. Letting Δ_h Y_t = Y_{t+h} − Y_t, and similarly for Δ_h B_t, one has:

Δ_h Y_t / Δ_h B_t − X_{t−} = (1/Δ_h B_t) ∫_t^{t+h} (X_{s−} − X_{t−}) dB_s = I_t^h / Δ_h B_t  (say) .   (63)

Assuming that both {X_t, 𝒢_t, t ≥ 0} and {B_t, 𝒢_t, t ≥ 0} are square integrable processes with a standard filtration, and that the B_t-process is also a martingale, one finds that {I_t^h, 𝒢_{t+h}, t ≥ 0} is a martingale. In fact, for 0 ≤ s < t,

E^{𝒢_{s+h}}(I_t^h) = I_s^h + E^{𝒢_{s+h}}[∫_{s+h}^{t+h} (X_{u−} − X_{t−}) E^{𝒢_u}(dB_u)] = I_s^h + 0,  a.e.,

by the martingale property of the B_t-process (so the increments are centered). Consequently, for ε > 0, δ > 0, one has

P[sup_{0<h≤δ} |I_t^h| > ε] ≤ (1/ε²) ∫_Ω ∫_t^{t+δ} (X_{u−} − X_{t−})² d[B]_u dP → 0,  as δ → 0 .   (64)

However, for any k > 0 and ε > 0,

P[|I_t^h / Δ_h B_t| > ε] ≤ P[|I_t^h| > ε √(h/k)] + P[(Δ_h B_t)² ≤ h/k] .

Now choose k > 0 large enough that the last term is small by assumption (62), and then choose δ small enough that the first term on the right is small by (64). Then (Δ_h Y_t / Δ_h B_t) → X_{t−} in probability. Therefore we have the following result, essentially due to Isaacson (1969):

1. PROPOSITION. (Fundamental theorem of (stochastic) calculus.) Let {B_t, 𝒢_t, t ≥ 0} be a square integrable stochastically continuous martingale verifying (62), and {X_t, 𝒢_{t−}, t ≥ 0} be a square integrable right continuous process. If the Y_t-process is given by (60), then (Δ_h Y_t / Δ_h B_t) → X_{t−} in probability as h → 0.

Following the analogy of the solution of the ODE dy = y dx, or y = y₀eˣ, the solution process Y_t of

Y_t = 1 + ∫_0^t Y_{s−} dX_s,  or  dY_t = Y_{t−} dX_t ,   (65)

is termed an exponential of the X_t-process, denoted ℰ(X)_t. If the latter is a semi- (or local) martingale, then ℰ(X) inherits these properties. We can present its structure in the following explicit form, due essentially to Doléans-Dade (1970). One uses the known fact that every (semi)martingale can be decomposed as a sum of its continuous and discrete processes of the same type [cf., e.g., Rao (1995), p. 448].

2. THEOREM. If {X_t, ℱ_t, t ≥ 0} is a semimartingale and {Y_t, ℱ_t, t ≥ 0} is the unique solution of (65) with initial value Y₀, then one has:

ℰ(X)_t = Y_t = Y₀ exp{X_t − X₀ − ½[X^c, X^c]_t} × Π_{0<s≤t} (1 + ΔX_s) e^{−ΔX_s} ,   (66)

where ΔX_t = X_t − X_{t−}, and if X_t = X₀ + Y_t + A_t is the semimartingale decomposition, then Y_t = Y_t^c + Y_t^d and X^c = Y^c, so that [X^c, X^c]_t is a continuous and (locally) bounded nondecreasing process. Moreover, the following properties hold:
(a) if X is of finite variation a.e., or a local martingale, then so is ℰ(X);
(b) if τ = inf{t > 0 : ΔX_t = −1}, then ℰ(X)_t ≠ 0 (ℰ(X)_{t−} ≠ 0) on the stochastic interval [[0, τ) = {(t, ω): 0 ≤ t < τ(ω)} ([[0, τ]]);

(c) if X is a continuous local martingale, then so is ℰ(X), and in fact

ℰ(X)_t = Σ_{n=0}^∞ X_t^{(n)} / n! ,

with X_t^{(0)} ≡ 1 and, for n ≥ 1,

X_t^{(n)} = n! ∫_0^t ∫_0^{s_n} ⋯ ∫_0^{s_2} dX_{s_1} ⋯ dX_{s_n} = n ∫_0^t X_s^{(n−1)} dX_s  (say) ,

so that {X_t^{(n)}, ℱ_t, t ≥ 0} is a continuous local martingale;
(d) (Yor's formula) if {X_t^i, ℱ_t, t ≥ 0}, i = 1, 2, are a pair of semimartingales with [X¹, X²]_t as their covariation process (= 0 if X¹, X² are independent), then
ℰ(X¹)_t ℰ(X²)_t = ℰ(X¹ + X² + [X¹, X²])_t,  t ≥ 0 .   (67)

Finally, if X = X¹ + iX² is a complex semimartingale (so X^j, j = 1, 2, are real semimartingales) and Y_t = Y_t¹ + iY_t² satisfies (65), which means one has the following pair of equations,

Y_t¹ = 1 + ∫_0^t Y_{s−}¹ dX_s¹ − ∫_0^t Y_{s−}² dX_s² ,

Y_t² = ∫_0^t Y_{s−}² dX_s¹ + ∫_0^t Y_{s−}¹ dX_s² ,

then all the above statements and properties (a)–(d) also hold. For instance, (66) becomes

ℰ(X)_t = Y₀ {exp(X_t − X₀ − ½[X^{1c}, X^{1c}]_t + ½[X^{2c}, X^{2c}]_t − i[X^{1c}, X^{2c}]_t)} × Π_{0<s≤t} (1 + ΔX_s) e^{−ΔX_s} ,   (68)

the (infinite) products in both (66) and (68) converge absolutely a.e.
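For continuous X = B (a BM), formula (66) reduces to ℰ(B)_t = exp(B_t − t/2), and the defining equation (65) can be checked against it pathwise: the left-endpoint scheme for (65) is the product Π(1 + ΔB_i). A hedged Python sketch (the discretization and seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(4)
N, t = 100_000, 1.0
dB = rng.normal(0.0, np.sqrt(t / N), size=N)

# Left-endpoint (Euler) scheme for (65) with X = B: Y_{i+1} = Y_i (1 + dB_i).
Y = float(np.prod(1.0 + dB))

# Doleans-Dade closed form (66) for continuous X = B: E(B)_t = exp(B_t - t/2).
closed = float(np.exp(dB.sum() - 0.5 * t))
print(Y, closed)   # agree up to discretization error
```

The compensating factor e^{−t/2} in the closed form is exactly the −½[X^c, X^c]_t term of (66); without it the product Π(1 + ΔB_i) would not be matched.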
The details, for instance, may be found in compressed form in Rao [(1995), VI.6.2, pp. 530–531]. See also Mel'nikov (1996), and Jacod–Shiryaev (1987, p. 59). However, formula (67) is not given in these places, and since it plays an important part in the applications of the next section, we shall discuss it here in a somewhat more general form. It depends on the extended Itô formula (cf. Theorem 4.14, especially the two variable version (39) in differential form). Let us restate it in a way that is used below. If f: ℝ² → ℝ is twice continuously differentiable, and X¹, X² are semimartingales on the same standard filtration, then

df(X¹, X²)_t = Σ_{i=1}^2 (∂f/∂x_i)(X_{t−}¹, X_{t−}²) dX_t^i + ½ Σ_{i,j=1}^2 (∂²f/∂x_i∂x_j)(X_{t−}¹, X_{t−}²) d[X^{ic}, X^{jc}]_t
  + [Δf(X¹, X²)_t − Σ_{i=1}^2 (∂f/∂x_i)(X_{t−}¹, X_{t−}²) ΔX_t^i] .   (69)


Taking f(u, v) = uv in (69), one has the integration by parts formula:

d(X¹X²)_t = X_{t−}¹ dX_t² + X_{t−}² dX_t¹ + d[X¹, X²]_t ,   (70)

or in integrated form

X_t¹X_t² − X₀¹X₀² = ∫_0^t X_{s−}¹ dX_s² + ∫_0^t X_{s−}² dX_s¹ + [X¹, X²]_t ,   (71)

giving an extra term at the end in contrast to the classical Lebesgue case. With (69) or (70), we assert that the following more general form of (67) holds. Denote by ℰ_H(X)_t the unique solution of the integral equation:

Y_t = H_t + ∫_0^t Y_{s−} dX_s ,   (72)

where {H_t, ℱ_t, t ≥ 0} and {X_t, ℱ_t, t ≥ 0} are semimartingales with the same filtration. If H_t ≡ 1 a.e., then the earlier case results, and by Theorem 4.15, suitably interpreted, (72) has a unique solution Y_t = ℰ_H(X)_t on the stochastic interval [[0, τ), where τ = inf{t : ΔX_t = −1}; in fact the explicit solution is then ℰ_1(X)_t = ℰ(X)_t of the previous case, and if H_t = 0 a.e., then Y_t = 0 is the only solution. Indeed if ΔX_t ≠ −1, t ≥ 0, one has the solution as:

Y_t = ℰ_H(X)_t = ℰ(X)_t (H₀ + ∫_0^t ℰ(X)_{s−}^{−1} dZ_s) ,   (73)

where

Z_t = H_t − [H^c, X^c]_t − Σ_{0<s≤t} (1 + ΔX_s)^{−1} ΔH_s ΔX_s .

So if H_t = H₀, t ≥ 0, then Z_t = Y₀, and then Y_t = Y₀ ℰ(X)_t, as desired. Here ℰ(X)_t^{−1} = ℰ(−X*)_t, where

X*_t = X_t − [X^c, X^c]_t − Σ_{0<s≤t} (1 + ΔX_s)^{−1} (ΔX_s)² ,   (74)

which is obtained by an application of Itô's formula for f(x) = x^{−1}, x > 0. It may be noted that the covariation process of the semimartingales {X_t^i, 𝒢_t, t ≥ 0}, i = 1, 2, may be computed using their continuous parts as:

[X¹, X²]_t = [X^{1c}, X^{2c}]_t + Σ_{0<s≤t} ΔX_s¹ ΔX_s² ,

and the jumps need not be predictable. As a consequence, the covariation process {[X¹, X²]_t, 𝒢_t, t ≥ 0} need not be predictable, and this is why one has to pay closer attention in special computations for a sharper analysis, although for the general aspects of integration the L^{2,2}-boundedness implies that the integrals are well-defined.
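Before turning to the proof, Yor's formula (67) can be probed numerically: in discrete time the exponential solves ΔY = Y_{−} ΔX, i.e. it is the product Π(1 + ΔX_i), and since (1 + ΔX)(1 + ΔY) = 1 + ΔX + ΔY + ΔXΔY, formula (67) — with the discrete covariation increment ΔXΔY — becomes an algebraic identity per step. A Python sketch for two correlated Brownian paths (the parameters are our illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
N, t, rho = 50_000, 1.0, 0.3
dt = t / N

# Two correlated Brownian paths with d[X, Y]_t = rho dt.
Z1 = rng.normal(0.0, np.sqrt(dt), size=N)
Z2 = rng.normal(0.0, np.sqrt(dt), size=N)
dX = Z1
dY = rho * Z1 + np.sqrt(1.0 - rho**2) * Z2

# Discrete stochastic exponentials: dE = E_{-} dX gives the product form.
eX = float(np.prod(1.0 + dX))
eY = float(np.prod(1.0 + dY))

# Exponential of X + Y + [X, Y]: the covariation contributes dX*dY per step,
# and (1 + dX)(1 + dY) = 1 + dX + dY + dX*dY makes (67) an identity here.
eSum = float(np.prod(1.0 + dX + dY + dX * dY))
print(eX * eY, eSum)
```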


With this discussion, formula (67) is easily established as follows. Let U_t = ℰ_H(X)_t, V_t = ℰ_K(Y)_t be the unique solutions of (as usual we set X₀₋ = 0 = Y₀₋)

U_t = H_t + ∫_0^t U_{s−} dX_s,  V_t = K_t + ∫_0^t V_{s−} dY_s ,   (75)

where the H_t- and K_t-processes are semimartingales with the same filtration as the X_t- and Y_t-processes. Let {L_t, ℱ_t, t ≥ 0} be the semimartingale defined by

L_t = ∫_0^t U_{s−} dK_s + ∫_0^t V_{s−} dH_s + [H, K]_t,  t ≥ 0 .   (76)

Then one asserts that

ℰ_H(X)_t ℰ_K(Y)_t = ℰ_L(X + Y + [X, Y])_t,  t ≥ 0 .   (77)

If H_t = 1 = K_t, t ≥ 0, so that L_t = 1 also, then (67) follows from (77). To verify (77), let U, V be as given; one has, with (70),

d(UV)_t = U_{t−} dV_t + V_{t−} dU_t + d[U, V]_t .   (78)

But U_t, V_t satisfy (75), so that with their differentiated forms and the bilinearity of [·, ·] we get

[U, V]_t = [H, K]_t + [∫_0^· U_{s−} dX_s, ∫_0^· V_{s−} dY_s]_t + [H, ∫_0^· V_{s−} dY_s]_t + [K, ∫_0^· U_{s−} dX_s]_t .

Hence

d[U, V]_t = d[H, K]_t + U_{t−} V_{t−} d[X, Y]_t + 0 + 0 ,   (79)

and substituting (75) and (79) in (78),

d(UV)_t = U_{t−} dK_t + V_{t−} dH_t + d[H, K]_t + U_{t−} V_{t−} d(X + Y + [X, Y])_t
  = dL_t + U_{t−} V_{t−} d(X + Y + [X, Y])_t .   (80)

Since UV = ℰ_H(X) ℰ_K(Y) by definition, integrating (80) gives (77) as desired. In fact (67) also implies the expression for the reciprocal of ℰ(X) indicated in (74), since ℰ(0) = 1. For instance, taking X continuous for simplicity, so that X*_t = X_t − [X, X]_t, one finds

ℰ(X)_t ℰ(−X*)_t = ℰ(X − X* + [X, −X*])_t = ℰ(X − X + [X, X] + [X, [X, X]] − [X, X])_t = ℰ(0)_t = 1 ,   (81)

since [X, A] = 0 for A of (locally) bounded variation. This formula holds even if X, Y are only right continuous; and then ℰ(X)_t^{−1} = ℰ(−X*)_t, where

X*_t = X_t − [X^c, X^c]_t − Σ_{0<s≤t} (1 + ΔX_s)^{−1} (ΔX_s)² .

These formulas find interesting applications in studies of models of stocks and bonds, to which we now turn in the next section. Some other properties of stochastic exponentials are discussed in Mel'nikov (1996).

7. Applications to financial market models


Suppose an investor purchases a shares at time t for a price S_t and sells them at time t + h for S_{t+h}, realizing a capital gain of a(S_{t+h} − S_t). If in a period [0, T] this is repeated at times 0 = t₀ < t₁ < ⋯ < t_n = T, with a_{t_i} shares held in (t_i, t_{i+1}], then the realized capital gain is Σ_{i=0}^{n−1} a_{t_i}(S_{t_{i+1}} − S_{t_i}), which in a continuous market operation can be approximated by ∫_0^T a_t dS_t. Since the evolution of stock prices {S_t, t ≥ 0} (a risky asset, which depends on chance) is a random process, the gain above is a stochastic integral. Also, an investor typically owns some bank account or bonds (riskless assets) which initially are of value B₀ and increase at an interest rate r > 0, so that at time t the value becomes B_t = B₀e^{rt}, or dB_t = rB_t dt (continuous compounding). If the interest is variable and a set b_t of bonds is owned, then the realized capital thus becomes ∫_0^t b_u dB_u at time t, as a Stieltjes integral. The applications here are mostly based on Shiryaev et al. (1994) and Mel'nikov (1996); see also the book by Musiela and Rutkowski (1997). If the investor has b₀ bonds at value B₀ and a₀ stocks at price S₀, then the initial capital is X₀ (= b₀B₀ + a₀S₀), and if at time t the investor holds b_t bonds and a_t stocks, the pair π = (a_t, b_t) is called the trading portfolio (or strategy), and X_t = X_t^π = a_tS_t + b_tB_t is the current wealth. The strategy π is called self-financing if

X_t^π = a_0 S_0 + b_0 B_0 + ∫_0^t a_u dS_u + ∫_0^t b_u dB_u ,   0 ≤ t ≤ T .   (82)
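The content of (82) is that wealth changes only through market gains: rebalancing trades stock against bond at constant wealth. A small discrete-time sketch (hypothetical price path and stock positions) verifies that the wealth X_t = a_t S_t + b_t B_t then equals the initial capital plus the accumulated gains, the discrete form of (82):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, dt = 250, 0.05, 1.0 / 250
S = 100.0 * np.cumprod(np.concatenate(([1.0],
        np.exp(0.0004 + 0.01 * rng.normal(size=n)))))    # hypothetical stock path
B = np.exp(r * dt * np.arange(n + 1))                    # bond, B_0 = 1

a = np.where(np.arange(n) % 2 == 0, 0.5, 0.7)  # arbitrary stock positions on (t_i, t_{i+1}]
X = np.empty(n + 1)
X[0] = 100.0                                   # initial capital
b = np.empty(n)                                # bond positions
for i in range(n):
    b[i] = (X[i] - a[i] * S[i]) / B[i]         # rebalance at zero cost: a_i S_i + b_i B_i = X_i
    X[i + 1] = a[i] * S[i + 1] + b[i] * B[i + 1]

# wealth = initial capital + stock gains + bond gains, as in (82)
gains = np.sum(a * np.diff(S)) + np.sum(b * np.diff(B))
assert abs(X[-1] - (X[0] + gains)) < 1e-8
```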

Generally the wealth X_t^π ≥ 0, although a_t, b_t can individually take negative values (a_t < 0 corresponds to selling stock at time t but not delivering it until time T, and b_t < 0 denotes borrowing at the riskless interest rate r). Note that a self-financing strategy does not allow borrowing from the bonds and stocks at the same time. However, {a_u, u ≥ 0}, {b_u, u ≥ 0} are assumed below to be (locally) of bounded variation, for practical reasons. The strategy π is said to admit an arbitrage opportunity at time T if X_0^π = 0, X_T^π ≥ 0 and P[X_T^π > 0] > 0, so that π gives a possibility of riskless (arbitrarily large) profit. Some more terminology: Let π = {π_t, 0 ≤ t ≤ T} be a self-financing strategy. Then for an initial investment capital x > 0, i.e., X_0^π = x > 0, and a function f_T (= f(S_t, 0 ≤ t ≤ T)) ≥ 0, the strategy is an (x, f_T)-hedge if X_T^π ≥ f_T, and π is a minimal hedge if there is equality here. Let B = {B_n, 1 ≤ n ≤ N}, S = {S_n, 1 ≤ n ≤ N}, and consider the Bond–Stock (or (B,S))-market. Suppose that the participant can issue to the buyer a security giving the option to take back the stocks at time N at a fixed price K. Such a security is


termed the European call option. This means that if S_N > K, the option's owner can buy back the stocks at price K and sell at S_N, getting a profit of S_N − K, and the hedge function is f_N = (S_N − K)^+. If, on the other hand, the expiration time is random in {1, 2, …, N}, the corresponding security is termed an American call option. These two options roughly correspond to fixed and sequential decision making (and the place names have little to do with geographic locations). There are also other options, but we consider only the European case in our treatment, since the general subject and applications are already made clear. Thus the problem is to find a (smooth) function f : [0, T] × ℝ₊ → ℝ such that the corresponding capital X_t^π satisfies the boundary condition X_t^π = f(T − t, S_t); only the continuous market will be considered here. On the other hand, X_t (= X_t^π), being the self-financing security, must also satisfy the SDE

X_t − X_0 = ∫_0^t a_u dS_u + ∫_0^t b_u dB_u ,   (83)

as well as X_t = a_t S_t + b_t B_t, or b_t = (X_t − a_t S_t)/B_t. If f is continuously differentiable, once in the first and twice in the second variable, with partial derivatives denoted f_s, f_x, f_xx for f(s, x), one can use Itô's formula and get another SDE for

X_t (= X_t^π) as:

X_t − X_0 = ∫_0^t f_x(T − u, S_u) dS_u − ∫_0^t f_s(T − u, S_u) du + ½ ∫_0^t σ² S_u² f_xx(T − u, S_u) du .   (84)

From (83) and (84), which agree for all t, so that the stochastic and the nonstochastic integrals must be the same, one has

∫_0^t a_u dS_u = ∫_0^t f_x(T − u, S_u) dS_u ,

and since B_u = B_0 e^{ru}, or dB_u = B_0 r e^{ru} du, the second parts become

∫_0^t b_u B_0 r e^{ru} du = (σ²/2) ∫_0^t S_u² f_xx(T − u, S_u) du − ∫_0^t f_s(T − u, S_u) du ,

or in differentiated form this gives

a_t = f_x(T − t, S_t) ,   (85)

b_t B_0 r e^{rt} = (σ²/2) S_t² f_xx(T − t, S_t) − f_s(T − t, S_t) .
Substituting b_t B_0 = e^{−rt}(X_t − a_t S_t) (see the b_t value given after (83)) and using (85), one gets the PDE:


∂f/∂s = (σ²/2) x² (∂²f/∂x²) + r ( x (∂f/∂x) − f ) ,   (86)

whenever (t, x) ∈ [0, T] × (0, ∞), with the boundary condition f(0, x) = (x − K)^+ following from X_T = (S_T − K)^+. This is a (not easy) parabolic PDE with the boundary condition given above. The solution of this PDE gives f, and hence X_t^π = f(T − t, S_t) as the desired capital, illustrating also a deep connection between the Itô formula and PDEs. There is a probabilistic method of solving this equation which is based on the change of variables technique in an SDE, bringing in the exponential martingale analysis of the preceding section, which will now be sketched. That method also shows the essential role of the linear SDEs in these interesting financial applications. Let us specialize the expressions in Theorem 4.15, in which we take k = 1, set the coefficients α = β_0 = β_1 = 0, retain only γ_s, and take X_0 = 1, so that the equation becomes

X_t = 1 + ∫_0^t X_s γ_s dβ_s ,   (87)

where {β_s, ℱ_s, s ≥ 0} is the BM. Taking γ_s ≡ 1, the unique solution of (87) is an exponential martingale given by

X_t = ℰ(β)_t = e^{β_t − ½[β,β]_t} = e^{β_t − t/2} ,

and replacing β_t by aβ_t, a ∈ ℝ, one gets for the corresponding (unique) solution

X_t = ℰ(aβ)_t = e^{aβ_t − (a²/2)[β,β]_t} = e^{aβ_t − (a²/2)t} .   (88)

If a = a_t defines a continuous function on ℝ₊, then (88) gives a well-defined process

ℰ(aβ)_t = X̄_t = e^{∫_0^t a_s dβ_s − ½ ∫_0^t a_s² ds} ,   (89)

and if this is differentiated using Itô's formula (cf. Theorem 4.14) one gets

X̄_t = 1 + ∫_0^t X̄_s a_s dβ_s ,   (90)

and this equation has a unique solution, which therefore is given by the exponential (89). Here the fact that the process {β_s, ℱ_s, s ≥ 0} is BM is crucial. However, the new {X̄_t, ℱ_t, t ≥ 0} and the original {X_t, ℱ_t, t ≥ 0} processes have a close relationship, which is clarified by the following result on change of equivalent measures P and P̃ on (Ω, Σ), due to Girsanov (1960); it is also useful in other applications.


1. THEOREM. Suppose that E(X̄_T) = 1 for the process given by (90) on (Ω, Σ, P), where T is the maturity time of the option. Then the new process β̃_t = β_t − ∫_0^t a_s ds, t ≥ 0, defines {β̃_t, ℱ_t, t ≥ 0} as a BM on (Ω, Σ, P̃), where dP̃_t = X̄_t dP is an equivalent probability measure on (Ω, ℱ_t), and there is a unique P̃ on ℱ_∞ = σ(∪_{t>0} ℱ_t) such that P̃_t = P̃|ℱ_t (which exists by Theorem 4.4, since the X̄_t-process is evidently a continuous [right continuity is enough] class (DL) member).

This result and some further extensions of Girsanov's work are detailed in Liptser and Shiryaev (1977, Vol. I, p. 323), and will not be discussed further. A specialization of the assertion to financial mathematics gives interesting applications, and also an alternative argument for the solution of (86) noted above. Recall that our basic model governing the stock market is

dS_t = μS_t dt + σS_t dβ_t ,   t ≥ 0 ,   (91)

where μ ∈ ℝ, and σ > 0 is the so-called volatility parameter, the chance fluctuations being the BM {β_t, t ≥ 0}. But now the latter can be changed with a Girsanov transformation into a new noise process β̃_t by the above theorem, wherein one takes a_s = −(μ − r)/σ, so that β̃_t = β_t + ((μ − r)/σ)t. Then (91) becomes

dS_t = μS_t dt + σS_t [ dβ̃_t − ((μ − r)/σ) dt ] = rS_t dt + σS_t dβ̃_t ,   t ≥ 0 ,   (92)

and the new probability P̃ on ℱ_t = σ(β_s, s ≤ t) is determined (because of (88)) as:

dP̃ = exp( −((μ − r)/σ) β_t − ½ ((μ − r)/σ)² t ) dP .   (93)
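Under the measure P̃ of (93) the drift in (92) is the interest rate r, so the discounted price e^{−rt}S_t is a martingale. A minimal Monte Carlo sketch (arbitrary parameter values) simulates (92) under P̃, checks this martingale property, and also computes the discounted expected call payoff that appears in (96) below:

```python
import numpy as np

rng = np.random.default_rng(2)
S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 1.0
n_paths, n_steps = 100000, 20
dt = T / n_steps

# simulate (92) under the new measure: dS = r S dt + sigma S d(beta~)
Z = rng.normal(size=(n_paths, n_steps))
ST = S0 * np.exp(np.sum((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * Z, axis=1))

# the discounted price is a P~-martingale: e^{-rT} E[S_T] = S_0
disc_mean = np.exp(-r * T) * ST.mean()
assert abs(disc_mean - S0) < 0.5

# the discounted expected payoff prices the European call, as in (96)
mc_price = np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()
assert abs(mc_price - 10.45) < 0.3
```

The sampled exponent is the exact solution of (92) over each step, so the only error in the two checks is statistical.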

This change allows the elimination of the free parameter μ and brings in the interest rate r ≥ 0. On this new space (Ω, Σ, P̃), the desired function f satisfying (86) can be obtained as follows. Note that on this new space {β̃_t, t ≥ 0} is a BM, and this property is crucial in the calculations. Consider a new function f̃ defined by f̃(t, x) = e^{−rt} f(t, x). This rather unmotivated function may be thought of as a "discounted security" of f for S_t, but it will work for the problem at hand. It is 'suggested' by the classical Feynman–Kac method, which expresses the solution of a PDE such as (86) as the expected value of a suitable functional of the BM, and f̃ above turns out to be one such. We apply Itô's formula in two variables (Theorem 4.14 is the one-dimensional version) as:

f̃(X_t, Y_t) − f̃(X_0, Y_0) = ∫_0^t (∂f̃/∂x)(X_s, Y_s) dX_s + ∫_0^t (∂f̃/∂y)(X_s, Y_s) dY_s + ½ ∫_0^t (∂²f̃/∂x²)(X_s, Y_s) d[X, X]_s .   (94)

Substituting f̃(T − t, S_t) = e^{−rt} f(T − t, S_t) in the above and simplifying it after using (86), one finds, on taking X_t as the BM in (94):


e^{−rt} f(T − t, S_t) − f(T, S_0) = σ ∫_0^t e^{−ru} (∂f/∂x)(T − u, S_u) S_u dβ̃_u .   (95)

[The integrand in (95) is just e^{−ru} a_u S_u.] Since ∂f/∂x is assumed continuous and hence bounded on [0, t], the right side of (95) defines a martingale on (Ω, Σ, P̃). Setting t = T and taking expectations, one finds (since S_0 = x is a constant and the expectation of the right side is zero):

f(T, x) = f(T, S_0) = E_P̃[ e^{−rT} f(0, S_T) ] = E_P̃[ e^{−rT} (S_T − K)^+ ] ,   (96)

by the boundary condition f(0, x) = (x − K)^+, which thus gives f by simplifying the right side of (96), using the fundamental law of probability together with the fact that P̃ is determined by the BM {β̃_t, t ≥ 0}. After a nontrivial but standard manipulation of the Gaussian integral one finds [see Shiryaev et al. (1994), II, Section 4]:

f(T, x) = x Φ(g(T, x)) − K e^{−rT} Φ(h(T, x)) ,   (97)

where Φ is the standard normal distribution, whose density thus is Φ′(x) = (2π)^{−1/2} e^{−x²/2}, and g, h are found to be

g(T, x) = ( log(x/K) + (r + σ²/2) T ) / (σ√T) ;   h(T, x) = g(T, x) − σ√T .
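Formula (97) is the classical Black–Scholes price. A direct sketch implements it, together with the hedge ratio a_t = f_x(T − t, S_t), which for (97) works out to Φ(g(T − t, S_t)):

```python
from math import erf, exp, log, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(T, x, K, r, sigma):
    """f(T, x) of (97): European call price, T = time to maturity."""
    g = (log(x / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    h = g - sigma * sqrt(T)
    return x * Phi(g) - K * exp(-r * T) * Phi(h)

def bs_delta(T, x, K, r, sigma):
    """Hedge ratio a_t = f_x(T - t, S_t) of (85); here equal to Phi(g)."""
    g = (log(x / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    return Phi(g)

# boundary condition: f(0+, x) -> (x - K)^+
assert abs(bs_call(1e-9, 120.0, 100.0, 0.05, 0.2) - 20.0) < 1e-6
assert abs(bs_call(1.0, 100.0, 100.0, 0.05, 0.2) - 10.4506) < 0.01
```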

One can verify that the f of (97) indeed satisfies (86). Thus f(T, S_0) = E_P̃[e^{−rT}(S_T − K)^+] is the rational amount to be invested with a self-financing strategy (a, b), where a_t = (∂f/∂x)(T − t, S_t) and b_t = (f(T − t, S_t) − a_t S_t)/B_t, 0 ≤ t ≤ T, the {B_t, t ≥ 0} being the bond asset. The above detailed account is included to motivate a general study of the market in which both stocks and bonds are allowed random fluctuations. Thus one begins with the (generalized) bonds and stocks as:

X_t = X_0 + ∫_0^t X_{s−} dM_s ,   Y_t = Y_0 + ∫_0^t Y_{s−} dN_s ,   (98)

where X_0, Y_0 > 0, and M, N are semimartingales, so that they can be uniquely expressed as:

M_t = M_0 + A_t + M̃_t ,   N_t = N_0 + B_t + Ñ_t ,   t ≥ 0 .   (99)

Here M_0, N_0 are finite random variables, {A_t, B_t, t ≥ 0} are processes of locally finite variation, and {M̃_t, Ñ_t, ℱ_t, t ≥ 0} are locally square integrable martingales. Let α = (α_t, ℱ_t, t ≥ 0), γ = (γ_t, ℱ_t, t ≥ 0) be adapted predictable processes of locally bounded variation representing the (generalized) bond and stock securities, so that π = (α, γ) is the investor's portfolio. The wealth of the investor is thus given by


Z_t^π = α_t X_t + γ_t Y_t ,   t ≥ 0 ,   (100)

and it is self-financing if

Z_t^π = Z_0^π + ∫_0^t α_s dX_s + ∫_0^t γ_s dY_s ,   (101)

where the stochastic integrals are assumed to exist. The strategy admits an arbitrage opportunity if

Z_0^π = 0 ,   Z_T^π ≥ 0 , and P[Z_T^π > 0] > 0 .

The rational (or fair) market problem is to find conditions so that there is no arbitrage. Here T is the time of maturity of the European security under consideration. A solution is obtained if one can find a probability measure P̃ on (Ω, Σ, ℱ_t, t ≥ 0) such that P̃_t ~ P_t (P_t = P|ℱ_t) and {Z_t^π/X_t, ℱ_t, t ≥ 0} is a martingale on (Ω, Σ, ℱ_t, t ≥ 0, P̃), since then one gets

E_P̃( Z_T^π / X_T ) = Z_0^π / X_0 ,   (102)

(the expectation of a martingale being a constant), so that P̃[Z_T^π > 0] > 0 cannot occur if Z_0^π = 0. Note that P̃ on σ(∪_{t≥0} ℱ_t) exists iff the martingale {dP̃_t/dP_t, ℱ_t, t ≥ 0} is uniformly integrable on (Ω, Σ, P) (an analog of Corollary 3.2; the details may be found in many books, e.g., Rao (1995), p. 279). Here it is sufficient to consider a compact interval [0, T]. Solutions of the problem for a large class of portfolios will now be discussed. The self-financing condition takes the form dZ_t^π = α_t dX_t + γ_t dY_t, so that (100) gives

X_t dα_t + Y_t dγ_t = 0 .

By Itô's formula,

d( Z_t^π / X_t ) = d( α_t + γ_t (Y_t/X_t) )
= dα_t + (Y_t/X_t) dγ_t + γ_{t−} d(Y_t/X_t)
= (1/X_t)( X_t dα_t + Y_t dγ_t ) + γ_{t−} d(Y_t/X_t)
= γ_{t−} d(Y_t/X_t) , by the display above .

This in the integrated form becomes

Z_t^π / X_t = Z_0^π / X_0 + ∫_0^t γ_{s−} d(Y_s/X_s) .   (103)


Thus the martingale property of {R_t^π = Z_t^π/X_t, ℱ_t, t ≥ 0} relative to some P̃ reduces to studying the same property of {R_t = Y_t/X_t, ℱ_t, t ≥ 0}, independently of the security π. The problem then is to show that the latter process is a (local) martingale, for a large class of portfolios π, relative to some such equivalent measure P̃, which is often called a martingale measure. In the case of the BM such a measure was found in (93), and we generalize that procedure here. Let {W_t, ℱ_t, 0 ≤ t ≤ T} be a P-local martingale satisfying (i) P[inf_{t∈[0,T]} W_t > 0] = 1, and (ii) E_P(W_T) = 1, where the filtration {ℱ_t, t ≥ 0} is as usual standard. Define a measure P̃ by the equation dP̃_t = W_t dP. Then P̃_T ~ P_T, and (Ω, Σ, P̃) is a probability space equivalent to the original one; let dV_t = W_{t−}^{−1} dW_t, or dW_t = W_{t−} dV_t, or equivalently W_t = W_0 + ∫_0^t W_{s−} dV_s. So the V_t-process is a P̃-(local) semimartingale iff the W_t-process is, by the Girsanov theorem. This result may be used to state that our R_t-process is a P̃-local martingale iff the (WR)_t-process is a P-local martingale. But by the exponential martingale theory, W_t = W_0 ℰ(V)_t (cf. (73) of Section 6). Substituting similar values of X_t, Y_t in the definition of R_t, one gets, when ΔX_s ≠ −1, s ∈ [0, T]:
R_t = Y_t / X_t = R_0 ( ℰ(N)_t / ℰ(M)_t ) = R_0 Φ_t(M, N) , (say) ,   (105)

as a consequence of Theorem 6.2 (and some algebraic simplification). But the Φ_t-process is both a P- and a P̃-semimartingale, as P̃ ~ P. One finds R_t to be a solution of

R_t = R_0 + ∫_0^t R_{s−} dΦ_s(M, N) .   (106)

But it was already noted that the R_t-process is a P̃-(local) martingale iff the W_t R_t-process is a P-(local) martingale, and R_t is given by (105). Thus the R_t-process has the desired property iff the R_t W_0 ℰ(V)_t-process is a P-(local) martingale. But using (105) for R_t one gets

R_t ℰ(V)_t = R_0 Φ_t(M, N) ℰ(V)_t = R_0 ℰ(Φ̃(M, N, V))_t , (say) .

Here the new exponential Φ̃ is obtained with Theorem 6.2 exactly as above. With this construction of P̃, based on a positive W_t-process, we have {R_t, ℱ_t, t ≥ 0} to be a P̃-(local) martingale, so that E_P̃(R_T^π) = R_0^π = Z_0^π/X_0, a constant. By the market model equations (98), the desired solution is X_T = X_0 ℰ(M)_T. But Z_T^π ≥ f_T, and the fair (or rational) price for the investor is the minimal value, i.e., Z_T^π = f_T. Hence the equation becomes


E_P̃( f_T / X_T ) = X_0^{−1} E_P̃( ℰ(−M*)_T f_T ) , by (81) of Section 6 .   (107)

This may be summarized in the following:

2. THEOREM. For the general market model (98), and the measure P̃ defined by the auxiliary process W_t, P̃ ~ P, and if ΔX_s ≠ −1, then if the Φ_t-process is locally a P-martingale, the R_t = Y_t/X_t-process is locally a P̃-martingale, and a rational price solution of the (European type) model exists.

The preceding argument contains the following simple, but interesting in itself, representation, which will be stated for reference.

3. PROPOSITION. Let {X_t, ℱ_t, t ≥ 0} be a positive right continuous semimartingale on (Ω, Σ, P) such that P[inf_t X_t > 0] = 1. Then it admits a (stochastic) integral representation

X_t = X_0 + ∫_0^t X_{s−} dN_s ,

relative to a semimartingale {N_t, ℱ_t, t ≥ 0} with a right continuous version, or, equivalently, an exponential representation X_t = X_0 ℰ(N)_t. In fact the N-process can be taken as dN_t = (X_{t−})^{−1} dX_t, or N_t = ∫_0^t X_{s−}^{−1} dX_s.
This statement is in a sense converse to Proposition 6.1, and may be thought of as a simple analog of a vector Radon–Nikodým theorem. To amplify the above lengthy discussion, we now present a few examples, essentially adapted from Mel'nikov (1996).
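In discrete time, Proposition 3 is elementary: with ΔN_n = ΔX_n/X_{n−1} the stochastic exponential is the product ℰ(N)_n = Π_{k≤n}(1 + ΔN_k), and X_n = X_0 ℰ(N)_n becomes an exact identity. A short sketch on an arbitrary positive path:

```python
import numpy as np

rng = np.random.default_rng(3)
X = 100.0 * np.cumprod(np.concatenate(([1.0],
        np.exp(0.02 * rng.normal(size=50)))))   # arbitrary positive path

# dN = X_{-}^{-1} dX, discretized
dN = np.diff(X) / X[:-1]

# discrete stochastic exponential: E(N)_n = prod_{k<=n} (1 + dN_k)
E_N = np.concatenate(([1.0], np.cumprod(1.0 + dN)))

# exponential representation of Proposition 3: X = X_0 E(N)
assert np.allclose(X, X[0] * E_N)
```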

4. EXAMPLE (Extended Black–Scholes model). The financial markets of equations (98) are now given by

dB_t = r(t) B_t dt ,   B_0 > 0 , r ≥ 0 ,

and

dS_t = S_t ( μ(t) dt + σ(t) dβ_t ) ,   S_0 > 0 , σ > 0 ,

where r, μ, σ are deterministic Borel functions satisfying ∫_0^t σ²(s) ds < ∞, ∫_0^t r(s) ds < ∞, and ∫_0^t μ(s) ds < ∞. Take M_t = ∫_0^t r(s) ds, N_t = ∫_0^t μ(s) ds + ∫_0^t σ(s) dβ_s, X_t = B_t, Y_t = S_t, and we shall find a process V_t = ∫_0^t α(s) dβ_s, where ∫_0^t α²(s) ds < ∞, with which a P̃ can be obtained. With such an α(·), set

Φ_t(M, N, V) = ∫_0^t ( μ(s) − r(s) + α(s)σ(s) ) ds + ∫_0^t ( σ(s) + α(s) ) dβ_s ,

subject to α(s)σ(s) = r(s) − μ(s). Choosing P̃ from the relation


dP̃_t = exp( − ∫_0^t ((μ(s) − r(s))/σ(s)) dβ_s − ½ ∫_0^t ((μ(s) − r(s))/σ(s))² ds ) dP_t ,

the integrand is seen to be uniformly integrable relative to P, so that P̃_t defined on ℱ_t extends to a measure on Σ, and {Φ_t, ℱ_t, t ≥ 0} is a local P̃-martingale; hence {R_t = S_t/B_t, ℱ_t, t ≥ 0} is a local P̃-martingale by Theorem 2, and a rational pricing strategy exists. This extends the original model, since now α, μ, r, σ are time dependent, satisfying the (local) integrability conditions.
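With deterministic time-dependent coefficients, the call price is still of the form (97), with rT replaced by ∫_0^T r(s) ds and σ²T by ∫_0^T σ²(s) ds; this standard reduction is sketched below (the specific parameter values are hypothetical):

```python
from math import erf, exp, log, sqrt

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_time_dependent(T, x, K, r_int, var_int):
    """(97) with accumulated rate r_int = int_0^T r(s) ds and
    accumulated variance var_int = int_0^T sigma^2(s) ds."""
    s = sqrt(var_int)
    g = (log(x / K) + r_int + 0.5 * var_int) / s
    return x * Phi(g) - K * exp(-r_int) * Phi(g - s)

# constant coefficients recover the usual Black-Scholes value
p = call_time_dependent(1.0, 100.0, 100.0, 0.05 * 1.0, 0.2**2 * 1.0)
assert abs(p - 10.4506) < 0.01
```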

5. EXAMPLE (Cox–Ross–Rubinstein model). This time the process is discrete. Again the market consists of a bond and a stock (B, S) satisfying the equations:

ΔB_n = r B_{n−1} ;   ΔS_n = ρ_n S_{n−1} ;   B_0 > 0 , S_0 > 0 .

Here r is a fixed interest rate for B, and ρ_n, n = 1, 2, …, is a sequence of i.i.d. Bernoulli-type random variables taking values a, b, where −1 < a < r < b, with probabilities p and q. Thus ρ_n = (a + b)/2 + ((b − a)/2) ε_n, where the i.i.d. random variables ε_n satisfy P[ε_n = 1] = p, P[ε_n = −1] = q, p + q = 1. To make the model conform to Theorem 2, let X_t = B_n, Y_t = S_n, ℱ_t = ℱ_n = σ(ε_1, …, ε_n), n ≤ t < n + 1, n ≥ 1, thereby embedding the discrete into a continuous process. Thus, in our earlier notation,

ΔM_n = r ,   ΔN_n = (a + b)/2 + ((b − a)/2) ε_n ,   ΔV_n = α ε̃_n ,

where V_0 = 0, ε̃_n = ε_n − (p − q), and α is a constant to be chosen later. We then have

ΔΦ_n(M, N, V) = (1 + r)^{−1} [ ΔN_n − ΔM_n + ΔV_n (1 + ΔN_n) ] .


Substituting the various values, one gets

ΔN_n = (a + b)/2 + ((b − a)/2)(p − q) + ((b − a)/2) ε̃_n ,

ΔN_n ΔV_n = α [ (a + b)/2 + ((b − a)/2)(p − q) − (b − a)(p − q) ] ε̃_n + α ((b − a)/2) [ 1 − (p − q)² ] ,

ΔΦ_n = (1 + r)^{−1} { [ (a + b)/2 + ((b − a)/2)(p − q) − r + α ((b − a)/2)(1 − (p − q)²) ]
      + [ (b − a)/2 + α ( 1 + (a + b)/2 − ((b − a)/2)(p − q) ) ] ε̃_n } .

Now the free parameter α is chosen so that Φ_n is a P-martingale. The desired value is found to be

α = ( (a + b) + (b − a)(p − q) − 2r ) / ( (b − a)((p − q)² − 1) ) .


Then {R_t = S_n/B_n, ℱ_n, n ≥ 0} will be a P̃-martingale if P̃ is defined as:

P̃[ε_k = 1] = 2p(r − a) / ( (b − a)(1 + (p − q)) ) ,   P̃[ε_k = −1] = 2q(b − r) / ( (b − a)(1 − (p − q)) ) .

Thus the n-dimensional likelihood ratio is given by

dP̃_n/dP_n = Π_{k=1}^n ( 1 + α(ε_k − (p − q)) ) .

If p = q, then P̃[ε_k = 1] = (r − a)/(b − a) and P̃[ε_k = −1] = (b − r)/(b − a). It is worthy of note that the fair price model (absence of arbitrage) depends only on the behavior of the martingale component of the semimartingale in the problem. Modifications to allow arbitrage (and so submartingale concepts enter) have been discussed in Musiela and Rutkowski (1997), together with several other models. Both of the above examples can be combined and formulated as multidimensional (here two-dimensional) equations, and such generalizations have been discussed in the literature.
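Under P̃ the one-step growth factor of the stock is 1 + b with probability p* = (r − a)/(b − a) and 1 + a otherwise, independently of p, and the rational call price is the discounted P̃-expectation of (S_N − K)^+. A minimal sketch (hypothetical parameter values), checked in one period against the replicating portfolio:

```python
from math import comb

def crr_call(S0, K, r, a, b, N):
    """European call in the CRR model: E_Ptilde[(S_N - K)^+] / (1 + r)^N."""
    assert -1.0 < a < r < b
    p_star = (r - a) / (b - a)          # Ptilde[eps_k = 1], independent of p
    price = 0.0
    for j in range(N + 1):              # j up-moves out of N
        prob = comb(N, j) * p_star**j * (1.0 - p_star)**(N - j)
        SN = S0 * (1.0 + b)**j * (1.0 + a)**(N - j)
        price += prob * max(SN - K, 0.0)
    return price / (1.0 + r)**N

# one-period check against the replicating portfolio
S0, K, r, a, b = 100.0, 100.0, 0.02, -0.1, 0.15
up = max(S0 * (1 + b) - K, 0.0)
dn = max(S0 * (1 + a) - K, 0.0)
shares = (up - dn) / (S0 * (b - a))             # stock position matching both payoffs
bonds = (up - shares * S0 * (1 + b)) / (1 + r)  # value held in the bond
assert abs(crr_call(S0, K, r, a, b, 1) - (shares * S0 + bonds)) < 1e-9
```

The agreement with the replicating portfolio is the discrete counterpart of the minimal-hedge property of the rational price.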

8. Remarks on multiparameter and other extensions

A generalization of the preceding martingale analysis to a multidimensional or a partially ordered index set is not automatic, and in fact most of the results fail without additional restrictions. However, one can treat specialized index sets such as I = ℝ²₊ with coordinate (or lexicographic) ordering. Here even the martingale concept has several avatars, and the following point of view, due to Cairoli and Walsh (1975), gives sharp results. Thus for s = (s₁, s₂), t = (t₁, t₂) ∈ ℝ²₊, define the ordering s ≺ t iff s_i ≤ t_i, i = 1, 2, and a complementary order s ⋏ t iff s₁ ≤ t₁ but s₂ ≥ t₂. Let {ℱ_t, t ∈ ℝ²₊} be a family of σ-subalgebras from (Ω, Σ, P) satisfying the filtration conditions: (F1) ℱ_s ⊂ ℱ_t for s ≺ t; (F2) ℱ_t is complete for P, t ∈ ℝ²₊; and (F3) ℱ_s = ∩_{s≺t} ℱ_t, as in the single parameter case. Then an integrable adapted process {X_t, ℱ_t, t ∈ ℝ²₊} is a planar martingale if s ≺ t ⇒ E^{ℱ_s}(X_t) = X_s a.e. [P]. Let ℱ¹_t = σ(∪_{s₂≥0} ℱ_{t₁,s₂}), and similarly let ℱ²_t be defined. If (s, t] denotes the rectangle with diagonal from s = (s₁, s₂) to t = (t₁, t₂) in ℝ²₊, then the increment of the process, X(s, t] = X_{t₁t₂} − X_{t₁s₂} − X_{s₁t₂} + X_{s₁s₂}, can be used to define the following additional martingale concepts: (a) X_t is a weak martingale if E^{ℱ_s}(X(s, t]) = 0 a.e.; (b) an i-martingale if X_t is ℱ^i_t-adapted and E^{ℱ^i_s}(X(s, t]) = 0, i = 1, 2, a.e.; and (c) a strong martingale if E^{σ(ℱ¹_s ∪ ℱ²_s)}(X(s, t]) = 0 a.e. There is no good relationship between these concepts if the filtration is not further restricted, and not much detailed analysis is possible. In order to go forward, one imposes a fourth technical (not very intuitive) condition: (F4) ℱ¹_t and ℱ²_t are conditionally independent given ℱ_t for each t ∈ ℝ²₊. Under these four conditions on the filtration,


it can be verified that {X_t, ℱ_t, t ∈ ℝ²₊} is a planar martingale iff it is an i-martingale for i = 1, 2 simultaneously. Using these conditions on the filtration and assuming that {X_t, t ∈ ℝ²₊} is a Wiener–BM, i.e., a Gaussian random field with mean zero and covariance E(X_s X_t) = σ² (s₁ ∧ t₁)(s₂ ∧ t₂), where σ² > 0 is a constant (to be taken as 1 for convenience), a great deal of the single parameter theory then extends, nontrivially. [It should be noted that there is another extension of the standard BM to multidimensions, due to P. Lévy, which is called the Lévy–BM. It is also a Gaussian field starting at the origin, with mean zero, but with covariance E(X_s X_t) = ½(‖s‖ + ‖t‖ − ‖s − t‖), where ‖s‖ denotes the Euclidean norm of s. This also has interesting properties distinct from the Wiener–BM, with different applications.] For the (Wiener-)BM, the corresponding double and line stochastic integrals and related results have been extensively developed by Cairoli and Walsh (1975). These results naturally lead to certain analogs in harmonic function theory, including the Stokes and Green theorems, as well as stochastic partial differential equations. The area is in an intensive state of development, and much can and should be done. The corresponding L^{2,2}-boundedness and a generalization of the BM-integration is a follow-up problem for study. Both Lévy- and Wiener-BMs satisfy an L^{2,2}-boundedness condition locally. [A brief account of this analysis is in Rao (1995), Sections VI.3 and VII.3.] The Cairoli–Walsh theory has been extended to semimartingales in the plane, by finding a suitable definition (since there are several such concepts), by Brennan (1979). He presented the initial spade work, and a further detailed analysis of these fields has recently been obtained by Green (1997). [See also Dozzi (1989) for related subjects.] All these results and applications are of considerable interest and thus are areas for new research.
Another line of enquiry is to stay with a one-dimensional time parameter, but let the process be vector valued. This area of research is being pursued by Métivier (1982) and his associates when the range (or state) space of the process is infinite dimensional. Indeed, even finite (> 1) dimensional work is appropriate in relation to higher order stochastic differential equations, both linear and nonlinear. To indicate the flavor, consider an nth order linear equation of the form:

L_n(D) f = ( a_0 (dⁿ/dtⁿ) + a_1 (dⁿ⁻¹/dtⁿ⁻¹) + ⋯ + a_n ) f = g .   (108)

This, when f = X_t and g is the white noise, symbolically written as g = dβ_t/dt, has to be interpreted as:

∫_0^t φ(s) (dβ_s/ds) ds = ∫_0^t φ(s) dβ_s ,   (109)

for all square integrable functions φ on [0, t], and (108) is taken in the integrated form

∫_0^t φ(s) L_n(D) X_s ds = ∫_0^t φ(s) dβ_s ,   (110)


a well-defined quantity, where {β_t, t ≥ 0} is the standard BM, so that the right side defines a martingale. This may be written symbolically as a first order SDE as follows: Let Y_k = d^{k−1}X_t/dt^{k−1}, and Y = (Y_1, …, Y_n)′ be the column vector of (symbolic) derivatives, B_t = (0, 0, …, β_t)′, and let A be the n × n companion matrix given (with a_0 = 1 for simplicity) as:

A = ( 0      1        0    ⋯   0 )
    ( 0      0        1    ⋯   0 )
    ( ⋮                     ⋱   ⋮ )
    ( 0      0        0    ⋯   1 )
    ( −a_n   −a_{n−1}      ⋯  −a_1 )

so that

dY_t = A Y_t dt + dB_t ,   Y_0 = C ,   (111)

which can be solved with a vector analog of Theorem 4.15. The solution Y_t, as a vector, takes values in ℝⁿ, and it will be a (vector) semimartingale. One obtains the solution as

Y_t = M(t) ( C + ∫_0^t M(s)^{−1} dB_s ) ,   (112)

where M(t) is the fundamental n × n matrix solution of the associated homogeneous equation dY_t = AY_t dt. This problem has been analyzed by Dym (1966), and its sample function analysis has also been carried out. In this case it turns out that the solution of (111) is a vector Markov process (and a martingale), but the scalar solution of (108) is neither. The analysis presents several novel features that are not present in the first order case. These problems are also of interest in physics [as originally discussed for the motion of a simple harmonic oscillator in Chandrasekhar (1943)], as well as in financial market models, with the multidimensional Black–Scholes problem [cf., e.g., Musiela and Rutkowski (1997), p. 250] among others. However, there are several questions that remain to be answered, and the noncommutativity of matrices complicates the analysis. A detailed account of the subject as it stands at present is given in a recent paper by the author [cf. Rao (1997)].

Finally, it should be noted that, since the partial sum sequence of independent integrable random variables with zero means always forms a martingale, one can consider a generalization of the classical situation with martingale differences. These are not necessarily independent, but inherit several properties of the independent case. Consequently, a great deal of the results on the central limit problem, the law of the iterated logarithm, and their use in obtaining asymptotic properties of estimators of parameters of the underlying models can be found in the literature. An extensive treatment of related problems is given in the volume by Jacod and Shiryaev (1987). It is thus evident that martingale methods and results pervade large parts of analysis as well as concrete applications. Many books treating semimartingales are listed in the references, but there are many more works devoted to specializations, and the reader can easily find them from the books and papers already listed here. These can be regarded as a representative set of a vast collection of works on this ever expanding subject.
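The reduction of (108) to the first order system (111) is easy to exercise numerically. A minimal Euler–Maruyama sketch for the damped harmonic oscillator X″ + cX′ + kX = dβ/dt (the coefficients c, k are hypothetical choices) integrates the companion system:

```python
import numpy as np

rng = np.random.default_rng(4)
c, k = 0.5, 4.0                        # hypothetical damping and stiffness
A = np.array([[0.0, 1.0],
              [-k, -c]])               # companion matrix of X'' + c X' + k X
n, T = 100000, 10.0
dt = T / n

Y = np.zeros(2)                        # Y = (X, X')', started at the origin
path = np.empty(n + 1)
path[0] = Y[0]
for i in range(n):
    dB = np.array([0.0, rng.normal(0.0, np.sqrt(dt))])   # noise enters only the last row
    Y = Y + (A @ Y) * dt + dB          # Euler-Maruyama step for dY = A Y dt + dB
    path[i + 1] = Y[0]

# the vector (X, X') is Markov; the scalar X alone is neither Markov nor a martingale
assert np.isfinite(path).all()
```

The pair (X, X′) regenerates from its current value, while the X-coordinate by itself needs its velocity to predict the future, illustrating the remark after (112).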

References
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Political Economy 81, 637-657.
Blackwell, D. (1946). On an equation of Wald. Ann. Math. Stat. 17, 84-87.
Bochner, S. (1955). Harmonic Analysis and the Theory of Probability. Univ. Calif. Press, Berkeley, CA.
Bochner, S. (1956). Stationarity, boundedness, almost periodicity of random valued functions. Proc. 3rd Berkeley Symp. Math. Stat. and Prob. 2, 7-27.
Brennan, M. D. (1979). Planar semimartingales. J. Multivar. Anal. 9, 465-486.
Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Prob. 1, 19-42.
Cairoli, R. and J. B. Walsh (1975). Stochastic integrals in the plane. Acta Math. 134, 111-183.
Chandrasekhar, S. (1943). Stochastic problems in physics and astronomy. Rev. Mod. Physics 15, 1-89.
Doléans-Dade, C. (1970). Quelques applications de la formule de changement de variables pour les semimartingales. Z. Wahrs. 16, 180-194.
Doléans-Dade, C. and P. A. Meyer (1970). Intégrales stochastiques par rapport aux martingales locales. Sém. de Prob. IV, Springer Lect. Notes Math. 124, 77-107.
Dellacherie, C. (1972). Capacités et Processus Stochastiques. Springer-Verlag, New York.
Dellacherie, C. and P. A. Meyer (1980). Probabilités et Potentiel, Partie B: Théorie des Martingales. Hermann, Paris.
Doob, J. L. (1953). Stochastic Processes. Wiley, New York.
Dozzi, M. (1989). Stochastic Processes with a Multidimensional Parameter. Longmans Scientific and Wiley, New York.
Dym, H. (1966). Stationary measures for the flow of a linear differential equation driven by white noise. Trans. Am. Math. Soc. 123, 130-164.
Fefferman, C. L. (1971). Characterization of bounded mean oscillation. Bull. Am. Math. Soc. 77, 587-588.
Fisk, D. L. (1965). Quasi-martingales. Trans. Am. Math. Soc. 120, 369-387.
Garsia, A. M. (1973). Martingale Inequalities. Benjamin Inc., Reading, MA.
Green, M. L. (1997). Planar stochastic integration relative to quasimartingales. In Real and Stochastic Analysis. CRC Press, New York, pp. 65-157.
Grenander, U. (1950). Stochastic processes and statistical inference. Ark. Mat. 1, 195-277.
Grenander, U. (1981). Abstract Inference. Wiley, New York.
Isaacson, D. (1969). Stochastic integrals and derivatives. Ann. Math. Stat. 40, 1610-1616.
Itô, K. (1951). On a formula concerning stochastic differentials. Nagoya Math. J. 3, 55-65.
Jacod, J. and A. N. Shiryaev (1987). Limit Theorems for Stochastic Processes. Springer-Verlag, New York.
Kakihara, Y. (1997). Multidimensional Second Order Stochastic Processes. World Scientific Inc., Singapore.
Kunita, H. and S. Watanabe (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
Liptser, R. S. and A. N. Shiryaev (1977). Statistics of Random Processes, Vols. I, II. Springer-Verlag, New York.
McKean, H. P. (1969). Stochastic Integrals. Academic Press Inc., New York.
Mel'nikov, A. V. (1996). Stochastic differential equations: singularity of coefficients, regression models, and stochastic approximation. Russian Math. Surveys 51, 819-909.
Métivier, M. (1982). Semimartingales. W. de Gruyter Inc., New York.
Merton, R. C. (1997). On the role of the Wiener process in finance theory and practice: the case of replicating portfolios. In The Legacy of Norbert Wiener: A Centennial Symposium. Am. Math. Soc. Pure Math. Series 60, 209-321.

816

M. M. Rao

Meyer, P. A. (1962/3). A decomposition theorem for supermartingales: existence; uniqueness. Illinois J. Math. 6, 193-205; 7, 1-17.
Musiela, M. and M. Rutkowski (1997). Martingale Methods in Financial Modelling. Springer-Verlag, New York.
Orey, S. (1967). F-processes. Proc. 5th Berkeley Symp. Math. Stat. and Prob. 2, 301-313.
Pitcher, T. S. (1959). Likelihood ratios for Gaussian processes. Ark. Mat. 4, 35-44.
Rao, M. M. (1981). Foundations of Stochastic Analysis. Academic Press Inc., New York.
Rao, M. M. (1982). Harmonizable processes: structure theory. L'Enseign. Math. 28, 295-356.
Rao, M. M. (1987). Measure Theory and Integration. Wiley-Interscience, New York.
Rao, M. M. (1993). An approach to stochastic integration. In Multivariate Analysis: Future Directions. North-Holland, Amsterdam, The Netherlands, pp. 347-374.
Rao, M. M. (1995). Stochastic Processes: General Theory. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Rao, M. M. (1997). Higher order stochastic differential equations. In Real and Stochastic Analysis. CRC Press, New York, pp. 225-302.
Rao, M. M. (2000). Stochastic Processes: Inference Theory. Kluwer Academic Publishers, Dordrecht, The Netherlands.
Shiryaev, A. N., Yu. M. Kabanov, D. O. Kramkov and A. V. Mel'nikov (1994). Toward the theory of pricing of options of both European and American types. I. Discrete time; II. Continuous time. Theor. Prob. Appl. 39, 14-60; 61-102.
Wu, R. (1985). Stochastic Differential Equations. Pitman Advanced Publishing Program, Boston, MA.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.

"~

z , ...y

Markov Chains: Structure and Applications

R. L. Tweedie

1. Introduction

1.1. Markov chains and their state spaces


A Markov chain is, in this article, a collection of random variables X = {X_n : n ∈ T}, where T is a countable time-set. It is customary to write T as ℤ₊ := {0, 1, …}, and we will do this henceforth. The critical aspect of a Markov chain, as opposed to any other set of random variables, is that it is forgetful of all but its most immediate past. Probabilistically this means that for a process X, evolving on a space X and governed by an overall probability law P, to be a time-homogeneous Markov chain, there must be a collection of "transition probabilities" {Pⁿ(x, A)}, for each x ∈ X and appropriate A ⊂ X, such that for all times n, m in ℤ₊

P(X_{n+m} ∈ A | X_r, r ≤ m; X_m = x) = Pⁿ(x, A) ;   (1)

that is, Pⁿ(x, A) denotes the probability that a chain at x will be in the set A after n steps, or transitions. The Markov property is apparent in (1) because Pⁿ does not depend on the values of X_r, r < m, and (1) also shows that the chain has a time-homogeneity property, since Pⁿ does not depend on m. The classical theory of Markov chains originally dealt only with chains on finite or countable spaces. This has now become standard material for undergraduate and early graduate courses, and can be found in texts such as Çinlar (1975) or Taylor and Karlin (1994). In order to establish the fundamental aspects of Markov chain theory on more general spaces, in Section 2 we first give an overview of results on countable spaces, including some recent and less familiar aspects even on these spaces. The theory on countable spaces is very complete. However, as we describe in Section 1.2, although many real systems can be described in such a framework, many other applications require continuous or even more general spaces, and there has been a flourishing of work on non-countable spaces, found in, for example, Revuz (1984), Nummelin (1984), and Meyn and Tweedie (1993a). In Section 3 we show how the countable space results extend to general state spaces,
R. L. Tweedie

and describe conditions which ensure that virtually the full range of the countable space structure is available in general. A great deal of the work in this article is based on the treatment in Meyn and Tweedie (1993a), and results we give without more detailed reference can be found there.

As a further level of structure, there is increasing interest in chains on spaces which have a topological structure for which the transition laws are adapted, in the sense of being continuous in some way. In Section 4, we integrate the general state space results with topological conditions which guarantee that the "obvious" (but not always true) probabilistic properties are preserved when the space has topological features.

In an article of this nature, it is inevitable that only certain perspectives on the theory and applications of Markov chains can be covered. We conclude, in Section 5, with a short description of some of the more important areas that we have not been able to consider in detail. This should give some directions for the reader who wishes to pursue other current areas of research.
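As a concrete illustration of the transition-probability mechanism in (1), the following minimal sketch simulates a time-homogeneous chain on a two-state space; the transition matrix here is an assumed toy example, not a model from the text.

```python
import random

# A minimal sketch (assumed two-state example): simulate a time-homogeneous
# Markov chain using only the one-step transition probabilities P(x, .),
# as in (1) -- the next state depends on the current state alone.
P = {0: [(0, 0.9), (1, 0.1)],
     1: [(0, 0.5), (1, 0.5)]}

def step(x, rng):
    """Draw X_{n+1} from the law P(x, .) given X_n = x."""
    u, acc = rng.random(), 0.0
    for y, p in P[x]:
        acc += p
        if u < acc:
            return y
    return P[x][-1][0]   # guard against floating-point round-off

def path(x0, n, rng):
    """Sample (X_0, ..., X_n); only the current state is ever consulted."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1], rng))
    return xs

rng = random.Random(1)
xs = path(0, 10, rng)
```

Note that `path` never looks further back than the last state, which is exactly the forgetfulness property the definition formalises.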

1.2. Applicability o f Markov models

Before we discuss any theoretical properties of Markov chains, we briefly describe some of the applications of the theory, and attempt to give an idea of the vast range of situations in which Markov models are currently and traditionally used.

(i) Markov chains have long been employed in classical "applied probability" contexts, including queues, storage models and inventory models. These methods have more recently been extended to cover network models that are of use especially in areas of teletraffic and computer systems modelling.

Consider, for example, a queue at an airport. This evolves through the random arrival of customers and the service times they bring. The numbers in the queue, and the time the customer has to wait, are critical parameters for customer satisfaction, for waiting room design, and for counter staffing (Asmussen, 1987). Under appropriate conditions, variables observed at arrival times or departure times of customers (such as the numbers in the queue, or the waiting times of customers) can be represented as a Markov chain, since the "forgetfulness" property holds at such times. Techniques arising from the analysis of such models have led to the now familiar single-line multi-server counters actually used in airports, banks and similar facilities, rather than the previous multi-line systems.

Storage models are fundamental in engineering, insurance and business. In engineering one considers a dam, with input of random amounts at random times, and a steady withdrawal of water for irrigation or power usage. Similarly, in insurance, there is a steady inflow of premiums, and random outputs of claims at random times. This model is also a storage process, but with the input and output reversed when compared to the engineering version. In business, the inventory of a firm will act in a manner between these two models, with regular but sometimes also large irregular withdrawals, and irregular ordering or replacements, usually

Markov chains: Structure and applications


triggered by levels of stock reaching threshold values. For all of these, given appropriate assumptions, there is a Markovian representation.

Extending this queueing context, telecommunications and computer networks have inherent Markovian representations (see Kelly (1979) for a very wide range of applications, both actual and potential). They may be composed of sundry connected queueing processes, with jobs completed at nodes, and messages routed between them; to summarise the past one may need a state space which is the product of many subspaces, including countable subspaces representing numbers in queues and buffers, uncountable subspaces representing unfinished service times or routing times, or numerous trivial 0-1 subspaces representing available slots or wait-states or busy servers. But by a suitable choice of state-space, and (as always) a choice of appropriate assumptions, Markovian methods give tools to analyse the stability of the system.

(ii) One of the fastest growing areas of Markov chain use and research is in time series analysis, especially as the use of non-linear models becomes more common. As one example, the exchange rate X_n between two currencies is often represented as a function of its past several values X_{n-1}, ..., X_{n-k}, modified by the volatility of the market which is incorporated as a disturbance term W_n (see Krugman and Miller (1992) for models of such fluctuations). The autoregressive model

X_n = Σ_{j=1}^{k} α_j X_{n-j} + W_n    (2)

where {W_n} is an i.i.d. set of "noise" variables, is central in time series analysis and captures the essential concept of such a system. Clearly {X_n} is not Markovian, since it depends on several earlier values. However, by considering the whole k-length vector Y_n = (X_n, ..., X_{n-k+1}) (the "state-space representation"), Markovian methods can be brought to the analysis of this and many other time-series models (Brockwell and Davis, 1991).

More general non-linear systems can be put into a similar state-space representation. Typically, representations are given in the form

X_n = f(X_{n-1}, ..., X_{n-k}) + W_n ,

where W_n is again an i.i.d. set of noise variables but f is a non-linear function. These cannot easily be analysed by traditional time series methods, but the state-space representation is again Markovian. For details of such models see Meyn and Tweedie (1993a) and Tong (1990).

(iii) Perhaps the most spectacular recent growth in the use of Markov chains is as a tool in simulation theory. Gibbs sampling, the Metropolis-Hastings algorithm, and other extensions to more general Markov chain Monte Carlo methods of simulation have had great impact on a number of areas (Gilks et al., 1996; Robert and Casella, 1999). In particular, the calculation of posterior Bayesian distributions has been revolutionised through this route (Smith and Gelfand 1992;


Tierney, 1994), and the behaviour of prior and posterior distributions on very general spaces such as spaces of likelihood measures themselves can be analysed using Markov chain methods.

Finite or even countable spaces do not describe these systems in general. Integer or even real-valued models are sufficient only to analyse the simplest examples in many of these contexts. It is therefore fortunate that the results for countable chains can be carried over, with fairly simple and verifiable assumptions, to much more general spaces. One of the key factors we stress in this review is that, with the assumptions we consider, there is no great loss of power in going from a simple to a quite general space. The reader interested in any of the areas of application above should therefore find that the results for general Markov chains are potentially tools of great value, no matter what the situation, no matter how simple or complex the state space considered.
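The state-space representation described in (ii) can be sketched as follows; the AR(2) coefficients and Gaussian noise here are illustrative assumptions, not values from the text.

```python
import random

# Sketch of the "state-space representation": the AR(2) recursion
# X_n = a1 X_{n-1} + a2 X_{n-2} + W_n is not Markovian on its own, but the
# stacked vector Y_n = (X_n, X_{n-1}) is.  Coefficients and Gaussian noise
# are illustrative assumptions.
a1, a2 = 0.5, -0.3

def ar2_step(y, w):
    """One Markov transition of Y_n: uses only y and fresh noise w."""
    x_next = a1 * y[0] + a2 * y[1] + w
    return (x_next, y[0])

rng = random.Random(0)
y = (0.0, 0.0)
for _ in range(500):
    y = ar2_step(y, rng.gauss(0.0, 1.0))
```

The same stacking device works for the general non-linear recursion: only the vector of the last k values and the fresh noise enter each transition.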

2. Countable space chains

2.1. Transition matrices and related quantities


When the state space X is countable, the evolution of the chain can be described by a matrix P = {P(i,j), i,j ∈ X} which is called a Markov transition matrix. This satisfies

P(i,j) ≥ 0,    Σ_{k∈X} P(i,k) = 1,    i,j ∈ X    (3)

and has the interpretation P(i,j) = P(X_n = j | X_{n-1} = i). To make the concepts below more concrete, we introduce here specific models which we will use as examples throughout this section.

Example: The simple random walk and the Metropolis algorithm

A random walk is defined by the recursion

Y_n = Y_{n-1} + W_n    (4)

where the W_n are i.i.d. with distribution F. Suppose that F(1) = 1/2 and F(-1) = 1/2: then the random walk is called simple (or Bernoulli), and it follows that for every integer i we have the transition matrix of Y given by

P(i, i-1) = P(i, i+1) = 1/2 .

Many of the features of much more complex systems are easily explained for this simplest possible model.


Suppose now we have an arbitrary distribution π on the integers. A simple version of the Metropolis algorithm for simulating from π can be described as follows:
1. From the state i, attempt to make a simple random walk move to a new state j (which will be either i-1 or i+1, with equal probabilities).
2. Accept the move if π(j) ≥ π(i); otherwise accept with probability π(j)/π(i) and reject the move with probability 1 - π(j)/π(i).
3. If the move is rejected, stay at i and repeat.
For this model, the transition matrix is then given in terms of a(i,j) = min(π(j)/π(i), 1) by

P(i, i+1) = a(i, i+1)/2 ;    P(i, i-1) = a(i, i-1)/2 ;
P(i, i) = 1 - [P(i, i-1) + P(i, i+1)] .

We shall see in Section 2.4 that this algorithm works because after n steps, for n large, the random variables X_n have approximately the distribution π. □
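The steps above can be sketched directly; the target distribution π(i) ∝ 2^(-|i|) used here is an illustrative assumption.

```python
import random

# A sketch of the simple-random-walk Metropolis sampler described above,
# with the assumed (illustrative) target pi(i) proportional to 2**(-|i|).
def pi(i):
    return 2.0 ** (-abs(i))

def metropolis_step(i, rng):
    j = i + rng.choice((-1, 1))              # step 1: candidate move
    if rng.random() < min(pi(j) / pi(i), 1.0):
        return j                             # step 2: accept
    return i                                 # step 3: reject, stay at i

rng = random.Random(42)
x, counts = 0, {}
for _ in range(20000):
    x = metropolis_step(x, rng)
    counts[x] = counts.get(x, 0) + 1

freq0 = counts.get(0, 0) / 20000
# For this target pi(0) = 1/3, so freq0 should be near 1/3 for large n.
```

The empirical frequencies approximate π without the normalising constant of π ever being computed, which is the practical appeal of the algorithm.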

Example: The M/G/1 queueing model


As a more complex example, suppose we have a single server queue, and that customers arrive at the service point at times which form a Poisson process of rate λ: that is, the inter-arrival times are independent exponentially distributed random variables with mean 1/λ. Each customer brings a service time which has a distribution denoted by H(t), and these are also independent.

Now consider the random variables N_n, which count customers immediately after each service time ends in this system. At these times, it is intuitively clear that the process forgets the service history, by independence of the service times; and since the inter-arrival times are exponential, at any such time the process also forgets the arrival time history. Thus for this model, called the M/G/1 queue, the sequence {N_n, n ≥ 0} can be constructed as a Markov chain with state space Z_+ and transition matrix

        q0  q1  q2  q3  q4  ...
        q0  q1  q2  q3  q4  ...
P  =    0   q0  q1  q2  q3  ...
        0   0   q0  q1  q2  ...
        ...                 ...

where for each j ≥ 0

q_j = ∫ {e^{-λt} (λt)^j / j!} H(dt) .    (5)

This illustrates how the transition law may be built up by considering more fundamental characteristics of the dynamics of the chain. Note that apart from the transitions from 0, this chain is also a random walk. □
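As a numerical illustration of (5), the sketch below evaluates q_j when H is exponential with mean m (the M/M/1 special case, an assumption made here for tractability); the integral then has the closed form q_j = (1-p)p^j with p = λm/(1+λm).

```python
import math

# Sketch evaluating (5) for exponential H with mean m (the M/M/1 case, an
# assumption made for tractability).  Then q_j = (1 - p) * p**j with
# p = lam*m/(1 + lam*m), and a Riemann sum over t reproduces the integral.
lam, m = 0.5, 1.0          # assumed arrival rate and mean service time

def q_numeric(j, dt=1e-3, T=40.0):
    """Midpoint-rule approximation of q_j = ∫ e^{-lam t}(lam t)^j/j! H(dt)."""
    total, t = 0.0, dt / 2.0
    while t < T:
        poisson_j = math.exp(-lam * t) * (lam * t) ** j / math.factorial(j)
        h_density = math.exp(-t / m) / m
        total += poisson_j * h_density * dt
        t += dt
    return total

p = lam * m / (1.0 + lam * m)
q_closed = [(1 - p) * p ** j for j in range(6)]
q_approx = [q_numeric(j) for j in range(6)]
# The q_j sum to one, and their mean sum_j j*q_j equals lam*m, the
# expected number of arrivals during one service time.
```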


For any one-step transition matrix P, we define the usual matrix iterates P^n = {P^n(i,j), i,j ∈ X} by setting P^0 = I, the identity matrix, and then taking inductively P^n(i,k) = Σ_{j∈X} P(i,j) P^{n-1}(j,k). To describe the evolution of the chain in time, we take an initial distribution μ and a transition matrix P, and for any j we can interpret

P_μ(X_n = j) = Σ_i μ(i) P^n(i,j)

where P_μ represents the probability law of the whole process started with μ. The matrix P^n is called the n-step transition matrix.

Three of the main structural questions we shall address below are
(a) Which sets are visited from which starting points?
(b) How often are sets visited from different starting points?
(c) In the long term does the chain evolve to some steady state, in the sense that P^n(i,j) → π(j) for some limiting distribution π?
To formalise these questions, for any state j we define the occupation time η_j as the number of visits by X to j after time zero, given by

η_j := Σ_{n=1}^{∞} 1{X_n = j}

and the first return time to j, given by

τ_j := min{n ≥ 1 : X_n = j} .

Analysis of X involves, in particular, Σ_{n=1}^{∞} P^n(i,j) = E_i[η_j] and

L(i,j) := P_i(τ_j < ∞) = P_i(X ever reaches j)    (6)

and we now consider these for countable space chains.
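The iterates P^n and the limiting behaviour in question (c) can be sketched numerically; the two-state matrix below is an assumed example.

```python
# A sketch of the n-step matrices P^n for an assumed two-state chain,
# computed by the inductive definition P^0 = I, P^n = P P^{n-1}.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matpow(P, n):
    R = [[1.0 if i == j else 0.0 for j in range(len(P))]
         for i in range(len(P))]             # P^0 = I
    for _ in range(n):
        R = matmul(P, R)
    return R

P20 = matpow(P, 20)
# Both rows of P^20 are essentially the invariant vector pi = (5/6, 1/6),
# previewing the limit in question (c).
```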

2.2. Reducibility and irreducibility


The idea of a Markov chain X reaching particular sets or states is central in considering the structure of the process. For example, in the M/G/1 queue, the question of whether the queue ever empties or not is exactly the question of whether the state {0} can be reached.

There are a number of essentially equivalent ways of defining the operation of communication between states. The simplest is to say that two distinct states i and j in X communicate, written i ↔ j, when L(i,j) > 0 and L(j,i) > 0. By convention we also define i ↔ i. The relation i ↔ j is often defined equivalently by requiring that there exist n(i,j) ≥ 0 and m(j,i) ≥ 0 such that P^n(i,j) > 0 and P^m(j,i) > 0, or alternatively requiring Σ_{n=0}^{∞} P^n(i,j) > 0 and Σ_{n=0}^{∞} P^n(j,i) > 0. It is easy to show that the relation "↔" is an equivalence relation, and so the equivalence classes C(i) = {j : i ↔ j} cover X, with i ∈ C(i).


Chains for which all states communicate form the basis for future analysis, and if C(i) = X for some i, then we say that X (or the chain {X_n}) is irreducible. We also say C(i) is absorbing if P(j, C(i)) = 1 for all j ∈ C(i).

Example: The Metropolis algorithm


It is obvious that the simple random walk is irreducible on the integers, but the Metropolis algorithm is irreducible on the integers if and only if the "target distribution" π is positive everywhere. If there is a value k where π(k) = 0 then we can never reach k from any state i where π(i) > 0, since all such moves will be rejected. The chain will move to the set S_π supporting π and will never leave that set in this case. If S_π is a connected set then the chain restricted to S_π is irreducible, of course, but if it is disconnected then the Metropolis algorithm may be reducible. □
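On a finite state space, communication can be checked mechanically by searching the positive entries of P; the three-state matrix below is an assumed reducible example in which {0, 1} is an absorbing class and state 2 leads into it.

```python
from collections import deque

# A sketch of testing irreducibility on a finite space: the chain is
# irreducible iff every state reaches every other state through positive
# entries of P.  The matrix below is an assumed reducible example:
# {0, 1} is an absorbing class and state 2 drains into it.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.5, 0.0],
     [0.0, 0.6, 0.4]]

def reachable(P, i):
    """States j with P^n(i,j) > 0 for some n >= 0, by breadth-first search."""
    seen, queue = {i}, deque([i])
    while queue:
        x = queue.popleft()
        for y, p in enumerate(P[x]):
            if p > 0.0 and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen

def is_irreducible(P):
    full = set(range(len(P)))
    return all(reachable(P, i) == full for i in range(len(P)))
```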

Example: The M/G/1 queue


It is equally easy to show that the M/G/1 queue is always irreducible. Starting at state i we can reach any higher state j in one step with positive probability, as shown in (5). From any state i we can reach any lower state j in i-j steps with probability q_0^{i-j} > 0. Thus all states in this model communicate. □

The reducible situation can be analysed to a large extent using the structural results we shall describe for irreducible chains. When states do not all communicate, then it is possible that there are states not in C(i) which can be reached from i. This happens, of course, if and only if C(i) is not absorbing. Suppose that X is not irreducible. If we reorder the states according to the equivalence classes defined by the communication operation, and if we further order the classes with absorbing classes coming first, then we have a decomposition such that

X = Σ_{i∈I} C(i) ∪ D    (7)

where the sum is of disjoint sets, and each of the classes indexed by I is absorbing, while none of the classes corresponding to the final set D is absorbing.

Now suppose that C is an absorbing communicating class. Let P_C denote the matrix P restricted to the states in C. Then there exists an irreducible Markov chain X_C whose state space is restricted to C and whose transition matrix is given by P_C. Thus for non-irreducible chains, we can analyse at least the absorbing subsets in the decomposition (7) as separate chains. The behaviour starting in D is often more complex, and we do not give details here.

2.3. Recurrence and transience


In this section we consider the number of times a chain returns to any given state, as the next logical step after considering whether it reaches such a state at all.


When X = Z_+ and the chain is irreducible, any state i is called transient if E_i(η_i) = Σ_{n=1}^{∞} P^n(i,i) < ∞, and recurrent if E_i(η_i) = ∞. The following result gives a structural dichotomy which enables us to consider, not just the recurrence or transience of states, but of chains as a whole.

PROPOSITION 2.1. When X is countable and X is irreducible, either Σ_{n=1}^{∞} P^n(i,j) = ∞ for all i,j ∈ X, or Σ_{n=1}^{∞} P^n(i,j) < ∞ for all i,j ∈ X.

We define the chain itself as transient if every state is transient, and if every state is recurrent, the chain is also called recurrent; thus when X is irreducible, then either X is transient or X is recurrent. This is called a "solidarity property" of an irreducible chain.

We can say, in the countable case, exactly what recurrence or transience means in terms of the return time probabilities L(i,i). Classical generating function arguments give

PROPOSITION 2.2. For any i ∈ X, Σ_{n=1}^{∞} P^n(i,i) = ∞ if and only if L(i,i) = 1. If the chain is recurrent, then for every i,j ∈ X we have L(i,j) = 1; and if the chain is transient then for every i ∈ X we have L(i,i) < 1.

From this result it is not hard to prove the seemingly stronger result that for a recurrent chain, from any i the actual number of visits η_j to j is infinite with probability one: essentially, once the chain returns the first time with probability one it returns a second time with probability one, and so on. Thus the returns of a Markov chain to a specific state have a property not shared by all random variables: if the expected number of returns is infinite then the actual number of returns is infinite with probability one.

Although we have the solidarity results above, verifying which of the two properties, recurrence or transience, holds for a given model is not trivial from these definitions. For the M/G/1 queue, for example, it is not simple to find L(0,0), although it is of considerable practical importance to know whether this chain is recurrent or transient: the latter case says that there is a positive probability that the queue will never empty, and the former says that the chain is guaranteed to empty infinitely often. Because of the difficulty of checking the definitions of recurrence and transience directly, it is valuable to have verifiable conditions for these characteristics, and the most simple and general criteria are in the form of so-called "drift conditions".
PROPOSITION 2.3. Let P be the transition matrix of an irreducible Markov chain on Z_+.
(i) The chain is recurrent if and only if there is a non-negative function V(j), j ∈ Z_+, such that V(j) → ∞ as j → ∞, and a finite number N ≥ 0 such that

Σ_{j=0}^{∞} P(i,j)V(j) ≤ V(i),    i > N .    (8)


(ii) The chain is transient if and only if there is a monotone increasing bounded non-negative function V(j), j ∈ Z_+, and a finite number N ≥ 0 such that

Σ_{j=0}^{∞} P(i,j)V(j) > V(i),    i > N .    (9)

Finding suitable drift functions to satisfy (8) or (9) is not always trivial, and their choice depends on the structure of the chain. However, these conditions do have the advantage that they only involve the one-step transition probabilities of the chain, and so we do not need more complex calculations involving higher powers of the matrix P.

Example: The simple random walk


For this example, consider the function V(j) = |j|. We have immediately that for any i > 0

Σ_j P(i,j)V(j) = (i-1)/2 + (i+1)/2 = i = V(i) ;    (10)

and symmetrically the same equality holds for negative i. Hence the chain is always recurrent. □

Example: The M/G/1 queue


In this queueing example, we again consider V(j) = j. We have immediately that for i > 0

Σ_j P(i,j)[V(j) - V(i)] = Σ_{j=i-1}^{∞} q_{j-i+1} [j - i]
                        = Σ_{k=0}^{∞} q_k [k - 1]
                        = ∫ [λt - 1] H(dt) = λμ - 1 ,    (11)

where μ is the mean service time. It follows that (8) holds provided λμ ≤ 1: that is, if the mean time between arrivals is no smaller than the mean service time. This result is well known in queueing theory, but this proof is short and shows that the whole queue is recurrent under a simple and intuitive condition.

We need a little more structure to verify (9), but a typical bounded function that is often used in models such as queueing models is V(i) = 1 - β^i for some


β < 1. This shows that if λμ > 1 the queue is transient, thus giving a complete classification in terms of the parameter λμ. □
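The drift calculation (11) can be checked numerically; exponential service (so that the q_k are geometric, as in the M/M/1 case) is an assumption made here so that the q_k have a closed form.

```python
# A numerical check of (11): for V(j) = j the one-step drift of the
# embedded M/G/1 chain is sum_k q_k (k - 1) = lam*mu - 1.  Exponential
# service (so q_k = (1-p) p**k, p = lam*mu/(1+lam*mu)) is an assumption
# made here so that the q_k have a closed form.
lam, mu = 0.8, 1.0                     # assumed rate and mean service time
p = lam * mu / (1.0 + lam * mu)
q = [(1 - p) * p ** k for k in range(2000)]

drift = sum(qk * (k - 1) for k, qk in enumerate(q))
# drift = lam*mu - 1 = -0.2 < 0, so (8) holds with V(j) = j and the
# queue is recurrent for these parameter values.
```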

2.4. Invariant vectors


For many purposes, we might like the marginal distributions of X_n to be identical as n takes on different values. If this is the case, then by the Markov property it follows that the finite dimensional distributions of X are invariant under translation in time, and X is a stationary process. Such considerations lead us to the idea of invariant distributions, or invariant vectors. A vector π(j), j ∈ X, with the property

π(j) = Σ_{k∈X} π(k) P(k,j) ,    j ∈ X    (12)

will be called invariant. If the invariant vector π is summable then we assume it is a probability distribution (that is, Σ_j π(j) = 1) and the chain is called positive (recurrent); otherwise it is called null.

The key result concerning the existence of invariant vectors is

PROPOSITION 2.4. If the chain X is irreducible and recurrent then it admits a unique (up to constant multiples) invariant vector π. The invariant vector π is summable if and only if there exists a state k such that E_k[τ_k] < ∞, in which case for every j we have

π(j) = 1/E_j[τ_j] > 0 .    (13)

Transient chains are always null, as is clear from (13), even though they may admit a unique invariant vector. Because of (13), we see that in terms of the properties of the chain, positivity is the next step after recurrence itself: in this case, we require the return times to each state to be finite in mean rather than just with probability one.

Verifying that a chain is positive by checking the invariant equation (12) is difficult in general. Two cases where they can be solved explicitly are for the Metropolis algorithm, and for the M/G/1 queue.

Example: The Metropolis algorithm


It is a simple calculation to see that for the Metropolis algorithm, for any i,j we have

π(j) P(j,i) = π(i) P(i,j)    (14)

and so by summing both sides over i we have that π is the invariant distribution for the Metropolis algorithm. Hence this algorithm is always positive recurrent. □
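The detailed balance relation (14) can be verified numerically for the integer Metropolis chain; the target π(i) ∝ 2^(-|i|) and the truncation to |i|, |j| ≤ 10 are assumptions made for the check.

```python
# A numerical verification of the detailed balance relation (14) for the
# integer Metropolis chain; the target pi(i) proportional to 2**(-|i|) and
# the truncation to |i|, |j| <= 10 are assumptions made for the check.
def pi(i):
    return 2.0 ** (-abs(i))

def P(i, j):
    """Transition probabilities of the simple-random-walk Metropolis chain."""
    if abs(i - j) == 1:
        return 0.5 * min(pi(j) / pi(i), 1.0)
    if i == j:
        return 1.0 - P(i, i - 1) - P(i, i + 1)
    return 0.0

balanced = all(abs(pi(i) * P(i, j) - pi(j) * P(j, i)) < 1e-12
               for i in range(-10, 11) for j in range(-10, 11))
```

Summing (14) over i then gives the invariance equation (12) directly, exactly as in the argument above.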


Chains satisfying the "detailed balance" condition (14) are often called reversible. Reversible chains have a number of properties not shared with all Markov chains: see Kelly (1979) for many details and examples.

Example: The M/G/1 queue


In this example the value of π(j) is determined iteratively (and uniquely) from the previous values, using the equations for j = 1, 2, ..., and so there is always an invariant vector. The question of summability of the solution can then be resolved from the equation for j = 0, and this shows that the requirement is that λμ < 1: in other words, only on the boundary λμ = 1 are recurrent chains not positive. This behaviour is typical of many recurrent chains: it is common to discover that for a range of parameters (in this case, λμ ≤ 1) we have recurrence, and only in the boundary cases do we see null recurrence. □

When we do not have the degree of structure in this example, we need another method of checking positive recurrence. The most useful is another drift condition.

PROPOSITION 2.5. Let P be the transition matrix of an irreducible Markov chain on Z_+. The chain is positive recurrent if and only if there is a non-negative function V(j), j ∈ Z_+, and finite numbers N ≥ 0, ε > 0 such that

Σ_{j=0}^{∞} P(i,j)V(j) ≤ V(i) - ε,    i > N ;
Σ_{j=0}^{∞} P(i,j)V(j) ≤ b < ∞,     i ≤ N .    (15)

The drift condition (15) is often known in the operations research literature as Foster's Condition, and following the dynamic systems literature the drift function V is also often called a stochastic Lyapunov function.

Example: The M/G/1 queue


The Eq. (15) is clearly closely related to (8). By more closely scrutinising (11) in the M/G/1 example, we see that we have proved that (15) is satisfied using V(i) = i, provided that we have λμ < 1. This is the same condition for summability of π that we found using the direct evaluation of the invariant equations above. □

2.5. Convergence to π

A key result for positive recurrent irreducible chains is that the transition laws converge, in some suitable sense, to the invariant vector π. The classical result is


PROPOSITION 2.6. For an irreducible positive recurrent chain, for any i and j in X

n^{-1} Σ_{m=1}^{n} P^m(i,j) → π(j),    n → ∞ .    (16)

The use of the Cesaro averages in (16) can be avoided if we have an aperiodic chain. The simplest definition of aperiodicity is that a state k is aperiodic if P^n(k,k) > 0 for all sufficiently large n; irreducibility can be used to show that if one state is aperiodic then all states are aperiodic, which is a further solidarity property of such chains. The most useful condition for aperiodicity is that a state k is aperiodic if P(k,k) > 0, which clearly implies P^n(k,k) > 0 for all n.

The traditional pointwise convergence of transition probabilities in (16) has been replaced in more recent research by convergence in total variation. For any n let us define the total variation norm between P^n(i, ·) and π by

‖P^n(i, ·) - π‖ = 2 Σ_j |P^n(i,j) - π(j)| .
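A numerical sketch of this norm for an assumed two-state chain shows the kind of geometric decay discussed in the results that follow.

```python
# A numerical sketch of this norm for an assumed two-state chain: the
# distance from P^n(0, .) to pi decays geometrically in n.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = [5.0 / 6.0, 1.0 / 6.0]            # solves pi = pi P for this chain

def row_n(i, n):
    """The distribution P^n(i, .) via repeated vector-matrix products."""
    v = [1.0 if j == i else 0.0 for j in range(2)]
    for _ in range(n):
        v = [sum(v[k] * P[k][j] for k in range(2)) for j in range(2)]
    return v

def tv(i, n):
    return 2.0 * sum(abs(a - b) for a, b in zip(row_n(i, n), pi))

dists = [tv(0, n) for n in (1, 5, 10, 20)]
# dists decreases geometrically (by a factor 0.4 per step for this P).
```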

The convergence result that then holds is

PROPOSITION 2.7. For an irreducible aperiodic positive recurrent chain, with invariant distribution π, ‖P^n(i, ·) - π‖ → 0, for any i ∈ X.

The proof of this result, and indeed the focus on convergence in total variation norm, follows because of the use of coupling techniques (Lindvall, 1992). We do not pursue the details of this here, but we do note that the pathwise constructions used represent a completely different direction to the earlier analytic approaches to Markov chain convergence results.

A relatively newer class of convergence results for Markov chains concerns the convergence of the generalised moments of the n-step transition laws.

PROPOSITION 2.8. For an irreducible aperiodic positive recurrent chain with invariant distribution π, and any function f such that Σ_j |f(j)| π(j) < ∞,

Σ_j P^n(i,j) f(j) → Σ_j π(j) f(j),    n → ∞ ,    (17)

that is,

|Σ_j P^n(i,j) f(j) - Σ_j π(j) f(j)| → 0,    n → ∞ ,    (18)

for any i ∈ X.

The result here is perhaps surprisingly strong: it is certainly not the case that for an arbitrary sequence of random variables, the mere existence of the limiting moment guarantees such convergence. Verifying that the invariant vector actually has a finite moment of a specific form is not trivial. Once again we have a drift condition that enables us to check this.


PROPOSITION 2.9. Let P be the transition matrix of an irreducible Markov chain on Z_+. The chain is positive recurrent with an invariant probability vector satisfying Σ_j f(j) π(j) < ∞ for a non-negative function f if and only if there is a non-negative function V(j), j ∈ Z_+, and finite numbers N ≥ 0, b < ∞ such that

Σ_{j=0}^{∞} P(i,j)V(j) ≤ V(i) - f(i),    i > N ;
Σ_{j=0}^{∞} P(i,j)V(j) ≤ b < ∞,       i ≤ N .    (19)

Notice that Foster's Condition (15) is merely (19) in the special case f ≡ ε. In this context (15) can be seen as a condition for π to be summable if the chain is known to be recurrent; and this follows also from (15) if V → ∞, since then (8) holds.

Example: The M/G/1 queue


It is possible to analyse the M/G/1 queue using Proposition 2.9 provided we have λμ < 1 (so that (15) holds). By using the function V(j) = j^{r+1} it can be shown that Σ_j π(j) j^r < ∞, and then it follows from (18) that

Σ_j P^n(i,j) j^r - Σ_j π(j) j^r → 0,    n → ∞ ;

in other words, moments of all orders converge. Even more strongly, by using the function V(j) = β^j for an appropriate β > 1, we similarly get that the moment generating function itself converges for this chain. □

2.6. Geometric ergodicity and rates of convergence


The logical question that follows when we have established (17) is: how fast is this convergence? Much current work has concentrated on this question.

One area of such effort involves the so-called "cut-off" phenomenon: for finite chains on structured spaces (where typically group theoretic properties can be exploited), Diaconis and various others (Diaconis, 1988; Aldous and Diaconis, 1986, 1987; Diaconis, 1995) have explored this in detail, showing that in such circumstances the chain may stay far from stationarity for a number of steps and then suddenly achieve an approximate stationary distribution. We will not explore this further here, as it requires tools and structures rather different from most of our other results.

For more general chains on less structured spaces, there has been considerable research into questions of geometric convergence, and the following result shows that any degree of geometric convergence leads to an overall uniform geometric rate.


PROPOSITION 2.10. Suppose that, for an irreducible aperiodic positive recurrent chain, there exists a state k and constants M_k < ∞, ρ_k < 1 such that for all n

|P^n(k,k) - π(k)| ≤ M_k ρ_k^n .

Then there exists ρ < 1 and M_i < ∞ such that for any i ∈ X

‖P^n(i, ·) - π‖ ≤ M_i ρ^n,    n ≥ 1 .    (20)

Recall that positive recurrence is characterised by finite mean return times to individual states, as in Proposition 2.4. We have a similar characterisation of geometric ergodicity.

PROPOSITION 2.11. An irreducible aperiodic positive recurrent chain is geometrically ergodic if and only if for one, and then every, state k

E_k[β^{τ_k}] < ∞    (21)

for some β > 1.

The following drift condition also characterises geometric ergodicity.

PROPOSITION 2.12. An irreducible aperiodic positive recurrent chain is geometrically ergodic if and only if we can find a function V(j) ≥ 1, an N ≥ 0 and λ < 1, b < ∞ such that

Σ_{j=0}^{∞} P(i,j)V(j) ≤ λ V(i),    i > N ;
Σ_{j=0}^{∞} P(i,j)V(j) ≤ b < ∞,   i ≤ N .    (22)

In this case the geometric convergence can be strengthened to

sup_{|f| ≤ V} |Σ_j P^n(i,j) f(j) - Σ_j π(j) f(j)| ≤ M ρ^n V(i),    n ≥ 1    (23)

for some M < ∞, ρ < 1.

The result here is very strong. From the drift condition (22) we not only find that the "moments of order up to V" converge at a uniformly strong geometric rate, but also that the constant depending on the starting point is described by the same function V(i).

Example: The M/G/1 queue

Checking the condition (22) for this queueing model is not difficult. The appropriate drift function is V(i) = β^i for some appropriate β > 1. This shows that for this queueing model, not only all moments, but also the moment generating


functions of the n-step transitions converge geometrically quickly at a uniform rate. □

Example: The Metropolis algorithm


Again by using the drift function V(i) = β^i for some appropriate β > 1 in (22), we find that the Metropolis algorithm also converges geometrically quickly provided the tails of π go to zero smoothly and at a geometric rate (Mengersen and Tweedie, 1996), and this is also shown to be (essentially) necessary for geometric convergence of the algorithm. □

A central question in this area which has been recently solved is the determination of a bound on the rate of convergence ρ in (20) or (23). Typically the answers to this are complex. The simplest result holds in the case where the chain is stochastically monotone: that is, where for each fixed k ∈ X

i ≥ j ⟹ P(i, {k, k+1, ...}) ≥ P(j, {k, k+1, ...}) .

In this case we find (Lund and Tweedie, 1996) the following elegant result in terms of the drift condition in Proposition 2.12.

PROPOSITION 2.13. For an irreducible aperiodic stochastically monotone chain satisfying (22) with N = 0, we have

‖P^n(i, ·) - π‖ ≤ M_i λ^n .

For more general chains, satisfying (22) with N = 0, say, there are rather more complicated bounds on ρ, but these still depend only on λ, b and the value of P(0,0). The most accurate result to date (Roberts and Tweedie, 1999) is that the rate of convergence is bounded by ϱ* where, writing J = [b - P(0,0)]/λ,
(a) If J ≤ 1 then ϱ* = λ;
(b) If J > 1 then

ϱ* = exp{ log λ [ log J / log((1 - P(0,0))/λ) - 1 ]^{-1} } .    (24)

This can be extended to bounds when (22) holds outside some larger set of states. Typically, these bounds are rather coarse, and do not always provide practical computational values (Roberts and Tweedie, 1999); but they do hold in complete generality.
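As a rough numerical companion to these rate results, the sketch below estimates the geometric rate ρ in (20) for a small assumed chain by taking ratios of successive total variation distances; for this matrix the estimate settles at the second-largest eigenvalue modulus of P, which is 0.6.

```python
# A rough numerical companion: estimate the geometric rate rho in (20) for
# a small assumed chain by taking ratios of successive total variation
# distances; the estimate settles at the second-largest eigenvalue modulus
# of P (0.6 for this matrix).
P = [[0.8, 0.2, 0.0],
     [0.4, 0.4, 0.2],
     [0.0, 0.6, 0.4]]

def dist_n(i, n):
    """The distribution P^n(i, .)."""
    v = [1.0 if j == i else 0.0 for j in range(3)]
    for _ in range(n):
        v = [sum(v[k] * P[k][j] for k in range(3)) for j in range(3)]
    return v

pi = dist_n(0, 500)                    # numerically stationary by n = 500

def tv(i, n):
    return sum(abs(a - b) for a, b in zip(dist_n(i, n), pi))

rate = tv(2, 21) / tv(2, 20)           # successive-ratio estimate of rho
```

Drift-based bounds such as those above are typically coarser than this exact spectral quantity, but unlike eigenvalue computations they apply on infinite and general state spaces.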

3. General state space chains


3.1. Preliminary structures

We next consider the situation when the state space is general rather than countable. It will be seen that the results above for countable chains have,


virtually item by item, parallels for such general chains, and we deliberately structure this review so the similarities are clear. We let X be an arbitrary set, and simply require that it is equipped with a countably generated σ-field B(X). For many results the countable generation assumption can be relaxed (Orey, 1971), but it is rarely onerous in practice in any case.

In this context, the evolution of the chain is described by a Markov transition law {P(x,A), x ∈ X, A ∈ B(X)}, which replaces the transition matrix. Here the formal assumption is that P(x,A) is a probability measure on B(X) for each fixed x, as in (3), and a measurable function of x for each fixed A ∈ B(X). In the general state space case we define the n-step transition law P^n = {P^n(x,A), x ∈ X, A ∈ B(X)} inductively by

P^n(x,A) = ∫_X P(x,dy) P^{n-1}(y,A) .

To describe the evolution of the chain in time, we take an initial measure μ on B(X) and a transition law P: then for any A and n,

P_μ(X_n ∈ A) = ∫_X μ(dy) P^n(y,A)

where P_μ represents the probability law of the whole process started with μ. Again we introduce specific models which we will use as examples throughout the general state space results.

Example: The autoregressive model


We use the simplest time-series model described by (2). The first order autoregression on ℝ is defined iteratively by

X_n = αX_{n−1} + e_n   (25)

where the constant α is a real number and the e_n form an i.i.d. sequence (the "noise process"). Now if F(A) = P(e_n ∈ A) is the probability measure associated with the noise process, the transition law of X_n is P(x,A) = P(αx + e_n ∈ A) = F(A − αx). Thus the law P can be formed from the law of the noise process in a simple way. □
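As a concrete check on this construction, the following sketch simulates one step of the autoregression with standard normal noise and compares the empirical value of P(x, A) with F(A − αx). The values of α, x, the interval A and the sample size are illustrative choices, not taken from the text.

```python
import math
import random

def normal_cdf(z):
    """CDF of the N(0,1) noise distribution F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ar1_step(x, alpha, rng):
    """One transition X_n = alpha * X_{n-1} + e_n of the autoregression (25)."""
    return alpha * x + rng.gauss(0.0, 1.0)

rng = random.Random(0)
alpha, x = 0.5, 2.0
a, b = 0.0, 1.5                      # the test set A = (a, b)
n_samples = 200_000

hits = sum(a < ar1_step(x, alpha, rng) < b for _ in range(n_samples))
empirical = hits / n_samples
# P(x, A) = F(A - alpha*x) = F(b - alpha*x) - F(a - alpha*x)
exact = normal_cdf(b - alpha * x) - normal_cdf(a - alpha * x)
print(f"empirical P(x,A) = {empirical:.3f}, F(A - alpha*x) = {exact:.3f}")
```

The agreement of the two printed values illustrates that the transition law really is just a translate of the noise law.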

Example: The Metropolis-Hastings algorithm


The Metropolis-Hastings algorithm is a generalisation of the Metropolis algorithm introduced in the countable case. Assume we have a given measure π on 𝓑. Take a fixed measure ν with support larger than that of π, and first consider a candidate transition law with densities q(x,y), x, y ∈ X, which generates potential transitions for a discrete time Markov chain evolving on X. (Here all the densities are with respect to ν.) In the simple Metropolis case in the last section, the candidate was the random walk on the integers, but in fact q(x,y) can be arbitrary. A "candidate transition" to y generated according to the density q(x, ·) is then accepted with probability α(x,y), given by

Markov chains: Structure and applications

833

α(x,y) = min{ π(y)q(y,x) / [π(x)q(x,y)], 1 },   π(x)q(x,y) > 0
       = 1,                                     π(x)q(x,y) = 0 .   (26)

Thus actual transitions of the Metropolis-Hastings chain take place according to a law P with transition probability densities

p(x,y) = q(x,y)α(x,y),   y ≠ x   (27)

and with probability of remaining at the same point given by

r(x) = P(x, {x}) = ∫ q(x,y)[1 − α(x,y)] ν(dy) .   (28)

Hence we have described P(x,A) = ∫_A p(x,y) ν(dy) + r(x)𝟙(x ∈ A) in terms of the structure of the chain as required. □

For any set A we again define the occupation time η_A as the number of visits by X to A after time zero, given by η_A := Σ_{n≥1} 𝟙{X_n ∈ A}; and the first return time to A is τ_A := min{n ≥ 1 : X_n ∈ A}. Again we have Σ_{n≥1} P^n(x,A) = E_x[η_A], and we write

L(x,A) := P_x(τ_A < ∞) = P_x(X ever enters A) .   (29)

We now show that these have properties similar to those of countable space chains.
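The transition mechanism of the Metropolis-Hastings example above can be sketched in a few lines of code. Here the target π (a standard normal, used only up to a normalising constant) and the Gaussian random-walk candidate with step size s are illustrative assumptions, not specified in the text; the acceptance step implements (26) and the rejection branch realises the holding probability r(x) of (28).

```python
import math
import random

def pi_unnorm(x):
    return math.exp(-0.5 * x * x)        # target density, up to normalisation

def q_density(x, y, s=1.0):
    return math.exp(-0.5 * ((y - x) / s) ** 2)   # candidate density q(x, y)

def mh_step(x, rng, s=1.0):
    y = x + rng.gauss(0.0, s)            # candidate transition from q(x, .)
    num = pi_unnorm(y) * q_density(y, x, s)
    den = pi_unnorm(x) * q_density(x, y, s)
    alpha = 1.0 if den == 0 else min(num / den, 1.0)   # acceptance (26)
    return y if rng.random() < alpha else x            # move, or hold at x

rng = random.Random(1)
x, chain = 0.0, []
for n in range(100_000):
    x = mh_step(x, rng)
    chain.append(x)

burn = chain[5_000:]
mean = sum(burn) / len(burn)
var = sum((v - mean) ** 2 for v in burn) / len(burn)
print(f"sample mean = {mean:.3f}, sample variance = {var:.3f}")
```

The long-run sample mean and variance should approach those of the target, here 0 and 1, anticipating the invariance property established in Section 3.5.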

3.2. Reducibility and irreducibility


As for countable spaces, to develop a coherent theory it is almost imperative to have a suitable concept of irreducibility. However, we can no longer consider the situation where distinct states x and y in X communicate in the sense that L(x,y) > 0 and L(y,x) > 0, since typically there is zero probability of reaching single states. Instead, we introduce the concept of φ-irreducibility, where we assume that there exists an "irreducibility" measure φ on 𝓑 such that for every state x

φ(A) > 0 ⟹ L(x,A) > 0 .   (30)

Thus the measure φ describes the "big" sets that are reached from every starting point in the space. This seemingly weak condition will be shown to provide just enough structure for the results we will discuss. Note that if the space is countable, then we can take φ as counting measure and we get φ-irreducibility; but indeed we only need there to be one state (say {0}) such that L(x, {0}) > 0 for all x, and we still get φ-irreducibility for the choice φ = δ_0, the Dirac measure concentrated at {0}.

Example: The autoregressive model


The autoregressive model is φ-irreducible provided the noise distribution has an everywhere positive density with respect to Lebesgue measure μ_Leb. If we take φ = μ_Leb, it is easy to see that whenever μ_Leb(A) > 0, then for any x we also have F(A − αx) > 0, and so P(x,A) > 0 in just one step. □

Example: The Metropolis-Hastings algorithm


There are various sufficient conditions for the Metropolis-Hastings algorithm to be φ-irreducible (Roberts and Tweedie, 1996). It is not hard to see that, as in the autoregressive model above, sufficient conditions for ν-irreducibility (where ν is the measure with respect to which all relevant densities are defined) are given by the requirement that π(x) and all of the candidates q(x,y) are positive and finite for ν-almost all x, y. Weaker conditions are possible, but as always when checking irreducibility, it can become complicated when we need to verify more than "one-step" probabilities of reaching all sets. □

Since we lack the equivalence relation structure that we can define on a countable space, it is much harder to find clean results in the non-irreducible situation when the space is general. Considerable research has been done into conditions for the existence of a countable decomposition

X = ⋃_{i∈I} C_i ∪ D   (31)

where the countable union is of disjoint absorbing sets (that is, P(x, C_j) = 1 for all x ∈ C_j) which are in some sense "recurrent", while the set D is in some sense "transient". Such descriptions of the space are known as Doeblin decompositions. Perhaps the most simple and verifiable conditions leading to a Doeblin decomposition, where the recurrence and transience behaviours of the C_i and D are clearly defined, are those given in Section 4 below. Two very general conditions which lead to the existence of Doeblin decompositions are:
(M) There exists a finite measure giving positive mass to each absorbing subset of X.
(C) There is no uncountable disjoint class of absorbing subsets of X.
These assumptions by themselves seem relatively weak, especially (C), which is clearly necessary for the recurrent part of (31). It has recently been shown (Chen and Tweedie, 1997) that the conditions (M) and (C) are equivalent, and are themselves equivalent to the existence of the decomposition (31): and thus even on a general space there is at most a countable recurrent structure under such conditions.

3.3. Maximal measures and small sets


Even though φ-irreducibility is not a strong condition to verify in many cases, it can be shown that when it holds there is a canonical "largest set of states" that can be reached from all points. The results in this section are not ones that have


counterparts on a countable space, and it is here that the effectiveness of φ-irreducibility is first seen clearly.

PROPOSITION 3.1. If the chain is φ-irreducible then there exists an essentially unique "maximal" irreducibility measure ψ on 𝓑 such that
(i) for every state x we have L(x,A) > 0 whenever ψ(A) > 0; and also
(ii) if ψ(A) = 0, then ψ(Ā) = 0, where Ā := {y : L(y,A) > 0} is the set of points from which we can reach A with positive probability.

We let 𝓑⁺ denote the collection of sets A such that ψ(A) > 0. The uniqueness of the maximal measure ensures that 𝓑⁺ is well-defined. The benefit of the property (ii) is that, if we have a ψ-null set A outside which some desirable property holds, then we can delete both A and Ā; and then, since the set A ∪ Ā cannot be reached from its complement, the remainder of the space is a stochastically closed set on which the desirable property holds everywhere. Thus we can restrict P to this "good" set without real loss of generality.

Perhaps the most critical consequence of ψ-irreducibility is that we can define a class of sets, on a quite general space, which behave the same way in many contexts as do individual states in a countable space. We shall see this in many of the results below, which almost exactly mimic the countable space results they generalise. These sets are called small sets. A set C in 𝓑 is called small if there exists an m ≥ 1, a constant δ > 0 and a probability measure φ such that, for every x ∈ C and every B ∈ 𝓑,

P^m(x,B) ≥ δφ(B) .   (32)

When a small set is in 𝓑⁺, then it is obvious that φ is an irreducibility measure, hence the notation that we adopt. Trivially, any individual point is small. Small sets which are not in 𝓑⁺ are however of limited interest. When the space is countable and the chain is irreducible and aperiodic, then every finite set is small, and in 𝓑⁺. Under certain topological continuity conditions, as shown in Section 4, every compact set is small, and such sets will certainly be in 𝓑⁺ if chosen large enough. Most importantly, and as a rather deep result that largely justifies our concentration on ψ-irreducible chains, we can show that the concept is never vacuous.

PROPOSITION 3.2. If the chain is ψ-irreducible, then there is a countable cover of the whole space with small sets, and so every set in 𝓑⁺ contains a small set which is also in 𝓑⁺.

Example: The autoregressive model


Suppose in the autoregressive model that the noise distribution has an everywhere positive continuous density f with respect to μ_Leb. If C = [−b, b], say, and if δ = inf{f(w) : w ∈ [−(1 + |α|)b, (1 + |α|)b]}, then for A ⊂ C

P(x,A) = ∫_A f(y − αx) dy ≥ δμ_Leb(A)

and so the compact set C is a small set satisfying (32) with m = 1. A suitable collection of compact sets then gives the countable cover of the space with small sets that we expect. □
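The small-set calculation above can be verified numerically. The sketch below assumes standard normal noise, with α, b and the grids of test points as illustrative choices, and checks the minorisation P(x,A) ≥ δ μ_Leb(A) for points x in C = [−b, b] and subintervals A of C.

```python
import math

alpha, b = 0.5, 1.0

def f(w):                                # N(0,1) noise density
    return math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)

# delta = inf of f over [-(1 + alpha) b, (1 + alpha) b], attained at the endpoint
delta = f((1.0 + alpha) * b)

def P(x, a, c, steps=2000):
    """P(x, (a, c)) = integral over (a, c) of f(y - alpha*x) dy (trapezoid rule)."""
    h = (c - a) / steps
    ys = [a + i * h for i in range(steps + 1)]
    vals = [f(y - alpha * x) for y in ys]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# check the minorisation for a grid of x in C and subintervals A of C
ok = all(
    P(x, a, c) >= delta * (c - a) - 1e-9
    for x in [-1.0, -0.5, 0.0, 0.5, 1.0]
    for (a, c) in [(-1.0, -0.2), (-0.5, 0.5), (0.1, 1.0)]
)
print(f"delta = {delta:.4f}, minorisation holds on the grid: {ok}")
```

Here φ in (32) is normalised Lebesgue measure on C, and the check succeeds with a large margin since the density is bounded well away from zero on the translated interval.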

Example: The Hastings-Metropolis algorithm


Very similar comments apply to the Hastings-Metropolis chain. If the densities π(x) and q(x,y) are with respect to ν = μ_Leb and if they are all continuous, for example, then again every compact set is a small set in just one step. □

3.4. Recurrence and transience

It is again no longer possible in general to consider the number of times a chain returns to any given state, but we can apply similar concepts to sets in 𝓑⁺. A set A is called uniformly transient if E_x(η_A) ≤ M < ∞ for all x ∈ A, and recurrent if E_x(η_A) = ∞ for all x ∈ A. We now extend the structural dichotomy in Proposition 2.1.

PROPOSITION 3.3. When X is ψ-irreducible, either every set in 𝓑⁺ is recurrent (and we call X itself recurrent), or the space has a countable cover with uniformly transient sets (and we call X transient); and in the latter case every small set is uniformly transient.

In many situations, as in the countable space context, we are interested in stronger recurrence properties in terms of the return time probabilities L(x,A), rather than just using E_x(η_A). The set A is called Harris recurrent if L(x,A) = 1 for all x ∈ A. It is possible to prove that this implies the seemingly stronger result that

P_x(η_A = ∞) = 1,   x ∈ A .

We can still indicate what recurrence or transience means in terms of Harris properties, although the dichotomy here is marginally less clean than in the countable case. This is quite typical of general spaces: we can usually find a ψ-null set of points where the ideal results might not hold.

PROPOSITION 3.4. If X is ψ-irreducible and recurrent, then we can write

X = H ∪ N   (33)

where H is absorbing and non-empty and every subset of H in 𝓑⁺ is Harris recurrent; and N is ψ-null and transient.

We call the chain Harris recurrent if the set N in (33) is empty.


On a general space we can also emulate the drift criteria for recurrence and transience that we described for countable spaces.

PROPOSITION 3.5. Let P be the transition law of a ψ-irreducible Markov chain.
(i) The chain is Harris recurrent if there is a non-negative function V(x), x ∈ X, which is "unbounded" in the sense that {y : V(y) ≤ n} is small for every n, and a small set C such that

∫ P(x, dy) V(y) ≤ V(x),   x ∉ C .   (34)

(ii) The chain is transient if and only if there is a bounded non-negative function V(x) such that the sets {y : V(y) ≤ r} and {y : V(y) > r} are both in 𝓑⁺ for some r, and

∫ P(x, dy) V(y) ≥ V(x),   x ∈ {y : V(y) > r} .   (35)

The choice of suitable drift functions to satisfy (34) or (35) depends on the structure of the chain.

3.5. Invariant measures


On general spaces we again further classify chains using invariant measures. We now say that a measure π on 𝓑 is invariant if it has the property that

π(A) = ∫_X π(dy) P(y, A),   A ∈ 𝓑 .   (36)

THEOREM 3.6. If the chain X is ψ-irreducible and recurrent then it admits a unique (up to constant multiples) invariant measure π, which is also a maximal irreducibility measure for X. The invariant measure π is finite if there exists a small set C such that

sup_{x∈C} E_x[τ_C] < ∞ ,   (37)

in which case there is a countable collection of small sets C_j satisfying (37) such that

X = ⋃_j C_j ∪ N

where π(N) = 0.

A ψ-irreducible chain for which an invariant probability measure exists is called positive (recurrent); otherwise it is called null. Transient chains are always null, as is clear from (37).


Example: The Metropolis-Hastings algorithm


For every x, y, the construction of the Metropolis-Hastings algorithm ensures that

π(x)p(x,y) = π(y)p(y,x)   (38)

as in (14). Now integrating this detailed balance condition against the reference measure ν, we see that π is invariant for the algorithm, regardless of the candidate q(x,y) chosen. It is this rather remarkable fact that lies behind the flexibility that this algorithm provides, since one can choose q(x,y) to suit the purposes of the problem and to optimise other features of the algorithm. Again, in this context there is the possibility of using reversibility arguments based on (38), as in Kelly (1979), to analyse the general state space algorithms, although these possibilities have yet to be widely exploited. □

The Metropolis-Hastings chain is unusual in that we have an expression for its invariant measure in advance, and therefore if it is irreducible it must be positive. In other areas such as the study of queues or autoregressive processes, it may be difficult to decide if a chain is null or positive. General state space chains again allow drift conditions for checking positive recurrence.

PROPOSITION 3.7. Let P be the transition law of a ψ-irreducible Markov chain on X. The chain is positive if and only if there is a non-negative function V(x), finite almost everywhere, and ε > 0, b < ∞ such that for some fixed small set C

∫ P(x, dy) V(y) ≤ V(x) − ε + b𝟙_C(x),   x ∈ X .   (39)

Example: The autoregressive model


Suppose that the noise process has zero mean and finite variance, so that E[e_n] = 0 and E[e_n²] < ∞. If we choose V(x) = x² we have

∫ P(x, dy) V(y) = E[(αx + e_n)²] = α²V(x) + E[e_n²]   (40)

so that (39) holds when C = [−c, c] for some large enough c, provided that the parameter satisfies |α| < 1. Of course we still need to know that C is a small set for (39) to have the desired consequence. As noted above, in this example compact sets are small if the noise process has everywhere positive densities, although this already quite weak condition can clearly be weakened further. Notice that we have made no demands on the actual distribution of the e_n. In general, this approach therefore provides an existence result rather than helping with the calculation of π. If the e_n are N(0, σ²) variables, then one can check the invariant equations and find that, provided |α| < 1, if π = N(0, σ²/(1 − α²)) then π is invariant. □
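The invariant law just identified can be checked by simulation. In the sketch below α, σ, the run length and the burn-in are illustrative choices, not from the text; the long-run sample variance of the Gaussian autoregression is compared with σ²/(1 − α²).

```python
import random

rng = random.Random(2)
alpha, sigma = 0.8, 1.0
target_var = sigma ** 2 / (1.0 - alpha ** 2)   # variance of the invariant law

x, xs = 0.0, []
for n in range(200_000):
    x = alpha * x + rng.gauss(0.0, sigma)      # X_n = alpha X_{n-1} + e_n
    if n >= 1_000:                             # discard a short burn-in
        xs.append(x)

mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
print(f"sample variance = {var:.2f}, pi variance = {target_var:.2f}")
```

The closeness of the two printed variances, together with a sample mean near zero, is consistent with π = N(0, σ²/(1 − α²)) being invariant.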


3.6. Convergence to π

As in the countable space case, in order to get convergence of the n-step transition laws to π, we need to avoid periodic behaviour. A slightly non-standard definition of aperiodicity for a ψ-irreducible chain is to require that for some one small set C in 𝓑⁺ and every x ∈ C, we have P^m(x, C) > 0 for all sufficiently large m. This is equivalent to other definitions in this context. It is also obviously implied by the assumption of strong aperiodicity, namely that there exists a set C, a measure φ and a constant δ > 0 such that

P(x,B) ≥ δφ(B),   x ∈ C, B ∈ 𝓑 ;   (41)

that is, C is a one-step small set. We will not go into further details of periodicity here, but we do note that strong aperiodicity is the analogue of finding a single state k such that P(k, k) > 0. On a general space we now consider again convergence of transition probabilities in total variation. For any n we define the total variation norm between the measures P^n(x, ·) and π by

that is, C is a one-step small set. We will not go into further details of periodicity here, but we do note that strong aperiodicity is the analogue of finding a single state k such that P ( k , k) > 0. On a general space we now consider again convergence of transition probabilities in total variation. For any n we define the total variation n o r m between the measures Pn(x, .) and 7z by
‖P^n(x, ·) − π‖ = sup_{|g|≤1} | ∫ P^n(x, dy) g(y) − ∫ π(dy) g(y) | = 2 sup_{A∈𝓑} | P^n(x,A) − π(A) | .

The exact analogue of the countable space result holds.

PROPOSITION 3.8. For a ψ-irreducible aperiodic positive chain with invariant measure π,

‖P^n(x, ·) − π‖ → 0,   n → ∞   (42)

for π-almost any x ∈ X. If (39) holds for an everywhere finite function V then (42) holds for all x ∈ X.

This result uses not only coupling techniques, but also a method of embedding renewal times on the occasions that the chain visits any small set C. This approach was developed independently by Nummelin (1978) and Athreya and Ney (1978), and provided a revolutionary insight into the reasons why general state space chains provide results almost completely analogous to those on countable spaces. It is simple to describe how this method works for a strongly aperiodic chain where (41) holds. We construct an auxiliary process of coin-tosses with probability of success being δ. When we get a successful toss, we use the measure φ for the next position. If we do not have a success we use the residual transition law

R(x, ·) = [P(x, ·) − δφ(·)]/[1 − δ] .

It is simple to check that the marginal transition law, ignoring the coin-tossing, is just P(x, ·); but on a successful toss the law is clearly independent of the last position x ∈ C, so that regeneration occurs on those occasions when there is a successful toss.
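The splitting construction is easy to carry out explicitly. The following sketch uses a three-state kernel, chosen purely for illustration (it is not an example from the text), which satisfies the one-step minorisation (41) on the whole space, and checks that the coin-toss mechanism reproduces the marginal law P(x, ·).

```python
import random

P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
phi = [1 / 3, 1 / 3, 1 / 3]
delta = 0.6                      # P[x][j] >= delta * phi[j] = 0.2 for all x, j
# residual law R(x, .) = [P(x, .) - delta * phi(.)] / (1 - delta)
R = [[(P[x][j] - delta * phi[j]) / (1 - delta) for j in range(3)]
     for x in range(3)]

def draw(weights, rng):
    u, acc = rng.random(), 0.0
    for j, w in enumerate(weights):
        acc += w
        if u < acc:
            return j
    return len(weights) - 1

def split_step(x, rng):
    """One transition of the split chain; marginally this is just P(x, .)."""
    if rng.random() < delta:     # successful toss: regenerate from phi
        return draw(phi, rng)
    return draw(R[x], rng)       # otherwise use the residual kernel R(x, .)

rng = random.Random(3)
n, counts = 300_000, [0, 0, 0]
for _ in range(n):
    counts[split_step(0, rng)] += 1
freq = [c / n for c in counts]
print("empirical one-step law from x = 0:", [round(v, 3) for v in freq])
print("P(0, .):", P[0])
```

By construction δφ(·) + (1 − δ)R(x, ·) = P(x, ·), so the empirical frequencies match the first row of P, while the successful tosses mark genuine regeneration times.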


This method immediately provides powerful renewal-theoretic tools for studying chains on general spaces where, on the face of it, no regeneration takes place. As one application, we can prove convergence of the moments of the n-step transition laws using the f-norm

‖P^n(x, ·) − π‖_f = sup_{|g|≤f} | ∫ P^n(x, dy) g(y) − ∫ π(dy) g(y) |

which generalises the total variation norm to unbounded non-negative functions f.

PROPOSITION 3.9. For a ψ-irreducible aperiodic positive chain with invariant measure π, and any non-negative function f such that ∫ f(x) π(dx) < ∞,

‖P^n(x, ·) − π‖_f → 0,   n → ∞ ,   (43)

for π-almost any x ∈ X.

There is again a drift condition that enables us to check finiteness of

∫ f(x) π(dx), and this also leads to a condition ensuring that (43) holds everywhere.

PROPOSITION 3.10. Let P be the transition law of a ψ-irreducible aperiodic Markov chain on X. The chain is positive recurrent with an invariant probability measure π satisfying ∫ f(x) π(dx) < ∞ for a non-negative function f if and only if there is a non-negative function V(x), finite π-almost everywhere, and a small set C such that

∫ P(x, dy) V(y) ≤ V(x) − f(x) + b𝟙_C(x),   x ∈ X .   (44)

The convergence in (43) holds for every x such that V(x) < ∞.

3.7. Geometric ergodicity and rates of convergence


As on a countable space, virtually any degree of geometric convergence leads to an overall uniform geometric rate.

PROPOSITION 3.11. Suppose that, for a ψ-irreducible aperiodic positive chain, there exists a small set C ∈ 𝓑⁺ and constants M_C < ∞, ρ_C < 1 such that |P^n(x, C) − π(C)| ≤ M_C ρ_C^n for all n and all x ∈ C. Then there exist ρ < 1 and M_x < ∞ for π-almost any x ∈ X such that

‖P^n(x, ·) − π‖ ≤ M_x ρ^n,   n ≥ 1 .   (45)

We call the chain geometrically ergodic if (45) holds. The characterisation of geometric ergodicity using exponential moments of return times continues to hold in this general context in the following manner.


PROPOSITION 3.12. A ψ-irreducible aperiodic chain is geometrically ergodic if and only if there is a small set C such that

sup_{x∈C} E_x[β^{τ_C}] < ∞   (46)

for some β > 1.

There is inevitably a drift condition that also characterises geometric ergodicity.

PROPOSITION 3.13. A ψ-irreducible aperiodic chain is geometrically ergodic if and only if we can find a function V(x) ≥ 1, finite almost everywhere, a small set C and λ < 1, b < ∞ such that

∫ P(x, dy) V(y) ≤ λV(x) + b𝟙_C(x) .   (47)

The geometric convergence can then be strengthened to

‖P^n(x, ·) − π‖_V ≤ M ρ^n V(x),   n ≥ 1   (48)

for some M < ∞, ρ < 1, which implies in particular that the convergence in (45) occurs for every x for which V(x) is finite.

Thus, as in the countable case, from the drift condition (47) we find that the "moments of order up to V" converge at a uniform geometric rate, and we can identify the constant depending on the starting point in terms of V.

Example: The autoregressive model


In (40) we have in fact verified that for the autoregressive model, V(x) = x² satisfies (47) rather than just (39), and so the chain is geometrically ergodic. Because of the conclusions of Proposition 3.13, we can deduce a number of things: the stationary measure has a finite variance, so the chain is second-order stationary; the n-step means and variances converge to these stationary forms at a uniform geometric rate; and convergence holds from every starting point. □
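For the Gaussian autoregression the n-step law is available in closed form, P^n(x, ·) = N(α^n x, σ²(1 − α^{2n})/(1 − α²)), so the geometric convergence in (45) can be observed directly. In this sketch the parameters α, σ and the starting point x are illustrative choices, not from the text; the total variation distance to π = N(0, σ²/(1 − α²)) is computed by numerical integration.

```python
import math

alpha, sigma, x = 0.5, 1.0, 3.0
v_pi = sigma ** 2 / (1.0 - alpha ** 2)

def npdf(y, m, v):
    return math.exp(-0.5 * (y - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)

def tv_to_pi(n, grid=8000, lo=-12.0, hi=12.0):
    """0.5 * integral of |p_n - pi| via a simple midpoint sum."""
    m_n = alpha ** n * x
    v_n = sigma ** 2 * (1.0 - alpha ** (2 * n)) / (1.0 - alpha ** 2)
    h = (hi - lo) / grid
    return 0.5 * sum(
        abs(npdf(lo + (i + 0.5) * h, m_n, v_n)
            - npdf(lo + (i + 0.5) * h, 0.0, v_pi)) * h
        for i in range(grid)
    )

tvs = [tv_to_pi(n) for n in range(1, 9)]
ratios = [tvs[i + 1] / tvs[i] for i in range(len(tvs) - 1)]
print("TV distances:", [f"{t:.2e}" for t in tvs])
print("successive ratios:", [f"{r:.2f}" for r in ratios])
```

The successive ratios settle near α, illustrating the uniform geometric rate; here the decaying mean α^n x dominates the distance for large n.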

Example: The Metropolis-Hastings algorithm


Geometric ergodicity of the Metropolis-Hastings algorithm on ℝ^k is largely a property of the tails of π when the candidate density q(x,y) is a random walk. Conditions for geometric ergodicity can be shown to be essentially that the tails of π are geometric or lighter, and that in higher dimensions the contours of π are smooth near ∞ (Mengersen and Tweedie, 1996; Roberts and Tweedie, 1996). To show this, in one dimension it is enough to use the test function V(x) = e^{s|x|} in (47). In higher dimensions it seems to be effective to use the test function V(x) = [π(x)]^{−β} for some choice of β. This latter choice of a "tailored" test function appears to be a good and somewhat intuitive choice in other circumstances also. □

3.8. Rates of convergence and uniform ergodicity


In the general state space case the determination of a bound on the actual rate of convergence is again more difficult than in the countable space case. There is one situation where the rates of convergence are computable without a great deal of extra work. This is the important special case of uniform ergodicity, where (48) holds with the bound independent of x. Such chains have been studied for over 50 years for countable and general spaces, and yet the cleanest results about their evolution are rather new.

PROPOSITION 3.14. For any ψ-irreducible aperiodic Markov chain X the following are equivalent:
(i) X is uniformly ergodic, in the sense that for some r(n) → 0 and all x ∈ X,

‖P^n(x, ·) − π‖ ≤ r(n) .   (49)

(ii) There exist ρ < 1 and R < ∞ such that for all x ∈ X

‖P^n(x, ·) − π‖ ≤ R ρ^n ;   (50)

that is, the convergence in (49) takes place at a uniform geometric rate.
(iii) Doeblin's Condition holds: that is, there is a probability measure φ on 𝓑 and ε < 1, δ > 0, m ∈ ℤ₊ such that whenever φ(A) > ε

inf_{x∈X} P^m(x,A) ≥ δ .   (51)

(iv) The state space X is small, so that for some m

P^m(x, ·) ≥ δφ(·),   x ∈ X   (52)

for some probability measure φ and some 0 < δ < 1.
(v) There is a small set C with

sup_{x∈X} E_x[τ_C] < ∞

in which case for every set A ∈ 𝓑⁺, sup_{x∈X} E_x[τ_A] < ∞.
(vi) There is a small set C and a κ > 1 with

sup_{x∈X} E_x[κ^{τ_C}] < ∞

in which case for every A ∈ 𝓑⁺ we have, for some κ_A > 1, sup_{x∈X} E_x[κ_A^{τ_A}] < ∞.
(vii) There is a bounded solution V ≥ 1 to

∫ P(x, dy) V(y) ≤ λV(x) + b𝟙_C(x),   x ∈ X   (53)

for some small set C.

Under (iv), we have in particular that for any x,

‖P^n(x, ·) − π‖ ≤ 2[1 − δ]^{⌊n/m⌋}   (54)

where δ < 1 is as in (52). Thus in (54) we have a very simple bound on the rate of convergence.
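The bound (54) is easy to exhibit numerically. The sketch below uses an illustrative three-state kernel, not an example from the text, for which the whole space is small with m = 1, and checks that the total variation distance is indeed dominated by 2(1 − δ)^n.

```python
P = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]

# the best one-step minorisation: phi_j proportional to min_x P(x, j)
col_mins = [min(P[i][j] for i in range(3)) for j in range(3)]
delta = sum(col_mins)                      # P[i][j] >= delta * phi[j] for all i
phi = [m / delta for m in col_mins]

def step(dist):
    """One application of the kernel to a distribution (row vector times P)."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

# invariant pi, found by iterating the kernel to convergence
pi = [1.0, 0.0, 0.0]
for _ in range(500):
    pi = step(pi)

def tv(dist):
    """Total variation norm ||dist - pi|| = 2 sup_A |dist(A) - pi(A)|."""
    return sum(abs(dist[j] - pi[j]) for j in range(3))

dist, ok = [1.0, 0.0, 0.0], True
for n in range(1, 20):
    dist = step(dist)
    ok = ok and tv(dist) <= 2.0 * (1.0 - delta) ** n + 1e-12
print(f"delta = {delta:.2f}, bound (54) holds for n = 1..19: {ok}")
```

For this kernel δ = 0.6, so (54) guarantees a rate of at most 0.4 per step from every starting point, with no knowledge of π required in advance.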

Example: The Metropolis-Hastings algorithm


Uniform ergodicity essentially never holds for the Metropolis-Hastings algorithm on ℝ^k when the candidate density q(x,y) is a random walk (Mengersen and Tweedie, 1996). It is not hard to show that, in effect, the further away from any compact small set the chain starts, the longer it takes to reach it, and this violates (v) above. However, if the candidate is "independent" in the sense that q(x,y) = q(y) for all x, then the necessary and sufficient condition for uniform ergodicity (Mengersen and Tweedie, 1996) is that

q(x)/π(x) ≥ β > 0,   π-a.e. x .

In this case we have that (52) holds, since p(x,y) ≥ βπ(y), and so we have the bound (54) on the convergence of this chain. □

The case where the chain is stochastically monotone is also one where computation is possible. If there is a total order on X, then one can define stochastic monotonicity by requiring that for any y,

x ≥ z ⟹ P(x, {w : w ≥ y}) ≥ P(z, {w : w ≥ y}) .

In this case we get the following result, essentially identical to Proposition 2.13.

PROPOSITION 3.15. Suppose we have a ψ-irreducible aperiodic stochastically monotone chain, and {0} is the minimal element in the ordering. Then if V satisfies (47) with C = {0}, we have for some M_x < ∞

‖P^n(x, ·) − π‖ ≤ M_x λ^n .
For more general chains satisfying (47), there are again rather more complicated bounds on ρ. In this case, if C is the small set in (47) and also satisfies (41), these bounds depend only on λ, b and the value of δ in (41). The best known results, which are tight enough to give reasonable numerical values in many MCMC areas, are due to Rosenthal (1995) and Roberts and Tweedie (1999).

4. Topological spaces
4.1. Strongly continuous components
The structure of Markov chains described above is essentially probabilistic. The properties of Markov chains such as recurrence or positivity are defined in terms


of the return times to sets of positive ψ-measure, or as finite mean return times to small sets, and so forth. Yet for many chains there is more structure available than simply a σ-field and a probability kernel, and the expectation is that any topological structure of the space will play a strong role in defining the behaviour of the chain. In particular, we are used to thinking of specific classes of sets in ℝ^n as having intuitively reasonable properties. When there is a topology, compact sets are thought of in some sense as having the same sort of properties as a finite set on a countable space; and so we could well expect compact sets to have the sort of characteristics we have identified for small sets. On the other hand, open sets are non-negligible in some sense, and if the chain is irreducible we might expect it at least to visit all open sets with positive probability. This indeed forms one alternative definition of "irreducibility". Based on these considerations, we introduce topological conditions which will help us to link the concept of ψ-irreducibility with that of open set irreducibility, and to identify compact sets as small. Assume that X is equipped with a locally compact, separable, metrisable topology with 𝓑 as the Borel σ-field. Recall that a function h from X to ℝ is lower semicontinuous if

lim inf_{y→x} h(y) ≥ h(x),   x ∈ X :

a typical, and frequently used, lower semicontinuous function is the indicator function 𝟙_O(x) of an open set O in 𝓑. The following continuity properties of the transition kernel, couched in terms of lower semicontinuous functions, define classes of chains which we will see have suitable topological properties.
(i) If P(·, O) is a lower semicontinuous function for any open set O ∈ 𝓑, then P is called a (weak) Feller chain.
(ii) If P(·, A) is a lower semicontinuous function for any set A ∈ 𝓑, then P is called a strong Feller chain.
(iii) If there is a distribution a_n on ℤ₊ and a (substochastic) transition kernel T satisfying

Σ_n a_n P^n(x,A) ≥ T(x,A),   x ∈ X, A ∈ 𝓑 ,

where T(·, A) is a lower semicontinuous function for any A ∈ 𝓑, then T is called a continuous component of X. If X is a Markov chain which possesses a continuous component T such that T(x, X) > 0 for all x, then X is called a T-chain.

Obviously T-chains are extensions of strong Feller chains, but rather than requiring continuity at all sets we merely require, through the kernel T, that there is some partial contribution of strong continuity. Perhaps surprisingly, this turns out to be almost exactly the appropriate condition to link topological and probabilistic or measure-theoretic properties of Markov chains.


Example: Random walk


The Feller and T-chain properties are not hard to check for many models. However, they can be entirely and cleanly characterised for the random walk, whose transition law is given by P(x,A) = F(A − x) where F is a measure on ℝ^k. It is simple to show that the random walk is always weak Feller: for any bounded continuous f we have

∫ P(x, dy) f(y) = ∫ F(dy − x) f(y) = ∫ F(dy) f(y + x)   (55)

which is continuous in x by the dominated convergence theorem; this is equivalent to the weak Feller condition. In contrast, the properties of convolutions such as (55) show that the random walk is strong Feller if and only if the increment distribution F is absolutely continuous (has a density) with respect to μ_Leb. For the T-chain property to hold, we need much less than for the strong Feller property: a random walk is a T-chain if and only if there is some convolution power of F which is non-singular with respect to μ_Leb. Hence we can allow a considerable amount of singularity, rather than requiring absolute continuity. This non-singularity condition appeared in the stochastic processes literature at a very early stage, and random walks satisfying this condition are called "spread-out". It was introduced by Smith (1954; 1955) and used later by Stone (1966) to prove renewal theorems without direct Riemann integrability conditions. Markov chain methods are used in Arjas et al. (1978) to give stronger forms of these theorems. It is intriguing that the T-chain property surfaces as the exact analogue of the spread-out condition, for chains without random walk structure. □

Given the properties that flow from the T-chain condition, as described below, it is worth having simpler conditions that might enable it to be checked relatively easily. The weak Feller property is very easy to establish in many cases, such as autoregressions or queueing models, and for certain classes of irreducible chains we find that the weak Feller property itself implies the existence of a continuous component, although the chain is not strong Feller.

PROPOSITION 4.1. If X is a ψ-irreducible Feller chain such that the support of ψ has non-empty interior, then X is a T-chain.

We might perhaps hope that any φ-irreducible Feller chain is a T-chain. This is not the case, as is shown by the following counterexample.
Let X = [0, 1] with the usual topology, let 0 < α < 1, and define the Markov transition function P for x > 0 by

P(x, {0}) = 1 − P(x, {αx}) = x .

Set P(0, {0}) = 1, so that the transition function P is Feller and δ_0-irreducible. But for any n ∈ ℤ₊ we have lim_{x→0} P_x(τ_{0} ≥ n) = 1, from which it follows that there does not exist an open small set containing the point {0}. Thus we have a φ-irreducible Feller chain on a compact state space which is not a T-chain.

4.2. Properties of T-chains


Our first result links open-set irreducibility with φ-irreducibility. The concepts of open set irreducibility involve "reachable states". A state x* ∈ X is reachable if

Σ_n P^n(y, O) > 0

for every state y ∈ X and every open set O containing x*. If L(x, O) > 0 for all x and all open sets O ∈ 𝓑, so that every point is reachable, then the chain is called "open-set irreducible".

PROPOSITION 4.2. If X is a T-chain, and X contains one reachable point x*, then X is φ-irreducible, with φ = T(x*, ·). Consequently, if X is an open-set irreducible T-chain, then X is ψ-irreducible and ψ(O) > 0 for all open sets O.

Example: The random walk


To see that this is not the situation when the chain is simply weak Feller, consider the random walk where the increment distribution F is concentrated on the rationals ℚ, and is positive on every rational q ∈ ℚ. From each starting point, because the rational translates ℚ − x are dense, the chain is open set irreducible. But clearly if x and y are such that ℚ − x and ℚ − y are disjoint, which happens for uncountably many x, y, then the chains from these starting points are supported on disjoint sets and there is no measure φ such that the random walk is φ-irreducible. □

We now see how compact sets behave in the probabilistic context. The T-chain condition is almost exactly adapted to provide the result we would like to have.

PROPOSITION 4.3. If every compact set is small then X is a T-chain; and conversely, if X is a ψ-irreducible aperiodic T-chain then every compact set is small.

The value of this result is great. As we saw in most of the results above, we need to identify properties of the chain by identifying properties of small sets. It is invaluable to be able to replace "small set" by "compact set" in all of these results. T-chains are also appropriate for linking probabilistic and topological recurrence conditions. Topological recurrence of a point x* can be defined by requiring that Σ_n P^n(x*, O) = ∞ for every open set O containing x*. Positivity of a point is defined similarly, by requiring that lim sup_n P^n(x*, O) > 0 for every open set O containing x*.

Markov chains: Structure and applications

847

PROPOSITION 4.4. Suppose that X is a ψ-irreducible T-chain, and that x* is reachable. Then
(i) X is recurrent if and only if x* is topologically recurrent.
(ii) X is positive recurrent if and only if x* is positive.

We also link more global probabilistic and topological ideas for T-chains. To define a form of "topological transience", let us say that a sample path of X converges to infinity (denoted X → ∞) if the trajectory visits each compact set only finitely often. Conversely, X is called bounded in probability if for each state x ∈ X and each ε > 0 there exists a compact subset K ⊂ X such that

lim inf_{k→∞} P_x{X_k ∈ K} ≥ 1 − ε .

Boundedness in probability is simply tightness for the collection of probabilities {P^k(x, ·) : k ≥ 1}, and so should be related to the existence of limiting measures for the transition laws.

PROPOSITION 4.5. Suppose that X is a T-chain on a topological space and x* ∈ X is a reachable state.
(i) X is Harris recurrent if and only if P_x{X → ∞} = 0 for each x ∈ X.
(ii) X is positive Harris if and only if X is bounded in probability.

Hence for T-chains the measure-theoretic ideas of recurrence and transience are exactly matched by appropriate topological concepts. It is possible to construct examples to show that this is not true under the simpler weak Feller condition, although we omit details.

4.3. Doeblin decompositions


We conclude by returning to the question of structure when the chain is not irreducible in any way. The T-chain structure then gives us surprisingly complete results, as shown in Tuominen and Tweedie (1979).

PROPOSITION 4.6. Suppose that X is a T-chain on a topological space. Then there is a decomposition

X = ∑_{i=1}^∞ H_i + E    (56)

where each H_i is an absorbing set such that the chain restricted to H_i is Harris recurrent; every topologically recurrent point is in the set ⋃_i H_i; and the remaining set E can be covered by a countable number of uniformly transient sets.

In the Doeblin decomposition in (56), we can thus identify the nature of the recurrent and the transient parts of the space in a rather explicit fashion. The results are even more striking when the space is compact.

848

R. L. Tweedie

PROPOSITION 4.7. Suppose that X is a T-chain on a compact space. Then there is a finite K and a decomposition

X = ∑_{i=1}^K H_i + E    (57)

where each H_i is an absorbing set such that the chain restricted to H_i is positive Harris recurrent, and the remaining set E is itself a uniformly transient set.

These last two propositions show that there is a complete analogy between the behaviour of T-chains and, respectively, chains on countable and finite spaces. Thus we cannot hope to improve on these results.

5. Conclusions

The goal of this review has been to consider the structural properties of discrete time chains, together with a small number of rather simple examples which illustrate those properties. There are many areas we have not had space to describe.

There is a further huge literature for Markov processes in continuous time (that is, for a continuously indexed set of variables, so that T = [0, ∞)), which is in many ways more complex and richer than the discrete time theory, but we do not extend our scope to cover such models in this article. There are many results in quite different directions from those we have discussed that have limited counterparts in discrete time (see Ethier and Kurtz (1986), Karatzas and Shreve (1991) and many others). For every result we give above, however, an almost identical result can be shown to hold in continuous time (Meyn and Tweedie, 1993b; Meyn and Tweedie, 1993c; Tweedie, 1994), using various embedded discrete time approaches, provided the continuous process is suitably smooth in its behaviour.

Even for discrete time chains, there is a great deal we have left untouched. We have not mentioned much of the work on transient chains, for example; some details of this are in Revuz (1984). The theory of quasi-stationarity, which considers the way in which the chain might have asymptotic behaviour conditional on some event of limiting probability zero [such as the behaviour of a branching process conditional on non-extinction (Athreya and Ney, 1972)], has received no mention here. We have concentrated on the structure of general chains and ignored chains with special structure: further discussion of reversibility, widely used in network theory in particular following the work of Kelly (1979), and of the deep group-theoretic approaches developed by Diaconis (1988) and others, would take much more space than we have.

Even within our context we have not looked at results on non-geometric convergence rates (Tuominen and Tweedie, 1994), although these are receiving considerable attention in applications. Perhaps the single most glaring omission is the work on sample path properties of Markov chains, such as strong laws and central limit theorems. It is known


that these are closely related to the Harris recurrence condition, and they are of particular use in Markov chain Monte Carlo applications. The interested reader is referred to Meyn and Tweedie (1993a) for details. The use of path structures to simulate invariant measures in a "perfect" way, as described in the seminal paper by Propp and Wilson (1996) [with much relevant theory from the Russian literature described in Borovkov (1998)], represents an exciting new direction linking Markov chain theory with computational approaches that have not been explored before.

There has been no space to describe complex examples. Rather we have tried to show the flavour of some of the results that can be used by practitioners who have real Markovian systems to analyse. The simple examples we have used for illustration give, in the main, a sense of the directions that are needed in more realistic situations, and the wealth of current literature provides many further guidelines in specific areas such as teletraffic theory, population modelling and the like, which are beyond the scope of this review.

As stochastic modelling of complex systems becomes ever more feasible and ever more necessary, it is vital that the underlying structure of the models available should be sufficient for the modelling needs. This review shows that one can move to very general cases and preserve all the power of the simpler structures. The mathematical and probabilistic results are elegant, the clean nature of the outcomes sometimes quite surprising: but the main impression we hope to leave is that these results can actually be used in a very wide cross-section of the situations where probabilistic models are to be found today.

Acknowledgement
Work supported in part by NSF Grant DMS 9803682.

References
Aldous, D. and P. Diaconis (1986). Shuffling cards and stopping times. American Math. Monthly 93, 155-157.
Aldous, D. and P. Diaconis (1987). Strong uniform times and finite random walks. Adv. Applied Maths. 8, 69-97.
Arjas, E., E. Nummelin and R. L. Tweedie (1978). Uniform limit theorems for non-singular renewal and Markov renewal processes. J. Appl. Prob. 15, 112-125.
Asmussen, S. (1987). Applied Probability and Queues. John Wiley & Sons, New York.
Athreya, K. B. and P. Ney (1972). Branching Processes. Springer-Verlag, New York.
Athreya, K. B. and P. Ney (1978). A new approach to the limit theory of recurrent Markov chains. Trans. Amer. Math. Soc. 245, 493-501.
Borovkov, A. (1998). Ergodicity and Stability of Stochastic Processes. John Wiley and Sons, New York.
Brockwell, P. J. and R. A. Davis (1991). Time Series: Theory and Methods. 2nd edn. Springer-Verlag, New York.
Çinlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, NJ.


Chen, P. and R. Tweedie (1997). Orthogonal measures and absorbing sets for Markov chains. Math. Proc. Camb. Phil. Soc. 121, 101-113.
Diaconis, P. (1988). Group Representations in Probability and Statistics. Institute of Mathematical Statistics, Hayward.
Diaconis, P. (1995). The cutoff phenomenon in finite Markov chains. Proc. Natl. Acad. Sci. USA 93, 1659-1664.
Ethier, S. N. and T. G. Kurtz (1986). Markov Processes: Characterization and Convergence. John Wiley & Sons, New York.
Gilks, W., S. Richardson and D. Spiegelhalter (Eds.) (1996). Markov Chain Monte Carlo in Practice. Chapman and Hall, London.
Karatzas, I. and S. E. Shreve (1991). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.
Kelly, F. P. (1979). Reversibility and Stochastic Networks. John Wiley & Sons, New York.
Krugman, P. and M. Miller (Eds.) (1992). Exchange Rate Targets and Currency Bands. Cambridge University Press, Cambridge.
Lindvall, T. (1992). Lectures on the Coupling Method. John Wiley & Sons, New York.
Lund, R. and R. Tweedie (1996). Geometric convergence rates for stochastically ordered Markov chains. Math. Operations Res. 21, 182-194.
Mengersen, K. and R. Tweedie (1996). Rates of convergence of the Hastings and Metropolis algorithms. Annals of Statistics 24, 101-121.
Meyn, S. P. and R. L. Tweedie (1993a). Markov Chains and Stochastic Stability. Springer-Verlag, London.
Meyn, S. P. and R. L. Tweedie (1993b). Stability of Markovian processes II: Continuous time processes and sampled chains. Adv. Appl. Prob. 25, 487-517.
Meyn, S. P. and R. L. Tweedie (1993c). Stability of Markovian processes III: Foster-Lyapunov criteria for continuous time processes. Adv. Appl. Prob. 25, 518-548.
Nummelin, E. (1978). A splitting technique for Harris recurrent chains. Z. Wahrscheinlichkeitstheorie und Verw. Geb. 43, 309-318.
Nummelin, E. (1984). General Irreducible Markov Chains and Non-Negative Operators. Cambridge University Press, Cambridge.
Orey, S. (1971). Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand Reinhold, London.
Propp, J. and D. Wilson (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms 9, 223-252.
Revuz, D. (1984). Markov Chains. 2nd edn. North-Holland, Amsterdam.
Robert, C. and G. Casella (1999). Monte Carlo Statistical Methods. Springer-Verlag, New York.
Roberts, G. and R. Tweedie (1996). Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83, 95-110.
Roberts, G. and R. Tweedie (1999). Bounds on regeneration times and convergence rates for Markov chains. Stoch. Proc. Applns. 80, 211-229.
Rosenthal, J. (1995). Minorization conditions and convergence rates for Markov chain Monte Carlo. J. Amer. Stat. Assoc. 90, 558-566.
Smith, A. and A. E. Gelfand (1992). Bayesian statistics without tears: A sampling-resampling perspective. Amer. Stat. 46, 84-88.
Smith, W. L. (1954). Asymptotic renewal theorems. Proc. Roy. Soc. Edinburgh (A) 64, 9-48.
Smith, W. L. (1955). Regenerative stochastic processes. Proc. Roy. Soc. London (A) 232, 6-31.
Stone, C. R. (1966). On absolutely continuous components and renewal theory. Ann. Math. Stat. 37, 271-275.
Taylor, H. M. and S. Karlin (1994). An Introduction to Stochastic Modeling, revised edn. Academic Press, New York.
Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion). Ann. Stat. 22, 1701-1762.


Tong, H. (1990). Non-linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.
Tuominen, P. and R. L. Tweedie (1994). Subgeometric rates of convergence of f-ergodic Markov chains. Adv. Appl. Prob. 26, 775-798.
Tuominen, P. and R. L. Tweedie (1979). Markov chains with continuous components. Proc. London Math. Soc. (3) 38, 89-114.
Tweedie, R. L. (1994). Topological conditions enabling use of Harris methods in discrete and continuous time. Acta Applic. Math. 34, 175-188.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.

24

Diffusion Processes

S. R. S. Varadhan

1. Brownian motion

Brownian motion is the prototypical example of a diffusion process. The standard Brownian motion on R is a stochastic process x(t) defined for 0 ≤ t < ∞. It has independent increments, i.e. the increments over disjoint intervals are stochastically independent. The distribution of the increment x(t) − x(s) over the interval [s, t], with s ≤ t, is normal with mean 0 and variance t − s. The Brownian motion is usually normalized so that x(0) = 0 with probability 1. It can be equivalently defined as a Gaussian process with mean 0 and covariance E[x(s)x(t)] = s ∧ t. The joint distribution of x(t_1), …, x(t_n) at n time points 0 < t_1 < ⋯ < t_n < ∞ is multivariate normal with mean 0 and covariance C_{i,j} = t_i ∧ t_j. Its distribution on R^n has the joint density

p_n(t_1, …, t_n; x_1, …, x_n) = ∏_{j=1}^n [2π(t_j − t_{j−1})]^{−1/2} exp[ −(x_j − x_{j−1})² / (2(t_j − t_{j−1})) ]    (1.1)

with respect to the Lebesgue measure. Here we adopt the convention that t_0 = x_0 = 0. This defines a consistent family of finite dimensional distributions, and by Kolmogorov's theorem defines a stochastic process. It is a theorem of Norbert Wiener that this stochastic process can be supported on the space Ω = C[0, ∞) of continuous functions on [0, ∞). The space Ω comes with a natural σ-field of Borel sets ℱ and canonical sub-σ-fields ℱ_t of events generated by [x(s) : 0 ≤ s ≤ t]. Clearly ℱ_s ⊂ ℱ_t for s ≤ t and ℱ = σ(⋃_t ℱ_t), i.e. the smallest σ-field containing ℱ_t for all t. The Wiener measure P_0 on Ω is the unique measure such that for every positive integer n and time points 0 < t_1 < ⋯ < t_n

P_0{x(t_1), …, x(t_n) ∈ A} = ∫_A p_n(t_1, …, t_n; x_1, …, x_n) dx_1 ⋯ dx_n


for all Borel sets A ⊂ R^n. It has the following additional properties.


1. If we denote by H_{α,C,T_1,T_2} the set of functions x(·) that satisfy a local Hölder condition of the form

|x(s) − x(t)| ≤ C|t − s|^α

for T_1 ≤ s < t ≤ T_2, then for every α > 1/2, T_1 < T_2 and C < ∞

P_0[H_{α,C,T_1,T_2}] = 0 .

2. On the other hand, for every α < 1/2 and T_1 < T_2 < ∞,

lim_{C→∞} P_0[H_{α,C,T_1,T_2}] = 1 .

3. If D_t is the set of paths that are differentiable at the point t, then

P_0[⋃_t D_t] = 0 .

4. For any positive integer k, if {j/2^k : j ≥ 0} is the discrete set of points with spacing 1/2^k, then

P_0[ lim_{k→∞} ∑_{j : (j+1)/2^k ≤ t} ( x((j+1)/2^k) − x(j/2^k) )² = t for all t ≥ 0 ] = 1    (1.2)

It is clear from these properties that a typical Brownian path, being nowhere differentiable, is not of bounded variation in any finite interval. However it has a definite 'quadratic variation' in any interval that is almost surely equal to the length of the interval. One of the ways of understanding the behavior of a Brownian path is to start with a random walk {S_n : n ≥ 1} which, for each n, is the sum

S_n = X_1 + X_2 + ⋯ + X_n

of the first n terms of a sequence {X_i} of independent identically distributed random variables that take the values ±1 with probability 1/2 each. While the central limit theorem asserts that for large n the distribution of S_n/√n is asymptotically standard normal, Brownian motion is to be thought of as the limit in distribution of S_{[nt]}/√n as a function of t. The Wiener measure P_0, or the Brownian motion x(·), is not only the prime example of a diffusion process but is also the building block out of which other diffusion processes are constructed. A discussion of Brownian motion and its properties can be found, among other places, in the books Stroock (1993) and Revuz and Yor (1999). Usually one refers to the measure P_0 as the Wiener measure, and to a random path x(·) that is 'distributed' according to the Wiener measure as Brownian motion. Brownian motion derives its name from the botanist Robert Brown who observed in 1828 that, in water, pollen from plants dispersed in an irregular swarming motion. From that time many scientists have examined the Brownian phenomenon, including Einstein. Norbert Wiener was the first who essentially constructed the measure corresponding to Brownian motion, and proved the


Hölder continuity of its paths in Wiener (1923) and Wiener (1924). It was the first rigorous example of integration in an infinite dimensional function space. Just as any multivariate normal random vector can be generated by a linear transformation y = Tx from one with covariance equal to the identity matrix, one can start with any orthonormal basis {f_j(·)} in L²[0, ∞) and represent the Brownian motion as

x(t) = ∑_j Z_j F_j(t)

where {Z_j} are independent standard normal random variables and

F_j(t) = ∫_0^t f_j(s) ds .

While formally the expansion is always valid, its convergence is far more delicate. Such expansions have been considered for Brownian motion, and more generally for Gaussian processes, in Itô and Nisio (1968).
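To make the expansion concrete, one may take the orthonormal cosine basis f_j(s) = √2 cos((j − 1/2)πs) on [0, 1], for which F_j(t) = √2 sin((j − 1/2)πt)/((j − 1/2)π). The short sketch below (Python; the helper names are ours, introduced only for illustration) evaluates the truncated series and checks, deterministically, that the coordinate variances ∑_j F_j(t)² add up to Var x(t) = t:

```python
import math

# Truncated expansion of Brownian motion on [0, 1] with the basis
# f_j(s) = sqrt(2) cos((j - 1/2) pi s), so that
# F_j(t) = sqrt(2) sin((j - 1/2) pi t) / ((j - 1/2) pi).
def kl_brownian(zs, t):
    """x(t) ~ sum_j z_j F_j(t) for given standard normal draws zs."""
    return sum(
        math.sqrt(2.0) * z * math.sin((j - 0.5) * math.pi * t) / ((j - 0.5) * math.pi)
        for j, z in enumerate(zs, start=1)
    )

def truncated_variance(t, n_terms=5000):
    """sum_j F_j(t)^2 over the first n_terms coordinates; converges to t."""
    return sum(
        2.0 * math.sin((j - 0.5) * math.pi * t) ** 2 / ((j - 0.5) * math.pi) ** 2
        for j in range(1, n_terms + 1)
    )
```

The slow 1/j² decay of the coordinate variances is one symptom of the delicate convergence mentioned above.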

2. Brownian motion as a Markov process

The probability density p_n in (1.1) can be rewritten as

p_n(t_1, …, t_n; x_1, …, x_n) = ∏_{j=1}^n p(t_j − t_{j−1}, x_j − x_{j−1})    (2.1)

where

p(t, x) = (2πt)^{−1/2} exp[−x²/(2t)] .    (2.2)

In other words the process P_0 is a Markov process with transition probability

P_0[x(t) ∈ A | x(σ) : 0 ≤ σ ≤ s] = ∫_A p(t − s, y − x(s)) dy

= ∫_A q(s, x(s); t, y) dy

where q(s, x; t, y) = p(t − s, y − x). We can in fact initialize the Brownian motion to start from any point x_0 at time 0 to get P_{x_0}, which is just the distribution of x_0 + x(·) under P_0. The transition probability density, as we saw, depends only on the differences y − x and t − s. This is because Brownian motion as a Markov process is homogeneous in time and has independent increments, i.e. it is invariant under translations in space as well as time. In the theory of Markov processes the concept of stopping times is extremely important. These are random variables τ(ω) ≥ 0, measurable with respect to ℱ, that have the additional property that for every t ≥ 0 the set [ω : τ(ω) ≤ t] is ℱ_t measurable. Examples of stopping times are the first time something happens, for instance

τ(ω) = inf[t : x(t) ≥ a] .

In order to determine whether τ(ω) ≤ t, we only need to observe the path up to time t. On the other hand, the last time something happens, such as

sup[t : ∫_0^t V(x(s)) ds ≤ 0] ,

is not a stopping time. An important fact is that for Brownian motion, as well as for most other diffusion processes, the Markov property extends to stopping times; this is called the strong Markov property and was first considered by Hunt (1956). Associated with any stopping time τ there is a natural σ-field ℱ_τ defined by

ℱ_τ = {A : A ∩ [τ ≤ t] ∈ ℱ_t for all t ≥ 0} .

Roughly speaking, if we observe the process only up to time τ and then stop, ℱ_τ represents the information that is available. The strong Markov property for Brownian motion, first established by G. A. Hunt (1956), states that with respect to any P_{x_0} the process of future increments y_τ(t) = x(τ + t) − x(τ) is again a Brownian motion, i.e. distributed according to P_0, and moreover is stochastically independent of the events in ℱ_τ. This can be restated in the form

P_{x_0}[x(τ + t) ∈ A | ℱ_τ] = P_{x(τ)}[x(t) ∈ A] .

An easy consequence of this is the reflection principle of Bachelier, which computes, for ℓ ≥ 0,

P_0[ sup_{0≤s≤t} x(s) ≥ ℓ ] = 2 P_0[x(t) ≥ ℓ] = (2/√(2πt)) ∫_ℓ^∞ exp[−x²/(2t)] dx .
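The reflection principle lends itself to a direct Monte Carlo check. The sketch below (Python; the path is discretized with Gaussian increments, which slightly underestimates the continuous maximum, and the parameter choices are ours) compares the simulated frequency with 2P_0[x(t) ≥ ℓ]:

```python
import math, random

def running_max_exceeds(level, t=1.0, steps=500, rng=None):
    """Simulate Brownian motion on [0, t] by Gaussian increments and report
    whether the running maximum reaches `level`."""
    dt = t / steps
    x, m = 0.0, 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, math.sqrt(dt))
        m = max(m, x)
    return m >= level

rng = random.Random(7)
trials, level = 4000, 1.0
hit_freq = sum(running_max_exceeds(level, rng=rng) for _ in range(trials)) / trials
# Reflection principle with t = 1: P_0[sup x >= level] = 2 (1 - Phi(level))
#                                                      = 1 - erf(level / sqrt(2)).
exact = 1.0 - math.erf(level / math.sqrt(2.0))
```

With 500 steps the discrete maximum is biased slightly low, so only rough agreement should be expected.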

3. Semigroups and generators


An important functional analytical tool in the study of Markov processes is the notion of semigroups associated with them. For any bounded measurable function f on R, we define T_t f by

(T_t f)(x) = E[f(x(t)) | x(0) = x] = ∫_R f(y) p(t, y − x) dy .

The Chapman-Kolmogorov equations, which in our case reduce to

∫_R p(t, y − x) p(s, z − y) dy = p(t + s, z − x) ,


yield

T_t T_s f = T_s T_t f = T_{t+s} f

for all f and s, t ≥ 0. Such semigroups are characterized by their infinitesimal generators

A = (dT_t/dt)|_{t=0}

and can be recovered in some sense as T_t = exp[tA]. For our Brownian motion semigroup it is an easy calculation to derive that for any smooth function f

(Af)(x) = lim_{t→0} (1/t) ∫ [f(x + y) − f(x)] p(t, y) dy = (1/2) d²f(x)/dx² .

It is therefore natural to expect connections between the operator (1/2)D_x² and Brownian motion. The first such relation is that the transition probability p(t, x) satisfies the heat equation

∂p/∂t = (1/2) ∂²p/∂x²

and consequently

(T_t f)(x) = u(t, x) = ∫_R f(y) p(t, y − x) dy

solves the Cauchy problem

∂u/∂t = (1/2) ∂²u/∂x²

with the initial condition lim_{t→0} u(t, x) = f(x). Taking Laplace transforms in t, for λ > 0, the resolvent equations

λu − (1/2) ∂²u/∂x² = f

are solved by

u(x) = ∫_0^∞ e^{−λt} (T_t f)(x) dt = E^{P_x}[ ∫_0^∞ e^{−λt} f(x(t)) dt ] .

The Feynman-Kac formula provides a representation for solutions of

∂u/∂t = (1/2) ∂²u/∂x² + V(x) u


with the initial condition lim_{t→0} u(t, x) = f(x), in the form

u(t, x) = E^{P_x}[ f(x(t)) exp( ∫_0^t V(x(s)) ds ) ] .
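The Feynman-Kac representation immediately suggests a Monte Carlo scheme: average f(x(t)) exp(∫_0^t V(x(s)) ds) over simulated Brownian paths. A minimal sketch follows (Python; the step counts and the test case V(x) = x, f ≡ 1 are our own choices, for which ∫_0^1 x(s) ds is N(0, 1/3) and hence u = e^{1/6}):

```python
import math, random

def feynman_kac_mc(f, V, t=1.0, x0=0.0, steps=200, paths=4000, seed=3):
    """Monte Carlo estimate of u(t, x0) = E_x0[ f(x(t)) exp(int_0^t V(x(s)) ds) ]."""
    rng = random.Random(seed)
    dt = t / steps
    total = 0.0
    for _ in range(paths):
        x, integral = x0, 0.0
        for _ in range(steps):
            integral += V(x) * dt          # left-endpoint Riemann sum of int V(x(s)) ds
            x += rng.gauss(0.0, math.sqrt(dt))
        total += f(x) * math.exp(integral)
    return total / paths

# With f = 1 and V(x) = x, int_0^1 x(s) ds is N(0, 1/3), so u = exp(1/6).
estimate = feynman_kac_mc(lambda x: 1.0, lambda x: x)
exact = math.exp(1.0 / 6.0)
```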

4. Stochastic integrals
Since x(t) is almost surely a continuous function of t, we can define Y = ∫_0^∞ f(t) x(t) dt for smooth functions f with compact support; this gives us a Gaussian random variable with mean 0 and variance ∫∫ f(s) f(t) (s ∧ t) ds dt. In fact, if f is a smooth function we can even define ∫ f(t) dx(t) as −∫ f′(t) x(t) dt after an integration by parts and calculate

E[ ( ∫ f(t) dx(t) )² ] = ∫∫ f′(s) f′(t) (s ∧ t) ds dt = ∫ |f(t)|² dt .

By completion in L²[P], we can now define ∫ f(t) dx(t) for functions f in L²[0, ∞); this gives us a mean zero normal random variable with variance ‖f‖₂². This was carried out already by Wiener and yields the following result, known as the Cameron-Martin formula [see Cameron and Martin (1953)]. Let F(t) be of the form F(t) = ∫_0^t f(s) ds for some f in L²[0, ∞). Let x(·) be Brownian motion distributed according to the Wiener measure P_0. Then the distribution P_F of y(·) = x(·) + F(·) is absolutely continuous with respect to P_0 and has the Radon-Nikodym derivative

(dP_F/dP_0)(ω) = exp[ ∫_0^∞ f(t) dx(t) − (1/2) ∫_0^∞ |f(t)|² dt ] .
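For the simplest shift, f ≡ μ on [0, 1] so that F(t) = μt, the density depends on the path only through x(1), and the identity E_{P_F}[g] = E_{P_0}[g · dP_F/dP_0] can be checked by elementary simulation (Python sketch; the constants are ours):

```python
import math, random

# Cameron-Martin check for the constant-slope shift F(t) = mu * t on [0, 1]:
# the density reduces to exp(mu * x(1) - mu**2 / 2), which depends on the
# terminal value alone, so sampling x(1) ~ N(0, 1) suffices.
rng = random.Random(11)
mu, n = 0.7, 50000
shifted_mean = 0.0      # mean of x(1) + mu, i.e. of the shifted path's endpoint
weighted_mean = 0.0     # E_{P_0}[ x(1) * dP_F/dP_0 ], which should match mu
for _ in range(n):
    z = rng.gauss(0.0, 1.0)
    shifted_mean += (z + mu) / n
    weighted_mean += z * math.exp(mu * z - mu * mu / 2.0) / n
```

Both estimates should be close to μ = 0.7, the mean of the shifted endpoint.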

Formally, the relative density of a Gaussian vector ξ with mean μ and covariance C with respect to the Gaussian with mean 0 and the same covariance is given by

exp[ ⟨ξ, C⁻¹μ⟩ − (1/2) ⟨μ, C⁻¹μ⟩ ]

and our formula is just a special case. In the definition of Wiener's stochastic integral the integrand f is nonrandom, i.e. does not depend on the path ω. Itô extended Wiener's definition in a far reaching way. Since the calculation essentially involves only two moments, in an approximation of the form

S = ∑_j f(t_j) ( x(t_{j+1}) − x(t_j) )

if we let f(t) = f(t, ω) depend on ω, but insist that for every t, f(t, ω) be measurable with respect to ℱ_t, then the increment x(t_{j+1}) − x(t_j) is independent of f(t_j, ω). One still gets

E[S²] = ∑_j E[ |f(t_j, ω)|² ] (t_{j+1} − t_j) ,

leading to Itô's definition of the stochastic integral for the class of 'progressively measurable' functions f(t, ω) satisfying

E[ ∫_0^∞ |f(t, ω)|² dt ] < ∞ .

Since we do not necessarily have to integrate all the way to ∞, it suffices that f is progressively measurable and satisfies

E[ ∫_0^t |f(s, ω)|² ds ] < ∞ for all t < ∞ .

Itô's stochastic integral

y(t) = ∫_0^t f(s, ω) dx(s)

is well defined under these conditions and has the following properties.
(1) The process y(t) is almost surely continuous in t and is a progressively measurable martingale relative to ℱ_t.
(2) If τ is any stopping time, y(τ ∧ t) is also a martingale and

y(τ ∧ t) = ∫_0^t 1_{τ ≥ s}(ω) f(s, ω) dx(s) = ∫_0^t f_τ(s, ω) dx(s) .

(3) The process z(t) = y²(t) − ∫_0^t |f(s, ω)|² ds is a progressively measurable martingale.
(4) If f is in addition uniformly bounded, then

exp[ y(t) − (1/2) ∫_0^t |f(s, ω)|² ds ]

is again a martingale.
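Property (3) contains the Itô isometry E[y(t)²] = E[∫_0^t |f(s, ω)|² ds], which can be checked numerically with left-endpoint sums. The sketch below (Python; discretization choices are ours) uses the progressively measurable integrand f(s, ω) = x(s), for which both sides equal t²/2:

```python
import math, random

def ito_integral_of_path(t=1.0, steps=400, rng=None):
    """Left-endpoint sums approximating the Ito integral int_0^t x(s) dx(s)
    together with int_0^t x(s)^2 ds along the same Brownian path."""
    dt = t / steps
    x, ito, quad = 0.0, 0.0, 0.0
    for _ in range(steps):
        dx = rng.gauss(0.0, math.sqrt(dt))
        ito += x * dx          # integrand evaluated at the left endpoint
        quad += x * x * dt
        x += dx
    return ito, quad

rng = random.Random(5)
pairs = [ito_integral_of_path(rng=rng) for _ in range(4000)]
second_moment = sum(i * i for i, _ in pairs) / len(pairs)   # E[y(1)^2]
isometry_rhs = sum(q for _, q in pairs) / len(pairs)        # E[int_0^1 x^2 ds]
```

For t = 1 both averages should be near 1/2; evaluating the integrand at the left endpoint is what makes the estimate unbiased.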

5. Itô calculus

The stochastic integral y(t) = ∫_0^t f(s, ω) dx(s) can be thought of as dy = f dx, and the first step in developing any calculus is the chain rule. What is dφ(y(t))? The problem exists already if f ≡ 1, i.e. what is dφ(x(t))? If we take φ(x) = x², one might think that dx²(t) = 2x(t) dx(t), or

x²(t) − x²(0) = x²(t) = ∫_0^t 2x(s) dx(s) .

This is clearly wrong because the left hand side has expectation t while the right hand side has expectation 0. A better guess is

x²(t) − x²(0) = x²(t) = ∫_0^t 2x(s) dx(s) + t .
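The extra term t can be seen pathwise: the left-endpoint sums for ∫_0^t 2x dx differ from x(t)² exactly by the accumulated squared increments, which by the quadratic variation property converge to t. A minimal sketch (Python; the step count is our choice):

```python
import math, random

# Pathwise check of the Ito correction: along one simulated Brownian path, the
# telescoping identity x_N^2 = sum 2 x_j dx_j + sum (dx_j)^2 holds exactly,
# and the accumulated squared increments converge to t.
rng = random.Random(9)
t, steps = 1.0, 20000
dt = t / steps
x, ito_sum, quad_var = 0.0, 0.0, 0.0
for _ in range(steps):
    dx = rng.gauss(0.0, math.sqrt(dt))
    ito_sum += 2.0 * x * dx      # left-endpoint sum for int 2 x dx
    quad_var += dx * dx          # discrete quadratic variation, near t
    x += dx
# Ito's formula: x(t)^2 = int_0^t 2 x dx + t.
```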

This can indeed be seen to be correct by using (1.2). More generally, Itô's formula asserts that

f(x(t)) − f(x(0)) = ∫_0^t f′(x(s)) dx(s) + (1/2) ∫_0^t f″(x(s)) ds .

Since x(·) is not of bounded variation but has finite quadratic variation, formally (dx)² = dt and (dx)^i (dt)^j = 0 if i + 2j ≥ 3. We have to take an additional term in the Taylor expansion. We can now compute

dφ(t, y(t)) = φ_t dt + φ_y dy + (1/2) φ_yy (dy)² = [φ_t + (1/2) φ_yy f²] dt + φ_y f dx

or

φ(t, y(t)) − φ(0, y(0)) = ∫_0^t [ φ_s(s, y(s)) + (1/2) φ_yy(s, y(s)) f²(s, ω) ] ds + ∫_0^t φ_y(s, y(s)) f(s, ω) dx(s) .

In particular, if f ≡ 1,

φ(t, x(t)) − φ(0, x(0)) = ∫_0^t [ φ_s(s, x(s)) + (1/2) φ_xx(s, x(s)) ] ds + ∫_0^t φ_x(s, x(s)) dx(s) .

One can use Itô's formula to establish a connection with the infinitesimal generator (1/2)D_x². For example, if u(t, x) solves

u_t = (1/2) D_x² u

with u(0, x) = f(x), Itô's formula for v(t, x) = u(T − t, x) yields

dv(t, x(t)) = v_x(t, x(t)) dx(t)

with no dt term, and therefore v(t, x(t)) is a martingale. Equating expectations at times 0 and T we get

v(0, x) = u(T, x) = E^{P_x}[v(T, x(T))] = E^{P_x}[f(x(T))] = ∫ f(y) p(T, y − x) dy .


Itô's formula in particular provides a proof of the uniqueness of the Cauchy problem for the heat equation! A more interesting example is to consider a smooth function u = u(t, x) that solves

u_t + (1/2) D_x² u + V(t, x) u = 0 in [0, T] × [a, b]

with u(T, x) = f(x). Then, by a similar calculation,

u(t, x(t)) exp[ ∫_0^t V(s, x(s)) ds ]

is a martingale until the exit time τ from the interval [a, b], and we get

u(0, x) = E^{P_x}[ u(τ ∧ T, x(τ ∧ T)) exp( ∫_0^{τ∧T} V(s, x(s)) ds ) ] ,

expressing u(0, x) in terms of the boundary values of u along t = T and 0 ≤ t ≤ T, x = a or b. Itô's theory of stochastic integrals and stochastic differential equations appeared in Itô (1942). We can now find a treatment in various texts in probability, engineering and, in recent times, finance.
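When V ≡ 0 and u does not depend on t, this representation reduces to u(x) = E_x[u(x(τ))]: harmonic functions are determined by their boundary values along Brownian paths. For boundary data u(a) = 0, u(b) = 1 the harmonic function is u(x) = (x − a)/(b − a), the probability of exiting [a, b] at b, which a crude simulation confirms (Python sketch; the step size and tolerances are ours):

```python
import math, random

# For V = 0 and boundary data u(a) = 0, u(b) = 1, the stopped-martingale
# representation gives u(x) = P_x[exit at b] = (x - a)/(b - a).
def exit_right_prob(x0, a, b, dt=1e-3, paths=2000, seed=17):
    """Monte Carlo probability that Brownian motion from x0 leaves [a, b] at b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(paths):
        x = x0
        while a < x < b:                   # run until the exit time tau
            x += rng.gauss(0.0, math.sqrt(dt))
        hits += x >= b
    return hits / paths

est = exit_right_prob(0.3, 0.0, 1.0)       # exact answer is 0.3
```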

6. Brownian motion with a drift

Brownian motion is characterized as a process with independent increments, the distribution of the increment x(t) − x(s) being normal with mean 0 and variance t − s. We can build in a drift by requiring the increment to have a normal distribution with mean μ(t − s) and variance t − s. Such a process is very easily defined as y(t) = x(t) + μt. It is again a Markov process with a transition probability density

q_μ(s, x; t, y) = (2π(t − s))^{−1/2} exp[ −(y − x − μ(t − s))² / (2(t − s)) ] .


Symbolically dy = dx + μ dt, and there are analogous results. But the basic difference is that the infinitesimal generator (1/2)D_x² has now to be replaced by (1/2)D_x² + μD_x.

Often, in considering stochastic models, we come across situations where the drift is a restoring force proportional to the displacement. Formally

dy = −μy dt + dx .    (6.1)

Such an example will not be a process with independent increments, but will still be Markov. We can solve (6.1) explicitly with initial condition y(0) = y to obtain

y(t) = e^{−μt} y(0) + ∫_0^t e^{−μ(t−s)} dx(s) .    (6.2)


From the representation (6.2) we see that, given y(0) = y, y(t) is again normally distributed, but now with mean e^{−μt} y and variance (1 − e^{−2μt})/(2μ). We therefore have a time homogeneous Markov process with transition probability density

q(s, x; t, y) = ( μ / (π(1 − e^{−2μ(t−s)})) )^{1/2} exp[ −μ(y − e^{−μ(t−s)} x)² / (1 − e^{−2μ(t−s)}) ] .

As t → ∞ this has a limit, which is a stationary density for the process, and is in fact the normal distribution with mean 0 and variance 1/(2μ). For this Markov process the generator is (1/2)D_y² − μyD_y. The way to compute the generator quickly is by applying Itô's formula:

du(y(t)) = u′ dy(t) + (1/2) u″ (dy(t))² = u′[−μy dt] + u′ dx + (1/2) u″ dt = [ (1/2)D_y²u − μyD_y u ] dt + D_y u dx .
The generator is given by the coefficient of the dt term. More generally we can have models of the form dy = b(y) dt + dx, written as

y(t) = y(0) + x(t) + ∫_0^t b(y(s)) ds .

If b(y) satisfies a Lipschitz condition in y, i.e. a condition of the form

|b(y) − b(z)| ≤ C|y − z|

valid for some constant C and all y and z, the equation can be solved uniquely by Picard iteration. The solution y(t) will be a function y(t) = Φ(t, y(0), ω) that is a measurable function of ω relative to the σ-field ℱ_t. The autonomous nature of the equation means that

Φ(t, y(0), ω) = Φ(t − s, y(s), θ_s ω)

where θ_s ω is the new Brownian path x(σ) − x(s) for σ ≥ s. This guarantees that y(t) is still a Markov process, and Itô's formula computes the generator as (1/2)D_y² + b(y)D_y. The transition probability is no longer explicit. It is of course given by

q(s, x; t, A) = P_0[ Φ(t − s, x, ω) ∈ A ] .


Hopefully this has a density. The conditional distribution of the increment z = y(t + h) − y(t) of a Brownian motion with drift b(y) over an interval [t, t + h] is roughly normal with mean h b(y(t)) and variance h. The relative density with respect to the distribution of the increment of Brownian motion over the same interval is given by

exp[ b(y(t)) z − (h/2) [b(y(t))]² ] .

This suggests the formula

exp[ ∫_0^t b(x(s)) dx(s) − (1/2) ∫_0^t [b(x(s))]² ds ]

for the Radon-Nikodym derivative dP_{b(·),x}/dP_x of the Brownian motion with drift b(·) relative to Brownian motion starting at the same point x, but without any drift. This formula, sometimes known as Girsanov's formula, can be proved again using Itô's formula, at least when b is bounded. It can become quite technical if b is unbounded, for reasons that have to do with the possible explosion of the process, i.e. its becoming infinite at a finite time. The formula in a sense is still true, but needs to be carefully interpreted. See Stroock and Varadhan (1997) for a general discussion of explosion.
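The transition density computed for (6.1)-(6.2) also yields an exact simulation recipe for the restoring-drift example: over a step of length h, y moves to e^{−μh}y plus an independent N(0, (1 − e^{−2μh})/(2μ)) variable. The sketch below (Python; parameter values are ours) iterates this and recovers the stationary N(0, 1/(2μ)) law:

```python
import math, random

def ou_step(y, h, mu, rng):
    """Exact transition of dy = -mu*y dt + dx over a step of length h:
    y -> e^{-mu h} y + N(0, (1 - e^{-2 mu h}) / (2 mu))."""
    std = math.sqrt((1.0 - math.exp(-2.0 * mu * h)) / (2.0 * mu))
    return math.exp(-mu * h) * y + rng.gauss(0.0, std)

rng = random.Random(13)
mu, h = 2.0, 0.1
y, samples = 5.0, []               # start far from equilibrium
for i in range(60000):
    y = ou_step(y, h, mu, rng)
    if i > 1000:                   # discard burn-in, then sample the chain
        samples.append(y)
stat_mean = sum(samples) / len(samples)
stat_var = sum(s * s for s in samples) / len(samples)
# Stationary distribution is N(0, 1/(2*mu)): mean 0, variance 0.25 here.
```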

7. One-dimensional diffusions

The idea of a one dimensional diffusion process is to have a Markov process x(t) with continuous paths such that, given the past history [x(s) : 0 ≤ s ≤ t], the future increment x(t + h) − x(t) has a conditional distribution that for small h is approximately normal with mean hb and variance ha. The mean b and the variance a can be functions b(t, x(t)) and a(t, x(t)) respectively. The case b = 0, a ≡ 1 is clearly Brownian motion. If b(t, x) ≡ μ and a ≡ 1 we have the Brownian motion with a constant drift. If a and b are purely functions of t we have a Gaussian process with independent increments, with the increment over [s, t] being normally distributed with mean ∫_s^t b(σ) dσ and variance ∫_s^t a(σ) dσ. The case when a and b are functions of x alone gives us the time homogeneous case. Another interesting class of examples is given by a = a(t) and b = c(t)x + d(t); these are the Gauss-Markov processes. The aim is to explore the relationship between the processes and the coefficients that define them. There are several possible avenues to make the connection.

a. Stochastic differential equations


If we want to generate a Gaussian random variable Y with mean b and variance a, we can do that by means of a linear transformation Y = σX + b from a standard normal X, with the choice σ = √a. With this analogy in mind, Itô's treatment of the problem in Itô (1942) was to construct the increment of a 'diffusion process' y(t) by

y(t + h) − y(t) ≃ b(t, y(t)) h + σ(t, y(t)) ( x(t + h) − x(t) )

where x(t) is a standard Brownian motion. Here σ(t, y) = √(a(t, y)). This is formally written as

dy = σ dx + b dt .


Mathematically one tries to solve the above in integrated form

y(t) = y(0) + ∫_0^t b(s, y(s)) ds + ∫_0^t σ(s, y(s)) dx(s) .    (7.1)

Let us, following Itô, make the following assumptions.
(A) The functions σ and b satisfy a Lipschitz condition

|σ(t, x) − σ(t, y)| + |b(t, x) − b(t, y)| ≤ C|x − y|

for some constant C.
(B) They satisfy linear growth conditions

|σ(t, x)| + |b(t, x)| ≤ C(1 + |x|)

for some constant C.
Then within the class of almost surely continuous (in s) progressively measurable functions ξ(s, ω) on the Wiener space (Ω, ℱ, P_0) that satisfy

sup_{0≤s≤t} E^{P_0}[ (ξ(s, ω))² ] ≤ c(t) < ∞

for every t, there exists a unique solution y(·, ·) to (7.1). Moreover the solution is a strong Markov process with transition probability function p(s, y; t, A) described below. The equation can be solved for t > s in the form

y(t) = x + ∫ₛᵗ b(v, y(v)) dv + ∫ₛᵗ σ(v, y(v)) dx(v)

and if we denote the solution by y(s, x, t, ω), then

p(s, x, t, A) = P₀[ y(s, x, t, ω) ∈ A ]
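This construction of y(s, x, t, ω) from a Brownian path also suggests a direct numerical scheme: discretize the integrated equation and replace the Brownian increments by Gaussian draws. The sketch below is a standard Euler-Maruyama discretization; the Ornstein-Uhlenbeck coefficients b(t, y) = −y, σ ≡ 1 are illustrative choices of ours, not taken from the text. It estimates the transition probability p(0, 1, 1, A) for A = [0, ∞) by Monte Carlo.

```python
import math
import random

def euler_maruyama(b, sigma, y0, s, t, n_steps, rng):
    """One Euler-Maruyama sample of the solution of
    dy = b(u, y) du + sigma(u, y) dx(u) on [s, t], started at y0."""
    h = (t - s) / n_steps
    y, u = y0, s
    for _ in range(n_steps):
        dx = rng.gauss(0.0, math.sqrt(h))   # Brownian increment x(u+h) - x(u)
        y += b(u, y) * h + sigma(u, y) * dx
        u += h
    return y

# Illustrative coefficients (our choice): Ornstein-Uhlenbeck, b = -y, sigma = 1.
rng = random.Random(0)
samples = [euler_maruyama(lambda u, y: -y, lambda u, y: 1.0,
                          1.0, 0.0, 1.0, 200, rng) for _ in range(5000)]
# Monte Carlo estimate of the transition probability p(0, 1, 1, A), A = [0, oo).
p_hat = sum(1 for y in samples if y >= 0.0) / len(samples)
```

For this choice the exact law at t = 1 is normal with mean e⁻¹ and variance (1 − e⁻²)/2, so the estimate can be checked against the closed form.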


This gives us a family Q_{s,x} of probability measures on the space Ω_s of paths that start from x at time s. It can be shown [see McKean (1969)] that under additional smoothness conditions on the coefficients a and b the solutions y(s, x, t, ω) depend smoothly on x, and in fact for any smooth function f(x)

u(s, x) = ∫ f(y) p(s, x, t, dy) = E^{Q_{s,x}}[f(x(t))]


is a smooth function of s and x and satisfies the equation

∂u/∂s + (1/2) a(s, x) ∂²u/∂x² + b(s, x) ∂u/∂x = 0   (7.2)

with the boundary condition u(t, x) = f(x). The relation between a, b and p(·, ·, ·, ·) can also be expressed by

Diffusion processes


lim_{t→s} 1/(t − s) ∫ (y − x) p(s, x, t, dy) = b(s, x)

lim_{t→s} 1/(t − s) ∫ (y − x)² p(s, x, t, dy) = a(s, x)

lim_{t→s} 1/(t − s) ∫ (y − x)⁴ p(s, x, t, dy) = 0

In fact this was the way Kolmogorov [see Kolmogorov (1931)] formulated the problem of constructing the transition probabilities from given coefficients.

b. Partial differential equations


One can try to construct p(s, x, t, y) as a probability density directly and then construct a Markov process with these transition probability densities. To do this we need to find a nonnegative p that solves the equation

∂p/∂s + (1/2) a(s, x) ∂²p/∂x² + b(s, x) ∂p/∂x = 0

with the boundary condition

p(s, x, t, ·) → δ_x(·)

as s ↑ t. Under the assumptions
(C) For some constant C and 0 < α < 1

|a(s, x) − a(t, y)| + |b(s, x) − b(t, y)| ≤ C[ |s − t|^α + |x − y|^α ]

(D) For some constants C < ∞ and c > 0

c ≤ a(s, x) ≤ C, and |b(s, x)| ≤ C


the existence and uniqueness of p can be established. See for instance Friedman (1964) for the details. From p one constructs the Markov process in a canonical manner via finite dimensional distributions. The almost sure continuity of the paths follows from estimates on p that are obtained along the way. If we drop the lower bound a ≥ c > 0 but strengthen the smoothness assumptions by demanding additional regularity on a and b as functions of s and x, one can prove the existence of smooth solutions u of Equation (7.2), provided the boundary data f is smooth. For each fixed t the value u(s, x) of the solution can be shown to be a nonnegative bounded linear functional of f, and a suitable version of the Riesz representation theorem then gives

u(s, x) = ∫ f(y) p(s, x, t, dy)


and we obtain p in this manner. No matter what assumptions we use, the final goal is always to get the family of measures Q_{s,x} on the space of paths. One can show, as


one should, that if more than one method works in a given situation they lead to the same measures Q_{s,x}.
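The PDE construction can also be illustrated numerically: for smooth data one can march the backward equation (7.2) on a grid. The sketch below uses an explicit finite-difference scheme for the Brownian case a ≡ 1, b ≡ 0 — an illustrative special case of ours, chosen because with f(x) = x² the solution u(s, x) = x² + (t − s) is known in closed form.

```python
def solve_backward(f, a, b, t, x_grid, n_steps):
    """Explicit finite-difference solution of
       du/ds + (1/2) a(x) d2u/dx2 + b(x) du/dx = 0,  u(t, x) = f(x),
    marched backward from s = t to s = 0 (grid endpoints held at f)."""
    dx = x_grid[1] - x_grid[0]
    ds = t / n_steps           # must satisfy ds <= dx**2 / max(a) for stability
    u = [f(x) for x in x_grid]
    for _ in range(n_steps):
        new = u[:]
        for i in range(1, len(x_grid) - 1):
            uxx = (u[i + 1] - 2 * u[i] + u[i - 1]) / dx ** 2
            ux = (u[i + 1] - u[i - 1]) / (2 * dx)
            xi = x_grid[i]
            # stepping s -> s - ds:  u(s - ds) = u(s) + ds * (a/2 uxx + b ux)
            new[i] = u[i] + ds * (0.5 * a(xi) * uxx + b(xi) * ux)
        u = new
    return u

# Brownian motion: a = 1, b = 0; with f(x) = x**2 the exact answer is x**2 + t.
xs = [-5 + 0.1 * i for i in range(101)]
u0 = solve_backward(lambda x: x * x, lambda x: 1.0, lambda x: 0.0,
                    1.0, xs, 400)
```

The boundary values are frozen at f, which is harmless here since the grid extends several standard deviations beyond the points of interest.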

c. Martingale formulation
No matter how the measures are constructed it is always true that for any smooth f

f(x(t)) − f(x(s)) − ∫ₛᵗ (L_{v,x} f)(x(v)) dv   (7.3)

is a martingale for times t ≥ s relative to (Ω_s, ℱ_t^s, Q_{s,x}), and Q_{s,x}[ω : x(s, ω) = x] = 1. Here Ω_s is the space of continuous maps ω = x(·) from [s, ∞) into R, and ℱ_t^s is the σ-field generated by {x(v) : s ≤ v ≤ t}. And

(L_{v,x} f)(x) = (1/2) a(v, x) f″(x) + b(v, x) f′(x)


In the martingale formulation, introduced in Stroock and Varadhan (1969), for given a and b and starting point (s, x) one asks whether a Q_{s,x} exists such that (7.3) is a martingale for every smooth f, and whether such a Q_{s,x} is unique. If we assume that the coefficients are bounded and continuous, existence is easy. Uniqueness is more difficult, but under assumptions (A), (B) or (C), (D) uniqueness can be shown. Uniqueness implies that the processes Q_{s,x} are strong Markov processes. In the martingale formulation, by establishing the corresponding Girsanov formula, the consideration of any bounded measurable b can be reduced to the case b ≡ 0, provided a has a uniform positive lower bound. Using the martingale formulation one can establish the following random time change relation in the time homogeneous case. If V(x) is a positive function with 0 < c ≤ V(x) ≤ C < ∞ we can define stopping times τ_t by the relation

∫₀^{τ_t} (1/V(x(s))) ds = t

and use the stopping times to change the time scale of the process through the map Φ : Ω → Ω given by

(Φω)(t) = x(τ_t(ω), ω)


If Q is a solution to the martingale problem for [a(x), b(x)] then QΦ⁻¹ solves the martingale problem for [V(x)a(x), V(x)b(x)]. One can therefore transfer existence and uniqueness results from one to the other. For the martingale formulation Stroock and Varadhan (1997) is a good source. In the one dimensional case there are some special results. For instance, in the stochastic differential equations formulation the Lipschitz condition can be weakened to a Hölder condition in x of exponent 1/2. Using a combination of random time change and Girsanov's formula one can reduce the case of any


bounded [a, b] with a positive lower bound for a to the case a ≡ 1, b ≡ 0, which is the case of Brownian motion and enjoys both existence and uniqueness. This gives a way of directly constructing such processes from Brownian motion. These results can be found in Dynkin (1959) or Stroock and Varadhan (1997) in full detail.

d. Limit theorems
Suppose that we have a Markov chain with transition probability π_h(x, dy), representing a transition in a time step of duration h. We can construct a piecewise constant stochastic process X_h(t) by first defining a Markov chain ξ_n^h with transition probability π_h and then making a stochastic process out of it by defining

X_h(t) = ξ_n^h  for nh ≤ t < (n + 1)h

We can, for any initial condition X_h(0) = x, define the distribution of this process as a measure Q_{h,x} on the Skorohod space of paths with discontinuities only of the first kind. If one assumes that the limits

lim_{h→0} (1/h) ∫ (y − x) π_h(x, dy) = b(x)

lim_{h→0} (1/h) ∫ (y − x)² π_h(x, dy) = a(x)

and

lim_{h→0} (1/h) ∫ |y − x|^{2+δ} π_h(x, dy) = 0

hold locally uniformly in x for some bounded continuous functions [a(x), b(x)], and that there is a unique family Q_x solving the martingale problem for these coefficients starting at time 0 from the point x, then it follows that

lim_{h→0} Q_{h,x} = Q_x

in the sense of weak convergence. The basic idea in this approach is that, with π_h f defined as ∫ f(y) π_h(x, dy),

f(X_h(nh)) − f(X_h(0)) − Σ_{j=0}^{n−1} (π_h f − f)(X_h(jh))

is a martingale with respect to (Ω, ℱ_{nh}, Q_{h,x}) and under our assumptions

lim_{h→0} (1/h)(π_h f − f) = Lf

exists locally uniformly, with

(Lf)(x) = (1/2) a(x) f″(x) + b(x) f′(x)

This link is enough to establish that any limit point of Q_{h,x} as h → 0 is a solution of the martingale problem for [a(x), b(x)]. If we have uniqueness the limit is identified. See for instance Stroock and Varadhan (1997) and Ethier and Kurtz (1986) for results of this type.
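The simplest instance of this limit theorem is the symmetric random walk with steps ±√h, for which the limits above give b ≡ 0, a ≡ 1 and a vanishing (2 + δ)-moment, so Q_{h,x} converges weakly to Brownian motion. A small sanity check (an illustrative simulation of ours, not from the text) compares the moments of X_h(1) with the N(0, 1) limit:

```python
import math
import random

def chain_value(x0, h, t, rng):
    """Markov chain xi_n with steps +/- sqrt(h); returns X_h(t) = xi_[t/h]."""
    n = int(t / h)
    x = x0
    for _ in range(n):
        x += math.sqrt(h) if rng.random() < 0.5 else -math.sqrt(h)
    return x

rng = random.Random(1)
h = 0.002
vals = [chain_value(0.0, h, 1.0, rng) for _ in range(2000)]
mean = sum(vals) / len(vals)
var = sum(v * v for v in vals) / len(vals) - mean ** 2
```

As h → 0 the mean stays near 0 and the variance near 1, matching the Brownian limit at t = 1.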

8. The multidimensional case

To start with, we note that one can define Brownian motion in d dimensions by just taking d independent Brownian motions and making them the d components of a vector valued process {x_j(·)}. We get a Markov process (actually one with independent increments) with transition probability density given by

p(s, x, t, y) = [2π(t − s)]^{−d/2} exp[ −‖y − x‖² / (2(t − s)) ]

This diffusion process has the generator

(1/2) Σ_{j=1}^{d} ∂²/∂x_j²

and the transition probability density itself is the fundamental solution of the heat equation

∂u/∂t = (1/2) Σ_{j=1}^{d} ∂²u/∂x_j²

For our Brownian motion we have taken the identity I as the covariance matrix. We can instead take any positive definite matrix A for the covariance, and if A = σσ* with * representing the adjoint operation, we can represent the new process as the linear transform y(t) = σx(t). The new transition probabilities will be

p(s, x, t, y) = [2π(t − s)]^{−d/2} [det A]^{−1/2} exp[ −⟨(y − x), A⁻¹(y − x)⟩ / (2(t − s)) ]

and the new generator

L = (1/2) Σ_{i,j=1}^{d} a_{i,j} ∂²/∂x_i ∂x_j
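The representation y(t) = σx(t) with A = σσ* is exactly how correlated Brownian motions are produced in practice: factor A (for instance by Cholesky) and apply the factor to independent Brownian coordinates. A minimal sketch for d = 2; the matrix A below is an illustrative choice of ours:

```python
import math
import random

def cholesky2(A):
    """Cholesky factor sigma of a 2x2 positive definite A, so A = sigma sigma*."""
    l11 = math.sqrt(A[0][0])
    l21 = A[1][0] / l11
    l22 = math.sqrt(A[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

A = [[1.0, 0.5], [0.5, 2.0]]
sig = cholesky2(A)
rng = random.Random(2)
t = 1.0
ys = []
for _ in range(6000):
    # independent Brownian coordinates at time t
    x1, x2 = rng.gauss(0, math.sqrt(t)), rng.gauss(0, math.sqrt(t))
    # y(t) = sigma x(t): linear image with covariance A * t
    ys.append((sig[0][0] * x1, sig[1][0] * x1 + sig[1][1] * x2))
cov01 = sum(a * b for a, b in ys) / len(ys)   # sample covariance of the components
```

The sample covariance of the two components approximates A₁₂ · t = 0.5.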


We can add a constant drift vector b; the corresponding transition probability density and generator are then given respectively by

p(s, x, t, y) = [2π(t − s)]^{−d/2} [det A]^{−1/2} exp[ −⟨(y − x − b(t − s)), A⁻¹(y − x − b(t − s))⟩ / (2(t − s)) ]

L = (1/2) Σ_{i,j=1}^{d} a_{i,j} ∂²/∂x_i ∂x_j + Σ_j b_j ∂/∂x_j

The general time dependent diffusion process in d dimensions corresponds to the generator

L = (1/2) Σ_{i,j=1}^{d} a_{i,j}(t, x) ∂²/∂x_i ∂x_j + Σ_j b_j(t, x) ∂/∂x_j

and can be constructed by any of the methods outlined in the one dimensional case. The essential intuition is that we have a vector valued process x(t) such that the conditional distribution of x(t + h) − x(t), given the past history {x(s) : 0 ≤ s ≤ t}, is approximately normal in R^d with mean hb(t, x(t)) and covariance ha(t, x(t)). Here b(t, x) = {b_j(t, x)} is an R^d valued function on [0, ∞) × R^d while a(t, x) = {a_{i,j}(t, x)} is a symmetric positive semidefinite d × d matrix for each (t, x) ∈ [0, ∞) × R^d. Just as in the one dimensional case, if we assume that a(t, x) can be written as σ(t, x)σ(t, x)* and that both σ(t, x) and b(t, x) satisfy a Lipschitz condition of the form

Σ_j |b_j(t, x) − b_j(t, y)| + Σ_{i,j} |σ_{i,j}(t, x) − σ_{i,j}(t, y)| ≤ C|x − y|

as well as a bound

Σ_j |b_j(t, x)| + Σ_{i,j} |σ_{i,j}(t, x)| ≤ C(1 + |x|)

with a constant C that does not depend on t, then we can solve the stochastic differential equation

dy(t) = σ(t, y(t)) dx(t) + b(t, y(t)) dt

and construct a map from the d dimensional Brownian motion to the d dimensional diffusion y(·) corresponding to [σ(·, ·), b(·, ·)]. If we assume that a(t, x) is uniformly elliptic, i.e. for some constants 0 < c ≤ C < ∞ we have

c Σ_{j=1}^{d} ξ_j² ≤ Σ_{i,j=1}^{d} a_{i,j}(t, x) ξ_i ξ_j ≤ C Σ_{j=1}^{d} ξ_j²  for all ξ ∈ R^d,

the b_j(·, ·) are uniformly bounded by some constant C, and a(·, ·) and b(·, ·) satisfy a Hölder condition

Σ_j |b_j(s, x) − b_j(t, y)| + Σ_{i,j} |a_{i,j}(s, x) − a_{i,j}(t, y)| ≤ C[ |t − s|^α + |x − y|^α ]

for some exponent α and constant C, then just as in the one dimensional case we can get a fundamental solution p(s, x, t, y) that can serve as the transition probability density for the diffusion. The heat operator is of course replaced by ∂/∂s + L_s where

L_s = (1/2) Σ_{i,j=1}^{d} a_{i,j}(s, x) ∂²/∂x_i ∂x_j + Σ_j b_j(s, x) ∂/∂x_j

The corresponding diffusion processes Q_{s,x} can again be characterized through the martingale formulation as those for which Q_{s,x}[x(s) = x] = 1 and

f(x(t)) − f(x(s)) − ∫ₛᵗ (L_v f)(x(v)) dv

is a martingale for every smooth function f. An important observation is the connection with the Dirichlet problem. We will state it for Brownian motion; of course there are analogs for the general time homogeneous case. If G is a connected open set in R^d with a smooth boundary, for example the open ball B = {x : |x| < 1}, and if P_x is the d dimensional Brownian motion starting from the point x ∈ B, we can define the stopping time

τ(ω) = inf[ t : x(t) ∉ B ]

By the continuity of paths one can see that |x(τ)| = 1 and it is not difficult to see that P_x[ω : τ(ω) < ∞] = 1 for x ∈ B. Then for any continuous data f on |x| = 1
u(x) = E^{P_x}[f(x(τ))]

solves the Dirichlet problem

(Δu)(x) = 0

in B and, for b ∈ ∂B,

lim_{x→b, x∈B} u(x) = f(b)
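This probabilistic representation yields a Monte Carlo method for the Dirichlet problem: run Brownian paths from x until they exit B and average the boundary data at the exit points. In the sketch below the boundary data b₁² − b₂² is an illustrative choice of ours whose harmonic extension is x₁² − x₂², so the exact answer at (0.5, 0) is 0.25; the time-discretized exit point carries a small O(√dt) bias.

```python
import math
import random

def exit_value(x, f, dt, rng):
    """Run a discretized 2-d Brownian path from x until it leaves the unit
    disk, then evaluate the boundary data f at the (approximate) exit point."""
    x1, x2 = x
    s = math.sqrt(dt)
    while x1 * x1 + x2 * x2 < 1.0:
        x1 += rng.gauss(0, s)
        x2 += rng.gauss(0, s)
    r = math.hypot(x1, x2)        # project the overshoot back to the circle
    return f(x1 / r, x2 / r)

rng = random.Random(3)
f = lambda b1, b2: b1 * b1 - b2 * b2   # boundary data; harmonic extension x1^2 - x2^2
start = (0.5, 0.0)
u_hat = sum(exit_value(start, f, 1e-3, rng) for _ in range(3000)) / 3000
```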

Although Kolmogorov [see Kolmogorov (1931)] introduced the connection between Markov processes and parabolic partial differential equations, it was not till much later that advances in PDE [Friedman (1964)] made it possible to use results from PDE for constructing diffusions, whereas Itô had made considerable


progress using stochastic differential equations. For additional results on one dimensional diffusions Dynkin (1959) is an excellent source.

9. Diffusions with reflection

If we try to construct the one dimensional Brownian motion on the half line rather than the full line we run into trouble. The Brownian path eventually reaches 0 and wants to get out to the negative side. We must do something to keep it nonnegative. We may decide to give it enough of a kick to keep it nonnegative. This is easier to see in the discrete setting of a symmetric random walk or a gambler's ruin situation. Every time the gambler loses his entire fortune (i.e. reaches 0) we provide him with enough to make one more bet (i.e. move him from 0 to 1). He may lose it again, in which case we make another contribution. Every time the random walk comes to zero, at the next step it is moved to 1. The continuous analog is the set of relations: y(t) = x(t) + F(t), F(t) is nondecreasing and continuous in t, y(t) ≥ 0, and x(t) is a given Brownian path. We are interested in the minimal F that achieves this. Alternately, F is allowed to increase only when y(t) = 0. It turns out that such a pair of functions y(t), F(t) exists and is unique for every continuous path x(t), and is in fact given explicitly by

F(t) = −min[ 0, inf_{0≤s≤t} x(s) ] = max[ 0, −inf_{0≤s≤t} x(s) ]

and

y(t) = x(t) + F(t)


Moreover y(t) is a Markov process whose distribution as a process is the same as that of |x(t)|. The corresponding expectations

u(t, y) = E[ f(y(t)) | y(0) = y ]

now solve the equation

∂u/∂t = (1/2) ∂²u/∂y²

on [0, ∞) × [0, ∞) with the Neumann boundary condition u_y(t, 0) = 0 for all t > 0. There are analogs for general diffusions in R^d. The situation becomes more complicated because the push into the region from the boundary, which can be administered in only one direction in the one dimensional case, can now be administered in any nontangential direction pointing towards the interior. Wentzell (1959) wrote down the most general boundary condition that can arise. The one dimensional case for general diffusions was considered by Feller (1957). For a martingale formulation of the problem of constructing diffusions with boundary conditions see Stroock and Varadhan (1971).
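The explicit solution of the reflection problem is easy to implement for a sampled path: track the running minimum of x and set F(t) = max(0, −inf_{s≤t} x(s)). The sketch below (an illustrative discretization of ours) verifies the three defining properties — y ≥ 0, F nondecreasing, and F increasing only where y = 0:

```python
import math
import random

def reflect(x_path):
    """Skorokhod reflection map: given a sampled path x, return (y, F) with
    y = x + F >= 0, F nondecreasing, F(t) = max(0, -min_{s<=t} x(s))."""
    running_min = 0.0
    y, F = [], []
    for x in x_path:
        running_min = min(running_min, x)
        f = max(0.0, -running_min)
        F.append(f)
        y.append(x + f)
    return y, F

rng = random.Random(4)
dt = 1e-3
x, path = 0.0, [0.0]
for _ in range(2000):                  # a discretized Brownian path
    x += rng.gauss(0, math.sqrt(dt))
    path.append(x)
y, F = reflect(path)
```

Note that whenever F increases at a step, the new running minimum equals the current value of x, so y is exactly 0 there — the discrete counterpart of "F increases only when y(t) = 0".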


References
Cameron, R. H. and W. T. Martin (1953). The transformation of Wiener integrals by non linear transformations. Trans. Amer. Math. Soc. 75, 552-575.
Dynkin, E. B. (1959). One dimensional continuous strong Markov processes. Theor. Prob. Appl. 4, 3-54.
Ethier, S. N. and T. G. Kurtz (1986). Markov Processes: Characterization and Convergence. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, Inc.
Feller, W. (1957). Generalized second order differential operators and their lateral conditions. Illinois J. Math. 1, 495-504.
Friedman, A. (1964). Partial Differential Equations of Parabolic Type. Prentice Hall, New Jersey.
Hunt, G. A. (1956). Some theorems concerning Brownian motion. Trans. Amer. Math. Soc. 81, 294-319.
Itô, K. (1942). Differential equations determining a Markoff process (original Japanese title: Zenkoku Sizyo Sugaku Danwakai-si). J. Pan-Japan Math. Coll. 1077.
Itô, K. and M. Nisio (1968). On the convergence of sums of independent Banach space valued random variables. Osaka J. Math. 5, 35-48.
Kolmogorov, A. N. (1931). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 104, 415-458.
McKean, H. P. (1969). Stochastic Integrals. Academic Press.
Revuz, D. and M. Yor (1999). Continuous Martingales and Brownian Motion. 3rd edn., Springer-Verlag.
Stroock, D. W. (1993). Probability Theory: An Analytic View. Cambridge University Press.
Stroock, D. W. and S. R. S. Varadhan (1969). Diffusion processes with continuous coefficients, I, II. Comm. Pure Appl. Math. XXII, 345-400, 479-530.
Stroock, D. W. and S. R. S. Varadhan (1971). Diffusion processes with boundary conditions. Comm. Pure Appl. Math. XXIV, 147-225.
Stroock, D. W. and S. R. S. Varadhan (1997). Multidimensional Diffusion Processes. 2nd edn., Springer-Verlag.
Wentzell, A. D. (1959). On the boundary conditions for multidimensional diffusion processes. Theor. Prob. Appl. 4, 164-177.
Wiener, N. (1923). Differential-space. J. Math. Phys. 2, 131-174.
Wiener, N. (1924). The average value of a functional. Proc. Lond. Math. Soc. 22, 454-467.

D. N. Shanbhag and C. R. Rao, eds., Handbook of Statistics, Vol. 19. © 2001 Elsevier Science B.V. All rights reserved.


Itô's Stochastic Calculus and Its Applications

Shinzo Watanabe

0. Introduction

Itô's stochastic calculus is one of the most important methods in modern probability theory for analyzing various probability models. In up-to-date applications, for example, it provides us with a basic tool in the theory of mathematical finance, such as the theory of derivative pricing. A general framework for this calculus is usually given as a sample function analysis for a class of stochastic processes called semimartingales or generalized Itô processes. A characteristic feature of a semimartingale is that it represents a random motion in time perturbed by a noise; the noise is assumed to be a martingale which represents a fluctuation around the main motion, and the main motion is assumed to be adapted and smooth in the sense that its sample function is of bounded variation in time. Here, the martingale property and adaptedness refer to a given filtration on the probability space. In this framework, we can develop a differential and integral calculus for sample functions. A basic notion here is that of stochastic integrals and, by using this notion, a fundamental formula in this calculus, now very well known as Itô's formula, can be obtained. It is a formula for the change of variables, or the chain rule under differentiation, for sample functions of semimartingales and, when applied successively, leads to the Taylor expansion formula. We can associate with semimartingales some of their characteristic quantities, such as quadratic covariational processes and compensating random measures for jumps, which may be considered as natural extensions of the covariance parameters of Gaussian components and the Lévy measures, respectively, in the Lévy-Khinchin characteristic of infinitely divisible distributions or associated Lévy processes.
What is crucially important in this framework is the fact that fundamental noises in probability models in continuous time, such as Gaussian white noise (equivalently the Wiener process) or the Poisson point process, can be characterized in terms of these characteristics of semimartingales. From this fact, we can extract fundamental noises in stochastic models; in other words, we can see how these models are combined with noises. So, roughly, we can obtain stochastic equations governing stochastic models. Typically, this is an Itô stochastic differential


equation for the Kolmogorov diffusion model. Using these stochastic equations, we can construct and analyze the structure of stochastic models. Thus, besides analytical methods based on differential equations, Fourier analysis and functional and potential analytic methods like Hille-Yosida's theory of semigroups and Fukushima's theory of Dirichlet forms, among others, stochastic calculus also provides us with efficient and sometimes more direct probabilistic methods in the study of probability models; such a probabilistic approach can in many cases give a deeper understanding of results obtained by an analytical approach.

1. Semimartingales and stochastic integrals


1.1. General notions on stochastic processes
Let (Ω, ℱ, P) be a probability space, i.e., a σ-additive probability measure P defined on a σ-field ℱ of subsets of Ω. We always assume that it is complete, i.e., if A ⊂ B, B ∈ ℱ and P(B) = 0, then A ∈ ℱ so that P(A) = 0. Let

𝒩 = {A ∈ ℱ | P(A) = 0}.

In this article, we consider random phenomena depending on continuous time t ∈ [0, ∞). Of course, there are corresponding notions and results in the discrete time case, and they can be stated much more simply because far fewer mathematical technicalities are involved. By a filtration, we mean an increasing family {ℱ_t}_{t∈[0,∞)} of sub σ-fields of ℱ:

ℱ_t ⊂ ℱ for t ∈ [0, ∞) and ℱ_t ⊂ ℱ_s if 0 ≤ t ≤ s.

A filtration is said to satisfy the usual conditions if
(i) 𝒩 ⊂ ℱ₀, so that 𝒩 ⊂ ℱ_t for every t ∈ [0, ∞),
(ii) t ↦ ℱ_t is right-continuous in the sense that ℱ_t = ∩_{s>t} ℱ_s for every t ∈ [0, ∞).
In the following, we assume that a filtration satisfies the usual conditions unless otherwise stated. By a random time, we mean a random variable T with values in [0, ∞]. Given a filtration {ℱ_t}, a random time T is called an {ℱ_t}-stopping time (or simply a stopping time if the filtration is well understood) if

[T ≤ t] (:= {ω | T(ω) ≤ t}) ∈ ℱ_t for every t ∈ [0, ∞).

Because of the right-continuity of {ℱ_t}, T is an {ℱ_t}-stopping time if and only if [T < t] ∈ ℱ_t for every t ∈ [0, ∞). For an {ℱ_t}-stopping time T, define a sub-σ-field ℱ_T of ℱ by

ℱ_T = {A ∈ ℱ | A ∩ [T ≤ t] ∈ ℱ_t for all t ∈ [0, ∞)}.

We introduce several notions concerning stochastic processes, i.e., families X = (X(t, ω)) of random variables depending on the time t ∈ [0, ∞). We consider a stochastic process X = (X(t, ω)) taking values in a Polish space S; a Polish space is a topological space with the topology given by a metric under which it is separable and complete. Let 𝒮 be the σ-field of Borel subsets of S, i.e., 𝒮 is the smallest σ-field on S containing all open subsets of S. X defines a map

[0, ∞) × Ω ∋ (t, ω) ↦ X(t, ω) ∈ S.

X is called {ℱ_t}-adapted if for each t ∈ [0, ∞), the map Ω ∋ ω ↦ X(t, ω) ∈ S is ℱ_t/𝒮-measurable. X is called {ℱ_t}-progressively measurable if, for every t ∈ [0, ∞), the map [0, t] × Ω ∋ (u, ω) ↦ X(u, ω) ∈ S is ℬ[0, t] ⊗ ℱ_t-measurable. If X is {ℱ_t}-progressively measurable, then it is {ℱ_t}-adapted. Two stochastic processes X and Y with the same state space S are considered the same, and we write X = Y, if

{ω | ∃t, X(t, ω) ≠ Y(t, ω)} ∈ 𝒩.

X is called right-continuous (left-continuous), or càd (resp. càg), if

Ω \ {ω | t ↦ X(t, ω) is right-continuous (resp. left-continuous)} ∈ 𝒩.

X is called right-continuous with left-hand limits, or càdlàg, if

Ω \ {ω | t ↦ X(t, ω) is right-continuous with left-hand limits} ∈ 𝒩.

X is called continuous if it is both right- and left-continuous. If X is {ℱ_t}-adapted and càd, or {ℱ_t}-adapted and càg, then it is {ℱ_t}-progressively measurable. If X and Y with the same state space are càd, or càg, then X = Y if and only if, for every t ∈ [0, ∞), {ω | X(t, ω) ≠ Y(t, ω)} ∈ 𝒩. Let 𝒫 (𝒪) be the smallest σ-field on [0, ∞) × Ω with respect to which all {ℱ_t}-adapted, R-valued and càg (resp. all {ℱ_t}-adapted, R-valued and càd) processes X : [0, ∞) × Ω → R are measurable. Then it holds that 𝒫 ⊂ 𝒪. This can be seen from the fact that 𝒫 coincides with the smallest σ-field on [0, ∞) × Ω with respect to which all {ℱ_t}-adapted continuous processes are measurable. 𝒫 is called the predictable σ-field and 𝒪 the optional σ-field. S being a Polish space, an S-valued stochastic process X is called {ℱ_t}-predictable ({ℱ_t}-optional) if the map [0, ∞) × Ω ∋ (t, ω) ↦ X(t, ω) ∈ S is 𝒫/𝒮-measurable (resp. 𝒪/𝒮-measurable). Since 𝒫 ⊂ 𝒪, an {ℱ_t}-predictable process is an {ℱ_t}-optional process. An {ℱ_t}-optional process is an {ℱ_t}-progressively measurable process.

1.2. Martingales and processes of bounded variation


Suppose that we are given a filtration {ℱ_t}_{t∈[0,∞)} on a complete probability space which satisfies the usual conditions. An R-valued stochastic process X = (X(t, ω)) is called an {ℱ_t}-martingale if it satisfies:
(i) it is adapted to {ℱ_t}, i.e., X(t) is ℱ_t-measurable for every t ∈ [0, ∞),
(ii) X(t) is integrable for every t ∈ [0, ∞), i.e., X(t) ∈ L¹(Ω, P) for every t ∈ [0, ∞),
(iii) for every t > s ≥ 0,

E(X(t) | ℱ_s) = X(s) a.s., i.e., E(X(t)·1_A) = E(X(s)·1_A) for all A ∈ ℱ_s.


For a martingale X, we always assume that it is càdlàg; this is no loss of generality because, for any {ℱ_t}-martingale X, there is a càdlàg {ℱ_t}-martingale X′ such that X(t) = X′(t) a.s. for every t ∈ [0, ∞). If X = (X(t, ω)) is an {ℱ_t}-martingale and T is an {ℱ_t}-stopping time, then the stopped process X^T = (X^T(t, ω)) defined by

X^T(t, ω) = X(T(ω) ∧ t, ω)

is also an {ℱ_t}-martingale. This fact is known as Doob's optional stopping theorem. Noting it, we give the following definition: a càdlàg {ℱ_t}-adapted process X = (X(t, ω)) is called an {ℱ_t}-local martingale if there exists a sequence {T_n}_{n=1}^∞ of {ℱ_t}-stopping times such that T_n ≤ T_{n+1}, lim_{n↑∞} T_n = ∞ a.s. and X^{T_n} is an {ℱ_t}-martingale for each n. Also, the following characterization of martingales in terms of stopping times is useful: a càdlàg {ℱ_t}-adapted process X is an {ℱ_t}-martingale if and only if for every bounded {ℱ_t}-stopping time T, X(T) is integrable and E(X(T)) = E(X(0)). Here, T is called bounded if T(ω) ≤ K for a.a. ω for some constant K > 0. Introduce the space M₂ of square-integrable martingales by

M₂ = {M = (M(t, ω)) | M is an {ℱ_t}-martingale with M(0) = 0 a.s., and E(M(t)²) < ∞ for every t ∈ [0, ∞)}   (1)

and the space M₂,loc of locally square-integrable martingales by

M₂,loc = {M = (M(t, ω)) | ∃{T_n}: a sequence of {ℱ_t}-stopping times such that T_n ≤ T_{n+1}, lim_{n↑∞} T_n = ∞ a.s. and M^{T_n} ∈ M₂ for n = 1, 2, …}.   (2)

Let

A = {A = (A(t, ω)) | A is {ℱ_t}-adapted, càdlàg, A(0, ω) = 0 and t ↦ A(t, ω) is increasing for a.a. ω}   (3)

and

A₁ = {A = (A(t, ω)) | A ∈ A and E(A(t)) < ∞ for every t ∈ [0, ∞)}.   (4)

Denote by A^pred and A₁^pred their subclasses formed of all predictable elements, respectively. Set, further,

V = {V = (V(t, ω)) | ∃A₁ = (A₁(t, ω)) ∈ A, A₂ = (A₂(t, ω)) ∈ A such that V(t, ω) = A₁(t, ω) − A₂(t, ω)}   (5)

and define V₁, V^pred, V₁^pred similarly by replacing A with A₁, A^pred and A₁^pred, respectively, in (5). A ∈ A is called an {ℱ_t}-increasing process and V ∈ V an {ℱ_t}-process of bounded variation.


Let M₂^c = {M ∈ M₂ | M is continuous}, M₂,loc^c = {M ∈ M₂,loc | M is continuous}, and define V^c, V₁^c similarly. Obviously, V^c ⊂ V^pred and V₁^c ⊂ V₁^pred. Note also that M₂,loc^c is exactly the class of all continuous {ℱ_t}-local martingales M with M(0) = 0, and V^c is the class of all {ℱ_t}-adapted continuous processes V such that V(0) = 0 and s ∈ [0, t] ↦ V(s) is of bounded variation for every t > 0, a.s. An important consequence of the Doob-Meyer decomposition theorem for submartingales (cf. Dellacherie-Meyer, 1980) is that, for M, N ∈ M₂,loc (M₂), there exists a unique V ∈ V^pred (resp. V₁^pred) such that M(t)N(t) − V(t) is an {ℱ_t}-local martingale (resp. {ℱ_t}-martingale). This V is denoted by ⟨M, N⟩ and is called the quadratic co-variational process of M and N. ⟨M, M⟩ ∈ A^pred (resp. A₁^pred) and is simply denoted by ⟨M⟩. ⟨M, N⟩ is continuous if at least one of M and N is continuous.

1.3. Wiener martingales and Wiener processes

Let M = (M(t)) ∈ (M₂,loc^c)^d, M(t) = (M¹(t), …, M^d(t)), be a d-dimensional continuous {ℱ_t}-local martingale with M(0) = 0, i.e., M^i ∈ M₂,loc^c for i = 1, …, d. It is called a d-dimensional {ℱ_t}-Wiener martingale if

⟨M^i, M^j⟩(t) = δ_{i,j} · t for i, j = 1, …, d.   (6)
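Condition (6) is visible numerically: for a sampled d-dimensional Wiener martingale, the realized quadratic covariation Σ ΔM^i ΔM^j over a fine partition of [0, t] is close to δ_{i,j} · t. A small illustrative check of ours for d = 2 and t = 1:

```python
import math
import random

rng = random.Random(5)
n, t = 20000, 1.0
dt = t / n
qv11 = qv12 = 0.0
for _ in range(n):
    # independent Brownian increments over one partition interval
    d1 = rng.gauss(0, math.sqrt(dt))
    d2 = rng.gauss(0, math.sqrt(dt))
    qv11 += d1 * d1      # realized <M^1, M^1> contribution
    qv12 += d1 * d2      # realized <M^1, M^2> contribution
```

As the mesh dt → 0 the realized values concentrate on δ_{i,j} · t, i.e. qv11 → 1 and qv12 → 0 here.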

A d-dimensional Wiener process (Brownian motion) X = (X(t)) is, as usual, a continuous R^d-valued process with stationary independent increments with N(0, (t − s)I) as the law of X(t) − X(s) for t > s ≥ 0. [N(m, V) stands for the multivariate normal distribution with mean vector m and covariance matrix V.] If X = (X(t)) is a d-dimensional Wiener process, then it is easy to see that the process M = (M(t)), defined by M(t) = X(t) − X(0), is an {ℱ_t}-Wiener martingale if the filtration {ℱ_t} is the natural filtration of X, i.e., ℱ_t is the smallest σ-field containing all P-null sets and with respect to which all X(s), 0 ≤ s ≤ t, are measurable. (This filtration is always right-continuous.) Conversely, we have the following result, known as P. Lévy's martingale characterization theorem for Wiener processes. This result is of basic importance in stochastic analysis:

THEOREM 1. Suppose that X = (X(t)) is an R^d-valued, continuous and {ℱ_t}-adapted process (defined on a probability space with a filtration {ℱ_t}) such that M = (M(t)), defined by M(t) = X(t) − X(0), is an {ℱ_t}-Wiener martingale. Then X is a d-dimensional Wiener process. Furthermore, for every s ≥ 0, the family X(t) − X(s), t ≥ s, is independent of the σ-field ℱ_s. (Such a Wiener process is usually called an {ℱ_t}-Wiener process.)

1.4. Stochastic integrals

If V ∈ V, then, for each t ∈ [0, ∞), s ∈ [0, t] ↦ V(s) is of bounded variation, a.s., and we denote by ‖V‖(t) its total variation on [0, t]. Then ‖V‖ ∈ A. For


{ℱ_t}-predictable processes Φ = (Φ(t, ω)) and Ψ = (Ψ(t, ω)), the following Schwarz-type inequality holds for every t ∈ [0, ∞), a.s.:

∫₀^{t+} |Φ(u)Ψ(u)| d‖⟨M, N⟩‖(u) ≤ [∫₀^{t+} Φ(u)² d⟨M⟩(u)]^{1/2} [∫₀^{t+} Ψ(u)² d⟨N⟩(u)]^{1/2}   (7)

For a given M ∈ M₂,loc, let

ℒ₂,loc(M) = {Φ = (Φ(t, ω)) | Φ is {ℱ_t}-predictable and there exists a sequence {T_n} of {ℱ_t}-stopping times such that T_n → ∞ a.s. and, for every t ∈ [0, ∞) and n = 1, 2, …,

E[∫₀^{T_n ∧ t+} Φ(s)² d⟨M⟩(s)] < ∞}   (8)

and

ℒ₂(M) = {Φ = (Φ(t, ω)) | Φ is {ℱ_t}-predictable and

E[∫₀^{t+} Φ(s)² d⟨M⟩(s)] < ∞ for all t > 0}.   (9)

THEOREM 2. For given M ∈ M₂,loc and Φ ∈ ℒ₂,loc(M), there exists a unique N ∈ M₂,loc such that, for every L ∈ M₂,loc, the following identity holds a.s.:

⟨L, N⟩(t) = ∫₀^{t+} Φ(u) d⟨L, M⟩(u).   (10)

The right-hand side of (10) is an ω-wise Lebesgue-Stieltjes integral which is well defined because of the inequality (7). We denote this N by

N(t, ω) = ∫₀ᵗ Φ(u, ω) dM(u, ω),

or simply by

∫₀ᵗ Φ(u) dM(u)  or  ∫ Φ · dM,

and call it the stochastic integral of Φ(u) by M.



The following are the fundamental properties of stochastic integrals:


1) If M ∈ M₂,loc^c and Φ ∈ ℒ₂,loc(M), then ∫ Φ · dM ∈ M₂,loc^c, i.e., the stochastic integral by a continuous local martingale is again a continuous local martingale.
2) If M ∈ M₂,loc, Φ ∈ ℒ₂,loc(M) and Ψ ∈ ℒ₂,loc(∫ Φ · dM), then ΦΨ ∈ ℒ₂,loc(M) and

∫ Ψ · d(∫ Φ · dM) = ∫ ΦΨ · dM.

3) If Φ, Ψ ∈ ℒ₂,loc(M), then for every α, β ∈ R, (αΦ(t) + βΨ(t)) ∈ ℒ₂,loc(M) and

∫ (αΦ + βΨ) · dM = α ∫ Φ · dM + β ∫ Ψ · dM.

4) If M, N ∈ M₂,loc and Φ ∈ ℒ₂(M) ∩ ℒ₂(N), then for every α, β ∈ R, Φ ∈ ℒ₂,loc(αM + βN) and

∫ Φ · d(αM + βN) = α ∫ Φ · dM + β ∫ Φ · dN.

5) If M, N ∈ M₂,loc and Φ ∈ ℒ₂,loc(M), Ψ ∈ ℒ₂,loc(N), then

⟨∫ Φ · dM, ∫ Ψ · dN⟩(t) = ∫₀^{t+} Φ(u)Ψ(u) d⟨M, N⟩(u).

The right-hand side is well defined as a Lebesgue-Stieltjes integral because of the inequality (7). In particular,

⟨∫ Φ · dM⟩(t) = ∫₀^{t+} Φ(u)² d⟨M⟩(u).
6) If Φ is given in the form

Φ(t) = 1_{[T < t]} · φ

where T is an {ℱ_t}-stopping time and φ is a bounded ℱ_T-measurable real random variable, then Φ ∈ ℒ₂,loc(M) for every M ∈ M₂,loc and we have

∫₀ᵗ Φ(u) dM(u) = φ · (M(t) − M(t ∧ T)),  t ≥ 0.

In particular, we have

∫₀ᵗ 1_{[u ≤ T]} dM(u) = M(T ∧ t),  t ≥ 0.

Combining this with 2), we can conclude that, if Φ ∈ ℒ₂,loc(M), M ∈ M₂,loc, then, for every {ℱ_t}-stopping time T, (1_{[t ≤ T]} Φ(t)) ∈ ℒ₂,loc(M) and

∫₀ᵗ Φ(u) 1_{[u ≤ T]} dM(u) = ∫₀^{T ∧ t} Φ(u) dM(u),  t ≥ 0.


Furthermore, if Φ is given in the form

Φ(t) = φ₀ · 1_{[t=0]} + Σ_{k=0}^{N} φ_k · 1_{[T_k < t ≤ T_{k+1}]}   (11)

for a sequence {T_k}_{k=0}^{N+1} of {ℱ_t}-stopping times such that T₀ = 0 and T_k ≤ T_{k+1} a.s., and bounded ℱ_{T_k}-measurable real random variables φ_k, k = 0, …, N, then Φ ∈ ℒ₂,loc(M) for any M ∈ M₂,loc and

∫₀ᵗ Φ(u) dM(u) = Σ_{k=0}^{N} φ_k · (M(T_{k+1} ∧ t) − M(T_k ∧ t)),  t ≥ 0.   (12)
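Formula (12) is directly computable for a sampled martingale path. The sketch below (helper names are ours; deterministic stopping times are used for simplicity, a special case of (12)) evaluates the integral of a simple predictable integrand against a discretized Brownian path and checks it against the defining sum:

```python
import math
import random

def simple_integral(phis, stop_times, M, t, dt):
    """Formula (12): sum_k phi_k * (M(T_{k+1} ^ t) - M(T_k ^ t)) for a
    simple predictable integrand; M is a sampled path with M[i] ~ M(i*dt)."""
    def M_at(u):
        return M[min(int(round(u / dt)), len(M) - 1)]
    total = 0.0
    for k, phi in enumerate(phis):
        a = min(stop_times[k], t)
        b = min(stop_times[k + 1], t)
        total += phi * (M_at(b) - M_at(a))
    return total

rng = random.Random(6)
dt = 0.01
M = [0.0]
for _ in range(100):                       # a Brownian path sampled on [0, 1]
    M.append(M[-1] + rng.gauss(0, math.sqrt(dt)))
# integrand: phi_0 = 2 on (0, 0.3], phi_1 = -1 on (0.3, 0.7]
val = simple_integral([2.0, -1.0], [0.0, 0.3, 0.7], M, 1.0, dt)
check = 2.0 * (M[30] - M[0]) - 1.0 * (M[70] - M[30])
```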

A predictable process Φ of the form (11) is often called a bounded simple predictable process. The assumption that φ and the φ_k are bounded is not needed when M ∈ M₂,loc^c. Then the formula (12) holds for every simple predictable process Φ, i.e., the φ_k in (11) need not be bounded.
7) For M ∈ M₂,loc and Φ ∈ ℒ₂,loc(M), ∫ Φ · dM ∈ M₂ if and only if Φ ∈ ℒ₂(M). Then we have

E[(∫₀ᵗ Φ(s) dM(s))²] = E[∫₀^{t+} Φ(s)² d⟨M⟩(s)],  t ≥ 0.

More generally,

E[∫₀ᵗ Φ(s) dM(s) · ∫₀ᵗ Ψ(s) dN(s)] = E[∫₀^{t+} Φ(s)Ψ(s) d⟨M, N⟩(s)],  t ≥ 0,

if Φ ∈ ℒ₂(M) and Ψ ∈ ℒ₂(N).
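Property 7), the Itô isometry, can be checked by simulation: with M Brownian (so ⟨M⟩(t) = t) and the predictable integrand Φ(u) = M(u) evaluated at left endpoints, both sides of the identity equal t²/2 at t = 1. An illustrative Monte Carlo check of ours:

```python
import math
import random

rng = random.Random(7)
n_steps, t = 100, 1.0
dt = t / n_steps
n_paths = 4000
est_sq = est_qv = 0.0
for _ in range(n_paths):
    m = integ = qv = 0.0
    for i in range(n_steps):
        dm = rng.gauss(0, math.sqrt(dt))
        p = m                   # Phi evaluated at the left endpoint (predictability)
        integ += p * dm         # Riemann-Ito sum for the stochastic integral
        qv += p * p * dt        # Phi^2 d<M>, with <M>(t) = t for Brownian M
        m += dm
    est_sq += integ ** 2
    est_qv += qv
est_sq /= n_paths               # estimates E[(int Phi dM)^2]
est_qv /= n_paths               # estimates E[int Phi^2 d<M>]
```

Both averages should be near 1/2, up to Monte Carlo noise and the O(dt) discretization bias.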

1.5. Point processes and Poisson point processes


Let (Ξ, ℬ_Ξ) be a measurable space. By a point function on Ξ, we mean a mapping

p : D_p ⊂ (0, ∞) ∋ t ↦ p(t) ∈ Ξ

where the domain D_p of p is a countable subset of (0, ∞). p defines a counting measure N_p(ds, dξ) on (0, ∞) × Ξ with the product σ-field ℬ(0, ∞) ⊗ ℬ_Ξ by

N_p((0, t] × U) = Σ_{s ∈ D_p, s ≤ t} 1_U(p(s)),  t > 0,  U ∈ ℬ_Ξ.
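The counting measure N_p((0, t] × U) is concrete: for a point function sampled from a marked Poisson process it simply counts the points up to time t whose marks land in U. A small illustrative sketch (the function names and parameters are ours, not the text's):

```python
import random

def poisson_point_function(rate, t_max, mark_sampler, rng):
    """Sample a point function p: D_p -> Xi as a dict {time: mark}, with D_p
    the jump times of a Poisson process of the given rate on (0, t_max]."""
    p, t = {}, 0.0
    while True:
        t += rng.expovariate(rate)
        if t > t_max:
            return p
        p[t] = mark_sampler(rng)

def N_p(p, t, U):
    """Counting measure N_p((0, t] x U) = #{s in D_p : s <= t, p(s) in U}."""
    return sum(1 for s, xi in p.items() if s <= t and U(xi))

rng = random.Random(8)
# marks uniform on [0, 1); U = {xi < 0.5}, so E[N_p((0,2] x U)] = 3 * 2 * 0.5 = 3
counts = [N_p(poisson_point_function(3.0, 2.0, lambda r: r.random(), rng),
              2.0, lambda xi: xi < 0.5) for _ in range(2000)]
mean_count = sum(counts) / len(counts)
```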

Let Π_Ξ be the totality of point functions on Ξ and ℬ(Π_Ξ) be the smallest σ-field on Π_Ξ with respect to which all the mappings Π_Ξ ∋ p ↦ N_p((0, t] × U) ∈ ℕ, for t > 0 and U ∈ ℬ_Ξ, are measurable. Let (Ω, ℱ, P) and {ℱ_t} be as above. A point process is obtained by randomizing the notion of point functions; a point process p on Ξ is a (Π_Ξ, ℬ(Π_Ξ))-

Itd's stochastic calculus and its applications

881

valued random variable, i.e., p : (2 --+ H3 which is ~-/2(Hz)-measurable. p is called {~t}-adapted if, for every t > 0 and U E 2 s , Np((O, t] x U) is ~t-measurable, p is called a-finite if there exist U, c 2 e , n = 1 , 2 , . . . , such that

Un C gn+l,

U U, ----~
n

and

Np((O, t] Un) < oo


a.s. for e v e r y t > 0 a n d n .

Let p be an {ℱ_t}-adapted, σ-finite point process. We say that p possesses the compensator N̂_p if a nonnegative random measure N̂_p(ds, dξ) on [0, ∞) × Ξ exists, i.e., for E ∈ ℬ([0, ∞)) ⊗ ℬ_Ξ, N̂_p(E) is a [0, ∞]-valued random variable and, with probability one, E ↦ N̂_p(E) is a measure on ([0, ∞) × Ξ, ℬ([0, ∞)) ⊗ ℬ_Ξ), such that the following hold:
(i) t ↦ N̂_p([0, t] × U) is {ℱ_t}-predictable for every U ∈ ℬ_Ξ;
(ii) if the U_n are those subsets in the definition of the σ-finiteness of p, then N̂_p([0, t] × U_n) < ∞ a.s. for every t > 0 and n;
(iii) t ↦ N_p((0, t] × (U_n ∩ B)) − N̂_p([0, t] × (U_n ∩ B)) ∈ M_{2,loc} for every n and B ∈ ℬ_Ξ.

N̂_p, when it exists, is uniquely determined from p by these properties and always satisfies N̂_p({s} × B) ≤ 1 a.s. for every s ≥ 0 and B ∈ ℬ_Ξ.

The existence of the compensator is assured fairly generally: for example, if Ξ is a Polish space and ℬ_Ξ is the Borel σ-field, the compensator always exists. Note that the existence of N̂_p([0, t] × (U_n ∩ B)) in (iii) above for a fixed B and n is guaranteed by the Doob–Meyer decomposition theorem. Suppose that p possesses the compensator N̂_p and set

  Ñ_p([0, t] × B) = N_p((0, t] × B) − N̂_p([0, t] × B) .

Then

  t ↦ Ñ_p([0, t] × B) ∈ M_{2,loc},  if B ⊂ U_n for some n,

and it holds, for B, B′ ∈ ℬ_Ξ such that B, B′ ⊂ U_n for some n and t ≥ 0,

  ⟨Ñ_p([0, ·] × B), Ñ_p([0, ·] × B′)⟩(t) = N̂_p([0, t] × (B ∩ B′)) − Σ_{s ≤ t} N̂_p({s} × B)·N̂_p({s} × B′) .

A function [0, ∞) × Ξ × Ω ∋ (t, ξ, ω) ↦ f(t, ξ, ω) ∈ ℝ is called {ℱ_t}-predictable if, as a function ([0, ∞) × Ω) × Ξ ∋ ((t, ω), ξ) ↦ f(t, ξ, ω) ∈ ℝ, it is 𝒫 ⊗ ℬ_Ξ-measurable, 𝒫 being the {ℱ_t}-predictable σ-field on [0, ∞) × Ω. Introduce the following classes of predictable functions:

  F_p = { f(t, ξ, ω) | f is {ℱ_t}-predictable and ∫₀^{t+}∫_Ξ |f(s, ξ, ω)|·N_p(ds, dξ) < ∞ a.s. for all t > 0 },

  F_p¹ = { f(t, ξ, ω) | f is {ℱ_t}-predictable and E[ ∫₀^{t+}∫_Ξ |f(s, ξ, ω)|·N̂_p(ds, dξ) ] < ∞ for all t > 0 },

  F_p² = { f(t, ξ, ω) | f is {ℱ_t}-predictable and E[ ∫₀^{t+}∫_Ξ |f(s, ξ, ω)|²·N̂_p(ds, dξ) ] < ∞ for all t > 0 },

  F_p^{2,loc} = { f(t, ξ, ω) | f is {ℱ_t}-predictable and there exists a sequence {T_n} of {ℱ_t}-stopping times such that T_n ↑ ∞ a.s. and 1_{[0,T_n]}(t)·f(t, ξ, ω) ∈ F_p² for n = 1, 2, … } .

We are now going to define the stochastic integral ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) for f ∈ F_p^{2,loc} so that [t ↦ ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ)] is an element of M_{2,loc}. Note that the integrals ∫₀^{t+}∫_Ξ f(s, ξ, ω) N_p(ds, dξ) for f ∈ F_p and ∫₀^{t+}∫_Ξ f(s, ξ, ω) N̂_p(ds, dξ) for f ∈ F_p¹ can be defined by ω-wise Lebesgue–Stieltjes integrals. However, the first integral cannot, in general, be defined as the difference of the second and the third integrals; in other words, it cannot be defined as an absolutely convergent integral. Details are given as follows. First we remark that, for f ∈ F_p, ∫₀^{t+}∫_Ξ f(s, ξ, ω) N_p(ds, dξ) is well-defined as a Lebesgue–Stieltjes integral and coincides with the absolutely convergent sum

  Σ_{s ≤ t, s ∈ D_p} f(s, p(s), ω) .

Next, let f ∈ F_p¹. Then we have

  E[ ∫₀^{t+}∫_Ξ |f(s, ξ, ω)|·N_p(ds, dξ) ] = E[ ∫₀^{t+}∫_Ξ |f(s, ξ, ω)|·N̂_p(ds, dξ) ]

and hence F_p¹ ⊂ F_p. Define

  ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) = ∫₀^{t+}∫_Ξ f(s, ξ, ω) N_p(ds, dξ) − ∫₀^{t+}∫_Ξ f(s, ξ, ω) N̂_p(ds, dξ)

for f ∈ F_p¹. If f ∈ F_p¹ ∩ F_p², then we can see that

  [t ↦ ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ)] ∈ M₂

and

  ⟨ ∫₀^{·+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) ⟩(t) = ∫₀^{t+}∫_Ξ f(s, ξ, ω)²·N̂_p(ds, dξ) .

If f ∈ F_p², we can show that there exists a unique M = (M(t)) ∈ M₂ having the property that

  E[ | M(t) − ∫₀^{t+}∫_Ξ f_n(s, ξ, ω) Ñ_p(ds, dξ) |² ] → 0  as n → ∞

for any sequence f_n ∈ F_p¹ ∩ F_p² such that

  E[ ∫₀^{t+}∫_Ξ |f_n(s, ξ, ω) − f(s, ξ, ω)|²·N̂_p(ds, dξ) ] → 0  as n → ∞ .

We define the stochastic integral ( ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) )_{t≥0} by this M. Finally, if f ∈ F_p^{2,loc}, we can show that there exists a unique M = (M(t)) ∈ M_{2,loc} having the following property: for every {ℱ_t}-stopping time T such that f(s, ξ, ω)1_{[0,T]}(s) ∈ F_p², it holds that

  M^T(t) = M(t ∧ T) = ∫₀^{t+}∫_Ξ f(s, ξ, ω)1_{[0,T]}(s) Ñ_p(ds, dξ) .

This M is, by definition, the stochastic integral ( ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) )_{t≥0}.

If M(t) = ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ) and φ = (φ(t)) is {ℱ_t}-predictable, then φ ∈ ℒ_{2,loc}(M) if and only if (f(t, ξ, ω)φ(t)) ∈ F_p^{2,loc}. Under this condition, it holds that

  ∫₀ᵗ φ(s)·dM(s) = ∫₀^{t+}∫_Ξ φ(s)f(s, ξ, ω) Ñ_p(ds, dξ) .

The most important class of point processes is that of Poisson point processes. Generally, a point process p on Ξ is called a Poisson point process if the following are satisfied: (i) if E₁, …, E_n ∈ ℬ([0, ∞)) ⊗ ℬ_Ξ are disjoint, then N_p(E₁), …, N_p(E_n) are mutually independent; (ii) for E ∈ ℬ([0, ∞)) ⊗ ℬ_Ξ, N_p(E) is Poisson distributed, the case N_p(E) = ∞ a.s. being allowed as a Poisson random variable with infinite expectation.

An {ℱ_t}-adapted Poisson point process is called an {ℱ_t}-Poisson point process if, for every s > 0, the family N_p((s, t] × B), t ≥ s, B ∈ ℬ_Ξ, is independent of the σ-field ℱ_s. An {ℱ_t}-Poisson point process p is σ-finite if and only if ν_p, defined by

  ν_p(E) = E[N_p(E)],

is a σ-finite measure on ([0, ∞) × Ξ, ℬ([0, ∞)) ⊗ ℬ_Ξ) in the sense that there exist U_n ∈ ℬ_Ξ, n = 1, 2, …, such that U_n ⊂ U_{n+1}, ⋃_n U_n = Ξ and ν_p([0, t] × U_n) < ∞ for every t > 0 and n. ν_p always satisfies

  ν_p({s} × Ξ) = 0 .   (13)

This corresponds to the fact that N_p({s} × Ξ) ≤ 1, a.s., which is obvious because N_p is the counting measure of a point process. A σ-finite {ℱ_t}-Poisson point process p possesses the compensator N̂_p, which coincides with the deterministic measure ν_p. Just as with the Lévy theorem in the case of the Wiener process, this property characterizes a Poisson point process. Namely, we have the following fundamental fact (cf. Ikeda–Watanabe, 1988):

THEOREM 3. If an {ℱ_t}-adapted σ-finite point process p possesses a compensator which is a deterministic measure on ([0, ∞) × Ξ, ℬ([0, ∞)) ⊗ ℬ_Ξ), then p is an {ℱ_t}-Poisson point process. Conversely, given an infinite and σ-finite deterministic measure ν on ([0, ∞) × Ξ, ℬ([0, ∞)) ⊗ ℬ_Ξ) having the property (13), there exists an {ℱ_t}-Poisson point process (on a suitable probability space with a filtration) having ν as its compensator.

A family of {ℱ_t}-Poisson point processes is always independent of any family of {ℱ_t}-Wiener processes and, furthermore, if p₁, …, p_n are {ℱ_t}-Poisson point processes, then they are mutually independent if and only if their domains D_{p_i} are mutually disjoint, a.s.
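Theorem 3 can be illustrated by simulation. The sketch below (all parameter choices are illustrative assumptions, not from the text) takes a stationary Poisson point process on Ξ = ℝ whose compensator is the deterministic measure λ dt × n(dξ), with n the standard normal law, and checks by Monte Carlo that N_p((0, T] × U) has mean equal to the compensator mass ν_p((0, T] × U) = λT·n(U), so that the compensated count has mean zero; its variance agrees as well, as it should for a Poisson random variable.

```python
import numpy as np
from math import erf, sqrt

# A Poisson point process on Xi = R with compensator lam * dt x n(dxi),
# n = N(0, 1).  Given the total number of points of p in (0, T] x Xi, the
# marks p(s) are i.i.d. with law n, so the count in (0, T] x U is binomial.
rng = np.random.default_rng(1)
lam, T, n_rep = 3.0, 2.0, 200_000
U_low, U_high = -1.0, 1.0                       # the set U in B_Xi
pU = 0.5 * (erf(U_high / sqrt(2)) - erf(U_low / sqrt(2)))  # n(U)
n_total = rng.poisson(lam * T, n_rep)           # N_p((0,T] x Xi) per replica
counts = rng.binomial(n_total, pU)              # N_p((0,T] x U) per replica
compensator = lam * T * pU                      # deterministic nu_p((0,T] x U)
print(counts.mean() - compensator)              # compensated count: mean ≈ 0
print(counts.var(), compensator)                # Poisson: variance = mean
```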

1.6. Semimartingales

Let (Ω, ℱ, P) and {ℱ_t} be as above. An ℝ-valued, {ℱ_t}-adapted and càdlàg process X = (X(t)) is called an {ℱ_t}-semimartingale if it is represented in the form

  X(t) = X(0) + M(t) + V(t)   (14)

where X(0) is ℱ₀-measurable, M = (M(t)) ∈ M_{2,loc} and V = (V(t)) ∈ 𝒱. The class of semimartingales is important in the study of random phenomena in continuous time: very roughly, V(t) is the noiseless motion (mean motion), which is, however, disturbed by a noise, and the fluctuation from the mean motion caused by the noise is given by M(t). It is sometimes convenient to define the stochastic integral by a semimartingale. If X = (X(t)) is an {ℱ_t}-semimartingale given in the form (14), then we define φ = (φ(t)) ∈ ℒ(X) if and only if φ is {ℱ_t}-predictable, φ ∈ ℒ_{2,loc}(M) and

  ∫₀^{t+} |φ(s)|·d‖V‖(s) < ∞ a.s. for every t > 0 .

The stochastic integral ∫₀^{t+} φ(s)dX(s) of φ ∈ ℒ(X) by X is, by definition, the {ℱ_t}-semimartingale defined by

  ∫₀ᵗ φ(s)dX(s) = ∫₀ᵗ φ(s)dM(s) + ∫₀ᵗ φ(s)dV(s),

where the second integral on the right-hand side is the ω-wise Lebesgue–Stieltjes integral. Although the expression of X in the form (14) may not be unique in general, the definition does not depend on the particular expression, and the stochastic integral is uniquely determined from φ and X. We introduce a seemingly restricted class of semimartingales; this class is, however, equivalent to the class of all {ℱ_t}-semimartingales, as we shall see. Let X = (X(t)) be an ℝ-valued, {ℱ_t}-adapted and càdlàg process given in the form

  X(t) = X(0) + M(t) + V(t) + ∫₀^{t+}∫_Ξ f(s, ξ, ω) N_p(ds, dξ) + ∫₀^{t+}∫_Ξ g(s, ξ, ω) Ñ_p(ds, dξ)   (15)

where
(i) X(0) is an ℱ₀-measurable real random variable,
(ii) M ∈ M^c so, in particular, M(0) = 0,
(iii) V ∈ 𝒱^c so, in particular, V(0) = 0,
(iv) p is an {ℱ_t}-adapted, σ-finite point process on some state space (Ξ, ℬ_Ξ) possessing the compensator N̂_p, f ∈ F_p and g ∈ F_p^{2,loc} such that f·g ≡ 0.

It is not difficult to see that X given in the form (15) is an {ℱ_t}-semimartingale. Conversely, any {ℱ_t}-semimartingale X can be represented in the form (15) for a suitably chosen {ℱ_t}-adapted point process p: indeed, p is naturally defined by the discontinuities of X as Ξ = ℝ∖{0}, D_p = {t ∈ (0, ∞) | X(t) ≠ X(t−)}, p(t) = X(t) − X(t−), t ∈ D_p, and

  f(t, ξ, ω) = ξ·1_{[|ξ|>1]}(ξ),  g(t, ξ, ω) = ξ·1_{[|ξ|≤1]}(ξ) .

Cf. Jacod–Shiryaev, 1987 for details. Thus, it is no loss of generality to introduce this class of ℝ-valued processes as semimartingales. In (15), M is uniquely determined from X: we call it the continuous martingale part of X and denote it by M_X. If, in particular, X is continuous, then the decomposition X(t) = X(0) + M(t) + V(t) is unique. We call V the drift part of the continuous semimartingale X and denote it by V_X. The decomposition

  X(t) = X(0) + M_X(t) + V_X(t)   (16)

is called the canonical decomposition or semimartingale decomposition of the continuous semimartingale X. Consider the particular case in which M_X and V_X are given in the form

  M_X(t) = ∫₀ᵗ Φ(s)dB(s),  V_X(t) = ∫₀ᵗ Ψ(s)ds

where B is a one-dimensional {ℱ_t}-Wiener martingale, Φ ∈ ℒ_{2,loc}(B), i.e., ∫₀ᵗ |Φ(s)|² ds < ∞ for every t ≥ 0, a.s., and Ψ is an {ℱ_t}-predictable process such that ∫₀ᵗ |Ψ(s)| ds < ∞ for every t ≥ 0, a.s., so that X is given in the form

  X(t) = X(0) + ∫₀ᵗ Φ(s)dB(s) + ∫₀ᵗ Ψ(s)ds .

Such a process X is called an Itô process. Thus, semimartingales are a generalization of classical Itô processes and, therefore, they are sometimes called generalized Itô processes.
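An Itô process with given coefficients can be simulated on a time grid. Below is a minimal Monte Carlo sketch (the coefficients Φ(s) = 1 + s, Ψ(s) ≡ −2 and all numerical parameters are illustrative assumptions, not from the text): the terminal mean should be X(0) + ∫₀ᵀ Ψ(s)ds and, Φ being deterministic, the terminal variance should be ∫₀ᵀ Φ(s)² ds.

```python
import numpy as np

# Simulation of the Ito process X(t) = X(0) + ∫ Phi(s) dB(s) + ∫ Psi(s) ds
# with Phi(s) = 1 + s and Psi(s) = -2 (illustrative choices).
rng = np.random.default_rng(2)
n_paths, n_steps, T, x0 = 20_000, 200, 1.0, 0.5
dt = T / n_steps
s = np.arange(n_steps) * dt                     # left endpoints
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
XT = x0 + ((1 + s) * dB).sum(axis=1) - 2.0 * T  # X(T) on each path
mean_T = x0 - 2.0 * T                           # X(0) + ∫ Psi ds
var_T = ((1 + s) ** 2 * dt).sum()               # ∫ Phi(s)^2 ds ≈ 7/3
print(XT.mean(), mean_T)
print(XT.var(), var_T)
```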

2. Stochastic calculus for semimartingales

2.1. Itô's formula

Let (Ω, ℱ, P) and {ℱ_t} be as above. Suppose, on this space, the following are given:
(i) X^i(0), i = 1, …, d: ℱ₀-measurable real random variables,
(ii) M^i ∈ M^c, i = 1, …, d,
(iii) V^i ∈ 𝒱^c, i = 1, …, d,
(iv) an {ℱ_t}-adapted, σ-finite point process p on some state space (Ξ, ℬ_Ξ) possessing the compensator N̂_p, and f^i = (f^i(t, ξ, ω)) ∈ F_p, g^i = (g^i(t, ξ, ω)) ∈ F_p^{2,loc} such that f^i(t, ξ, ω)·g^i(t, ξ, ω) ≡ 0, i = 1, …, d; furthermore, we assume that the g^i(t, ξ, ω) are bounded, i.e., there exists a positive constant K such that |g^i(t, ξ, ω)| ≤ K for all i, t, ξ, ω.

Define a d-dimensional {ℱ_t}-semimartingale X = (X(t)), X(t) = (X¹(t), …, X^d(t)), by

  X^i(t) = X^i(0) + M^i(t) + V^i(t) + ∫₀^{t+}∫_Ξ f^i(s, ξ, ω) N_p(ds, dξ) + ∫₀^{t+}∫_Ξ g^i(s, ξ, ω) Ñ_p(ds, dξ)   (17)

for i = 1, …, d. Denote also f = (f¹, …, f^d) and g = (g¹, …, g^d). Itô's formula is stated as follows:

THEOREM 4. Let F(x) = F(x₁, …, x_d) be a real-valued C²-function on ℝ^d. Then F(X) = (F(X(t)) = F(X¹(t), …, X^d(t))) is also an {ℱ_t}-semimartingale and the following holds:

  F(X(t)) − F(X(0)) = Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s−)) dM^i(s)
    + Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s−)) dV^i(s)
    + (1/2) Σ_{i,j=1}^d ∫₀ᵗ (∂_j ∂_i F)(X(s−)) d⟨M^i, M^j⟩(s)
    + ∫₀^{t+}∫_Ξ { F(X(s−) + f(s, ξ, ·)) − F(X(s−)) } N_p(ds, dξ)
    + ∫₀^{t+}∫_Ξ { F(X(s−) + g(s, ξ, ·)) − F(X(s−)) } Ñ_p(ds, dξ)
    + ∫₀^{t+}∫_Ξ { F(X(s−) + g(s, ξ, ·)) − F(X(s−)) − Σ_{i=1}^d g^i(s, ξ, ·)·(∂_i F)(X(s−)) } N̂_p(ds, dξ) .

Here, ∂_i F = ∂F/∂x_i.

If, in particular, X(t) = (X¹(t), …, X^d(t)) is a continuous d-dimensional semimartingale given by

  X^i(t) = X^i(0) + M^i(t) + V^i(t),  i = 1, …, d,

then

  F(X(t)) − F(X(0)) = Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s)) dM^i(s) + Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s)) dV^i(s)
    + (1/2) Σ_{i,j=1}^d ∫₀ᵗ (∂_j ∂_i F)(X(s)) d⟨M^i, M^j⟩(s) .

In other words, the martingale part M_{F∘X} and the drift part V_{F∘X} of the continuous semimartingale F∘X = (F(X(t))) in its canonical decomposition are given by

  M_{F∘X}(t) = Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s)) dM^i(s)

and

  V_{F∘X}(t) = Σ_{i=1}^d ∫₀ᵗ (∂_i F)(X(s)) dV^i(s) + (1/2) Σ_{i,j=1}^d ∫₀ᵗ (∂_j ∂_i F)(X(s)) d⟨M^i, M^j⟩(s) .
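For the continuous case, Itô's formula can be verified pathwise by simulation. The sketch below (step count and seed are illustrative) takes d = 1, X = B a Wiener process and F(x) = x², for which the formula reads B(t)² = 2∫₀ᵗ B(s)dB(s) + ⟨B⟩(t) with ⟨B⟩(t) = t; left-point Riemann sums approximate the Itô integral.

```python
import numpy as np

# Pathwise check of Ito's formula for F(x) = x^2 and X = B:
#   B(t)^2 - B(0)^2 = 2 ∫_0^t B(s) dB(s) + t .
rng = np.random.default_rng(3)
n, T = 1_000_000, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))
ito = 2.0 * (B[:-1] * dB).sum()      # left-point sums -> 2 ∫ B dB
print(B[-1] ** 2, ito + T)           # the two sides of the formula
```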


2.2. Itô's formula in stochastic differentials and the Stratonovich stochastic differentials

Let (Ω, ℱ, P) and {ℱ_t} be as above. Let S be the totality of continuous {ℱ_t}-semimartingales X = (X(t)) with the canonical decomposition

  X(t) = X(0) + M_X(t) + V_X(t) .   (18)

We write formally

  X(t) − X(s) = ∫ₛᵗ dX(u)   (19)

and call dX (denoted also by dX(t) or dX_t) the stochastic differential of X ∈ S. To be precise, dX may be considered as the equivalence class containing X ∈ S under the equivalence relation ∼ defined by: X ∼ Y, X, Y ∈ S, if and only if X(t) − X(0) = Y(t) − Y(0), t ≥ 0, a.s. Let dS be the space of all stochastic differentials dX for X ∈ S. Now (19) may be considered as the definition of the integral on the right-hand side for dX ∈ dS. Denote by dM and dV the subclasses of dS formed of all stochastic differentials of elements in M^c and 𝒱^c, respectively, so that dS = dM ⊕ dV (a direct sum). For dX, dY ∈ dS and α, β ∈ ℝ, define αdX + βdY ∈ dS by d(αX + βY), and define dX·dY ∈ dS by d⟨M_X, M_Y⟩, so that dS is a commutative algebra over ℝ under these operations. Note that dX·dY ∈ dV, and dX·dY = 0 if one of dX and dY is in dV. In particular, dX·dY·dZ = 0 for every dX, dY, dZ ∈ dS.

Let B be the totality of ℝ-valued {ℱ_t}-predictable processes Φ such that sup_{s∈[0,t]} |Φ(s)| < ∞, a.s., for all t > 0. Then B ⊂ ℒ_{2,loc}(M) for every M ∈ M^c. Define Φ·dX ∈ dS for Φ ∈ B and dX ∈ dS by

  Φ·dX = d( ∫Φ·dX ) = d( ∫Φ·dM_X ) + d( ∫Φ·dV_X ) .

Φ·dX is uniquely determined from Φ ∈ B and dX ∈ dS. Note that Φ·dX ∈ dM if dX ∈ dM. Itô's formula can be stated, in this context, as follows: let X(t) = (X¹(t), …, X^d(t)), X^i ∈ S, i = 1, …, d, and F be a C²-function: ℝ^d → ℝ. Then F(X) = (F(X(t))) ∈ S and

  dF(X) = Σ_{i=1}^d (∂_i F)(X)·dX^i + (1/2) Σ_{i,j=1}^d (∂_j ∂_i F)(X)·(dX^i·dX^j) .   (20)

We are now going to introduce an important operation on the space dS: that of Stratonovich differentials. Noting S ⊂ B, define X∘dY ∈ dS for X, Y ∈ S by

  X∘dY = X·dY + (1/2) dX·dY .   (21)

X∘dY is uniquely determined from X and dY and is called the Stratonovich stochastic differential or a stochastic differential of the Stratonovich type. These stochastic differentials obey the following rules: for X, Y, Z ∈ S,

  X∘(dY + dZ) = X∘dY + X∘dZ,
  (X + Y)∘dZ = X∘dZ + Y∘dZ,
  X∘(dY·dZ) = (X∘dY)·dZ = X·(dY·dZ),
  X∘(Y∘dZ) = (XY)∘dZ .

The process ∫X∘dY = ( ∫₀ᵗ X∘dY = ∫₀ᵗ X(s)∘dY(s) )_{t≥0} is called the stochastic integral of X by Y in the sense of Stratonovich or of the Stratonovich type, whereas the process ∫X·dY = ( ∫₀ᵗ X·dY = ∫₀ᵗ X(s)·dY(s) )_{t≥0} is that in the sense of Itô or of the Itô type. Using this operation, Itô's formula (20) can be rewritten as follows: let X(t) = (X¹(t), …, X^d(t)), X^i ∈ S, i = 1, …, d, and F be a C³-function: ℝ^d → ℝ. Then F(X) = (F(X(t))), (∂_i F)(X) = ((∂_i F)(X)(t)) ∈ S and

  dF(X) = Σ_{i=1}^d (∂_i F)(X)∘dX^i .   (22)

This chain rule for stochastic differentials takes the same form as in the ordinary differential calculus. This is the reason why the use of Stratonovich stochastic differentials is quite convenient in transferring notions used in ordinary calculus into stochastic calculus and in defining intrinsic (i.e., coordinate-free) notions probabilistically.
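The relation (21) between the two kinds of differentials can be seen numerically. In the sketch below (parameters illustrative), for X = Y = B a Wiener process, the Stratonovich chain rule (22) with F(x) = x²/2 gives ∫₀ᵗ B∘dB = B(t)²/2; the midpoint sums that approximate the Stratonovich integral telescope to exactly this value, while they exceed the left-point (Itô) sums by roughly ½⟨B⟩(t) = t/2, in accordance with (21).

```python
import numpy as np

# Midpoint (Stratonovich) versus left-point (Ito) sums for ∫ B dB.
rng = np.random.default_rng(4)
n, T = 1000, 1.0
dB = rng.normal(0.0, np.sqrt(T / n), n)
B = np.concatenate(([0.0], np.cumsum(dB)))
strat = (0.5 * (B[:-1] + B[1:]) * dB).sum()   # midpoint sums: telescope
ito = (B[:-1] * dB).sum()                     # left-point sums
print(strat, B[-1] ** 2 / 2)                  # equal up to rounding
print(strat - ito)                            # ≈ t/2, cf. (21)
```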

2.3. Stochastic calculus for complex martingales and conformal martingales

Let (Ω, ℱ, P) and {ℱ_t} be as above. Let Z = (Z(t)), Z(t) = (Z¹(t), …, Z^n(t)), be a ℂ^n-valued, continuous and {ℱ_t}-adapted process. Write

  Z^α(t) = X^α(t) + √−1 Y^α(t),  α = 1, …, n .

We say that Z is an n-dimensional {ℱ_t}-conformal martingale (local conformal martingale) if, for every α = 1, …, n, X^α(t) and Y^α(t) are {ℱ_t}-martingales (resp. local martingales) satisfying the following conditions:

  dX_t^α·dX_t^β = dY_t^α·dY_t^β,  dX_t^α·dY_t^β = −dX_t^β·dY_t^α,  α, β = 1, …, n .   (23)

(23) implies, in particular, that

  dX_t^α·dY_t^α = 0,  α = 1, …, n .   (24)

A one-dimensional {ℱ_t}-conformal martingale (local conformal martingale) is simply called an {ℱ_t}-conformal martingale (resp. local conformal martingale). We complexify the spaces M^c, 𝒱^c, S and the corresponding spaces of stochastic differentials in an obvious way and denote them by the same notations. Then (23) is equivalent to

  dZ_t^α·dZ_t^β = 0,  α, β = 1, …, n   (25)

so that Z(t) = (Z¹(t), …, Z^n(t)) is an n-dimensional {ℱ_t}-conformal martingale (local conformal martingale) if and only if Z^α is an {ℱ_t}-martingale (resp. local martingale) for every α and satisfies (25). A typical example of an n-dimensional {ℱ_t}-conformal martingale is the n-dimensional {ℱ_t}-complex Brownian motion Z = (Z(t)), which is defined by

  Z^α(t) = B^{2α−1}(t) + √−1 B^{2α}(t),  α = 1, …, n

where B(t) = (B¹(t), B²(t), …, B^{2n−1}(t), B^{2n}(t)) is a 2n-dimensional {ℱ_t}-Wiener process. By the Lévy characterization theorem of Section 1.3, we have the following characterization theorem for an n-dimensional {ℱ_t}-complex Brownian motion:

THEOREM 5. A ℂ^n-valued, continuous {ℱ_t}-adapted process Z = (Z(t)) is an n-dimensional {ℱ_t}-complex Brownian motion if and only if

  dZ_t^α ∈ dM,  dZ_t^α·dZ_t^β = 0,  dZ_t^α·dZ̄_t^β = 2δ_{α,β} dt,  α, β = 1, …, n .

Let f(x₁, …, x_n, y₁, …, y_n) be a complex-valued C²-function on ℝ^{2n}. By setting z_α = x_α + √−1 y_α, α = 1, …, n, it can be regarded as a C²-function f(z₁, …, z_n) on ℂ^n. Introduce the differential operators as usual:

  ∂/∂z_α = (1/2)( ∂/∂x_α − √−1 ∂/∂y_α ),  ∂/∂z̄_α = (1/2)( ∂/∂x_α + √−1 ∂/∂y_α ) .

Then Itô's formula for the 2n-dimensional continuous semimartingale

  ξ(t) = (X¹(t), …, X^n(t), Y¹(t), …, Y^n(t)),

given in the form of stochastic differentials as

  df(ξ(t)) = Σ_{α=1}^n ∂f/∂x_α(ξ(t))·dX_t^α + Σ_{α=1}^n ∂f/∂y_α(ξ(t))·dY_t^α
    + (1/2) Σ_{α,β=1}^n [ ∂²f/∂x_α∂x_β(ξ(t))·(dX_t^α·dX_t^β) + 2 ∂²f/∂x_α∂y_β(ξ(t))·(dX_t^α·dY_t^β) + ∂²f/∂y_α∂y_β(ξ(t))·(dY_t^α·dY_t^β) ],

can be rewritten in the complex form as follows: setting Z(t) = (Z¹(t), …, Z^n(t)) where Z^α(t) = X^α(t) + √−1 Y^α(t),

  df(Z(t)) = Σ_{α=1}^n [ ∂f/∂z_α(Z(t))·dZ_t^α + ∂f/∂z̄_α(Z(t))·dZ̄_t^α ]
    + (1/2) Σ_{α,β=1}^n [ ∂²f/∂z_α∂z_β(Z(t))·(dZ_t^α·dZ_t^β) + 2 ∂²f/∂z_α∂z̄_β(Z(t))·(dZ_t^α·dZ̄_t^β) + ∂²f/∂z̄_α∂z̄_β(Z(t))·(dZ̄_t^α·dZ̄_t^β) ] .   (26)

If Z(t) = (Z¹(t), …, Z^n(t)) is an n-dimensional {ℱ_t}-local conformal martingale, then by (25),

  df(Z(t)) = Σ_{α=1}^n [ ∂f/∂z_α(Z(t))·dZ_t^α + ∂f/∂z̄_α(Z(t))·dZ̄_t^α ] + Σ_{α,β=1}^n ∂²f/∂z_α∂z̄_β(Z(t))·(dZ_t^α·dZ̄_t^β) .   (27)

Hence, if f is holomorphic, i.e., ∂f/∂z̄_α = 0, then

  df(Z(t)) = Σ_{α=1}^n ∂f/∂z_α(Z(t))·dZ_t^α .   (28)

From this, we can conclude that f(Z(t)) is a local conformal martingale. More generally, if D is a domain in ℂ^n and f : D → ℂ^m is holomorphic, then ξ = (ξ(t)) defined by

  ξ(t) = f(Z(t ∧ T))

is an m-dimensional {ℱ_t}-local conformal martingale provided that Z = (Z(t)) is an n-dimensional {ℱ_t}-local conformal martingale and T is an {ℱ_t}-stopping time such that Z(t ∧ T) ∈ D for all t ≥ 0, a.s.
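By (28) with the holomorphic function f(z) = z², the square of a complex Brownian motion is a local conformal martingale, so E[z(t)²] = z(0)² = 0; Theorem 5 gives dz·dz̄ = 2dt and hence E[|z(t)|²] = 2t. A Monte Carlo sketch (sample size and seed are illustrative):

```python
import numpy as np

# z(t) = B1(t) + i B2(t), a complex Brownian motion.  By (28) with the
# holomorphic f(z) = z^2, z(t)^2 is a local conformal martingale, so
# E[z(t)^2] = 0; by Theorem 5, dz·dz̄ = 2dt, so E[|z(t)|^2] = 2t.
rng = np.random.default_rng(5)
n_paths, t = 1_000_000, 1.0
z = rng.normal(0.0, np.sqrt(t), n_paths) + 1j * rng.normal(0.0, np.sqrt(t), n_paths)
print((z ** 2).mean())                   # ≈ 0 + 0i
print((np.abs(z) ** 2).mean(), 2 * t)    # ≈ 2t
```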

2.4. Time change

It is important to notice, and to apply in many problems, the fact that the notion of semimartingales is invariant under a class of time changes. Let A ∈ A^c, i.e., A = (A(t)) is an {ℱ_t}-adapted continuous process such that A(0) = 0 and t ↦ A(t) is increasing, a.s. Assume further that t ↦ A(t) is strictly increasing and lim_{t↑∞} A(t) = ∞, a.s. Let u ↦ A_u^{−1} := min{t | A(t) > u} be the inverse function of t ↦ A(t). Then, for each fixed u ≥ 0, A_u^{−1} is an {ℱ_t}-stopping time. Define a new filtration {ℱ̃_t} by ℱ̃_t = ℱ_{A_t^{−1}}, t ≥ 0. The spaces S, M^c, 𝒱^c with respect to this filtration are denoted, respectively, by S̃, M̃^c, 𝒱̃^c.

For an {ℱ_t}-progressively measurable (-predictable, -optional) process X = (X(t)), define a new process X^A = (X^A(t)) by X^A(t) = X(A_t^{−1}), t ≥ 0. Then X^A = (X^A(t)) is {ℱ̃_t}-progressively measurable (resp. -predictable, -optional). We call X^A the time change of X determined by A. Essentially by Doob's optional sampling theorem, we have that the mapping X ↦ X^A defines a bijection between the spaces S and S̃ which, restricted, defines a bijection between M^c and M̃^c and a bijection between 𝒱^c and 𝒱̃^c. In particular, we have M_{X^A} = (M_X)^A and V_{X^A} = (V_X)^A for any semimartingale X ∈ S. Furthermore, it holds that ⟨M^A, N^A⟩ = (⟨M, N⟩)^A for every M, N ∈ M^c. The invariance of stochastic integrals under the time change can be stated as follows: if Φ ∈ ℒ_{2,loc}(M), then Φ^A ∈ ℒ_{2,loc}(M^A) and we have

  ( ∫₀ Φ(s)·dM(s) )^A(t) = ∫₀ᵗ Φ^A(s)·dM^A(s),  t ≥ 0 .

If, in particular, M ∈ M^c, t ↦ ⟨M⟩(t) is strictly increasing and lim_{t↑∞} ⟨M⟩(t) = ∞, a.s., then we can take A := ⟨M⟩ to make a time change. Then we have M̃ := M^{⟨M⟩} = (M(⟨M⟩_t^{−1})) ∈ M̃^c and

  ⟨M̃⟩(t) = ⟨M⟩(⟨M⟩_t^{−1}) = t .

Hence M̃ is an {ℱ̃_t}-Wiener process by Lévy's characterization theorem. In this way, we have the following representation of M by means of the {ℱ̃_t}-Wiener process B := M̃:

  M(t) = B(⟨M⟩(t)) .

Such a representation of a continuous martingale by a Wiener process is valid in a more general situation. Namely, we have the following fact (cf. Ikeda–Watanabe, 1988):

THEOREM 6. If X = (X(t)) is such that M = (M(t)), defined by M(t) = X(t) − X(0), is in M^c, then there exists a one-dimensional Wiener process B = (B(t)) with B(0) = X(0) such that

  X(t) = B(⟨M⟩(t)),  t ≥ 0 .
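Theorem 6 can be illustrated numerically. In the sketch below (coefficients and sample sizes are illustrative assumptions), M(t) = ∫₀ᵗ s dB(s) is a continuous martingale with deterministic ⟨M⟩(t) = t³/3, so u ↦ M(⟨M⟩_u^{−1}) should be a Wiener process; we check that its variance at one time u equals u.

```python
import numpy as np

# Time change of M(t) = ∫_0^t s dB(s); <M>(t) = t^3/3 is deterministic,
# so M evaluated at the inverse time <M>^{-1}(u) = (3u)^{1/3} should be
# N(0, u)-distributed (Theorem 6).
rng = np.random.default_rng(6)
n_paths, n_steps, u = 50_000, 200, 0.5
t_u = (3.0 * u) ** (1.0 / 3.0)          # <M>^{-1}(u)
dt = t_u / n_steps
s = np.arange(n_steps) * dt
dB = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps))
M_at_tu = (s * dB).sum(axis=1)          # M(<M>^{-1}(u)) on each path
print(M_at_tu.var(), u)
```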

This representation theorem for continuous local martingales can be further extended to the multidimensional case. Indeed, we have the following important result due to F. Knight: THEOREM 7. L e t X i = (Xi(t)) C S, i = 1 , . . . ,d, be given and satisfy the following condition;
M i ( t ) := x i ( t ) - x i ( o ) E M c and ( M i , M J ) ( t ) = O, i,j= l,. . . , d . B(t) =

Then there exists a d-dimensional Wiener process B = (B(t)), (B l ( t ) , . . . , Bd(t)), with B(0) = X(0) = (X 1(0),... ,Xd(0)) such that

It6"s stochastic calculus and its applications x i ( t ) =Bi((Mi)(t)),

893

t >_ O, i = 1,... ,d .

Next we consider the time change for complex martingales. It is easy to see that the class of conformal martingales is invariant under time change:

THEOREM 8. If Z(t) = (Z¹(t), …, Z^n(t)) is an n-dimensional {ℱ_t}-local conformal martingale, then Z^A(t) = (Z¹(A_t^{−1}), …, Z^n(A_t^{−1})) is an n-dimensional {ℱ̃_t}-local conformal martingale.

Also, there are similar representation theorems for conformal martingales by complex Brownian motions. We state a result in the case of complex dimension 1 only; the Knight theorem can, however, be extended in the context of multidimensional conformal martingales.

THEOREM 9. Let Z = (Z(t) = X(t) + √−1 Y(t)) be a ℂ-valued continuous {ℱ_t}-local conformal martingale, i.e., dZ ∈ dM and dZ·dZ = 0; equivalently, dX, dY ∈ dM, dX·dX = dY·dY and dX·dY = 0. Then there exists a complex Brownian motion z(t) = x(t) + √−1 y(t), equivalently, a planar Brownian motion (x(t), y(t)), such that

  Z(t) = z(⟨Z⟩(t)),  t ≥ 0,

where we set

  ⟨Z⟩(t) = (1/2) ∫₀ᵗ dZ_s·dZ̄_s = ∫₀ᵗ dX_s·dX_s = ∫₀ᵗ dY_s·dY_s .

As a typical application, we note that if z(t) is a complex Brownian motion and f : ℂ → ℂ is holomorphic, then f(z(t)) is a local conformal martingale. By Itô's formula, we see that

  ⟨f(z)⟩(t) = ∫₀ᵗ |∂f/∂z|²(z(s)) ds .

Hence we can find another complex Brownian motion ζ(t) such that

  f(z(t)) = ζ( ∫₀ᵗ |∂f/∂z|²(z(s)) ds ),  t ≥ 0 .   (29)

This result can be localized for a holomorphic function f in a domain D ⊂ ℂ as

  f(z(t ∧ T)) = ζ( ∫₀^{t∧T} |∂f/∂z|²(z(s)) ds )   (30)

for a stopping time T such that z(t ∧ T) ∈ D, t ≥ 0. The representation formula (29) or (30) is useful in the probabilistic approach to problems in complex analysis; for example, B. Davis applied it to prove Picard's theorem (Davis, 1975).

2.5. The exponentials and iterated integrals of semimartingales

Given X = (X(t)) ∈ S and an ℱ₀-measurable real random variable η, there exists a unique Y = (Y(t)) ∈ S which satisfies

  dY(t) = Y(t)·dX(t),  Y(0) = η   (31)

and this Y = (Y(t)) is given explicitly by

  Y(t) = η·exp{ (X(t) − X(0)) − (1/2) ∫₀ᵗ (dX·dX)(s) } = η·exp{ (M_X(t) + V_X(t)) − (1/2)⟨M_X⟩(t) }   (32)

where X(t) = X(0) + M_X(t) + V_X(t) is the canonical decomposition of X. This fact can be shown by an easy application of Itô's formula. If, on the other hand, the Eq. (31) is given in the Stratonovich form as

  dY(t) = Y(t)∘dX(t),  Y(0) = η,   (33)

then the unique solution is given simply by

  Y(t) = η·exp{ X(t) − X(0) } .   (34)

If M = (M(t)) ∈ M^c, in particular, then X(t) := exp{M(t) − ⟨M⟩(t)/2} satisfies dX(t) = X(t)·dM(t) ∈ dM; thus exp{M(t) − ⟨M⟩(t)/2} is a positive local martingale and hence satisfies

  E[ exp{ M(t) − (1/2)⟨M⟩(t) } ] ≤ 1  for all t ≥ 0 .

It is a true martingale, equivalently,

  E[ exp{ M(t) − (1/2)⟨M⟩(t) } ] = 1  for all t ≥ 0,

if the following condition, known as Novikov's condition, is satisfied:

  E[ exp{ (1/2)⟨M⟩(t) } ] < ∞  for all t ≥ 0 .   (35)
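In the simplest case M = B, a Wiener process, ⟨M⟩(t) = t is deterministic, so (35) holds and exp{B(t) − t/2} is a true martingale of mean one. A Monte Carlo sketch of this equality (sample size and seed are illustrative):

```python
import numpy as np

# Check E[exp(B(t) - t/2)] = 1 for a Wiener process B; Novikov's
# condition (35) holds since <B>(t) = t is deterministic.
rng = np.random.default_rng(7)
n_paths, t = 1_000_000, 1.0
B_t = rng.normal(0.0, np.sqrt(t), n_paths)
D_t = np.exp(B_t - 0.5 * t)
print(D_t.mean())   # ≈ 1
```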

Hence if, in particular, ⟨M⟩(t) ≤ C·t for all t ≥ 0 a.s., for some constant C > 0, then exp{M(t) − ⟨M⟩(t)/2} is a true martingale.

Given a continuous semimartingale X ∈ S, define X_n ∈ S, n = 1, 2, …, by

  X₁(t) = X(t) − X(0),  X₂(t) = ∫₀ᵗ X₁(s)dX(s), …, X_n(t) = ∫₀ᵗ X_{n−1}(s)dX(s), … .

X_n(t) is called the n-fold iterated integral of X. If dX ∈ dM, then X_n ∈ M^c for every n.

Let

  H_n[t, x] = (−t)ⁿ exp{ x²/(2t) } (dⁿ/dxⁿ) exp{ −x²/(2t) },  x ∈ ℝ,  t > 0,  n = 0, 1, …

be the Hermite polynomials, so that

  H₀[t, x] = 1,  H₁[t, x] = x,  H₂[t, x] = x² − t,  H₃[t, x] = x³ − 3xt, … .

Let X(t) = X(0) + M_X(t) + V_X(t) be the canonical decomposition of X. Then we have the following explicit formulas for the iterated integrals:

  X_n(t) = (1/n!) H_n[⟨M_X⟩(t), X(t) − X(0)],  t ≥ 0,  n = 1, 2, … .   (36)

These formulas are obvious, at least formally, from the relation

  Y_λ(t) := exp{ λ(X(t) − X(0)) − (λ²/2)⟨M_X⟩(t) } = Σ_{n=0}^∞ (λⁿ/n!) H_n[⟨M_X⟩(t), X(t) − X(0)] = Σ_{n=0}^∞ λⁿ X_n(t) .

This relation is a consequence of the generating-function formula for the Hermite polynomials,

  exp{ λx − λ²t/2 } = Σ_{n=0}^∞ (λⁿ/n!) H_n[t, x],

and the fact we saw above that Y_λ(t) is the unique solution of the equation

  dY_λ(t) = λY_λ(t)·dX(t),  Y_λ(0) = 1 .
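Formula (36) can be checked pathwise by simulation for X = B a Wiener process, where ⟨M_X⟩(t) = t: the 2-fold iterated integral X₂(t) = ∫₀ᵗ X₁(s)dB(s) should equal H₂[t, B(t)]/2! = (B(t)² − t)/2. A sketch (step count and seed are illustrative):

```python
import numpy as np

# Pathwise check of (36) for n = 2 and X = B:
#   ∫_0^T B(s) dB(s) ≈ (B(T)^2 - T) / 2 = H_2[T, B(T)] / 2!
rng = np.random.default_rng(8)
n, T = 1_000_000, 1.0
dt = T / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))
X2 = (B[:-1] * dB).sum()                # left-point sums for ∫ X_1 dB
hermite = (B[-1] ** 2 - T) / 2.0        # H_2[T, B(T)] / 2!
print(X2, hermite)
```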

2.6. Change of drift (Maruyama–Girsanov transformation)

Let (Ω, ℱ, P) and {ℱ_t} be as above. Let M ∈ M^c and set

  D_M(t) = exp{ M(t) − (1/2)⟨M⟩(t) } .   (37)

Then, as we saw above, D_M(t) is a positive local martingale, and we assume that it is a true martingale; for this, it is sufficient to assume the Novikov condition (35). So we have E[D_M(t)] = 1 for all t ≥ 0. Then there exists a unique probability measure P̃ on (Ω, ℱ) (when (Ω, ℱ) is a nice measurable space and ℱ = ⋁_{t≥0} ℱ_t, which we can assume without loss of generality) such that P̃(A) = E[D_M(t)·1_A] for all A ∈ ℱ_t, t ≥ 0.


THEOREM 10. Let X be a continuous semimartingale, i.e., X ∈ S, with the canonical decomposition

  X(t) = X(0) + M_X(t) + V_X(t) .   (38)

On the probability space (Ω, ℱ, P̃) with the same filtration {ℱ_t}, X is still a continuous {ℱ_t}-semimartingale, but its canonical decomposition is given by

  X(t) = X(0) + M̃_X(t) + Ṽ_X(t),   (39)

with

  M̃_X(t) = M_X(t) − ⟨M_X, M⟩(t) and Ṽ_X(t) = V_X(t) + ⟨M_X, M⟩(t) .   (40)

Furthermore, it holds that

  ⟨M̃_X, M̃_Y⟩ = ⟨M_X, M_Y⟩,  X, Y ∈ S .   (41)

This result is known as Girsanov's theorem. The transformation of the probability measures given above is called a transformation of drift or a Maruyama–Girsanov transformation, since it produces a change of the drift part in the semimartingale decomposition, as given by (40). In the above, we started with a given density D_M and then constructed a probability P̃ which is equivalent to P in the sense that P and P̃ are mutually absolutely continuous when restricted to ℱ_t for every t ≥ 0. Conversely, we can show the following:

THEOREM 11. Assume that M_{2,loc} = M^c, that is, every {ℱ_t}-martingale is a continuous martingale. Then, if P̃ is a probability on (Ω, ℱ) which is equivalent to P in the above sense, there exists a unique M ∈ M^c such that the density dP̃/dP, restricted to ℱ_t, is given by D_M(t) in (37) for every t ≥ 0.
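Theorem 10 can be illustrated by Monte Carlo. Take M(t) = θB(t) for a Wiener process B and X = B, so that ⟨M_X, M⟩(t) = θt; under P̃ the process B acquires the drift θt, and in particular E_P[D_M(t)B(t)] = E_P̃[B(t)] = θt. A sketch with illustrative parameters:

```python
import numpy as np

# Maruyama-Girsanov check with M(t) = θB(t):
#   E_P[D_M(t) B(t)] = E_P~[B(t)] = θt .
rng = np.random.default_rng(9)
n_paths, t, theta = 1_000_000, 1.0, 0.7
B_t = rng.normal(0.0, np.sqrt(t), n_paths)
D_t = np.exp(theta * B_t - 0.5 * theta ** 2 * t)   # density (37)
print((D_t * B_t).mean(), theta * t)
```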

2.7. Filtrations and martingales

Let (Ω, ℱ, P) and {ℱ_t} be as above. We assume that the filtration is separable in the sense that the Hilbert space L₂(Ω, ⋁_{t≥0}ℱ_t, P) formed of all square-integrable ⋁_{t≥0}ℱ_t-measurable real random variables is separable. In order to know the filtration {ℱ_t}, it is essential to know the space of martingales M_{2,loc} or M₂; indeed, it is an important and interesting problem to study the structure of these spaces of martingales. Here is an important general result due to M. H. Davis and P. Varaiya (Davis–Varaiya, 1974):

THEOREM 12. There exist M₁, M₂, … ∈ M₂ such that

  ⟨M_i, M_j⟩ = 0 if i ≠ j,  ⟨M₁⟩ ≫ ⟨M₂⟩ ≫ ⋯

and every M ∈ M₂ can be represented as a sum of stochastic integrals for some predictable processes φ_i ∈ ℒ₂(M_i):

  M = Σ_i ∫ φ_i dM_i .

If N₁, N₂, … is another such sequence, then

  ⟨M₁⟩ ≈ ⟨N₁⟩,  ⟨M₂⟩ ≈ ⟨N₂⟩, … .

In particular, the number of such M₁, M₂, … is an invariant of the filtration {ℱ_t}, which we call the multiplicity of {ℱ_t}. Here, ≫ and ≈ denote, respectively, the ω-wise absolute continuity and the equivalence, holding for almost all ω, of the Stieltjes measures associated with the increasing processes. We give two important cases in which we can completely describe the structure of the spaces M_{2,loc} and M₂.

(1) The Brownian filtration

Let X(t) = (X¹(t), …, X^d(t)) be a d-dimensional Wiener process defined on a complete probability space (Ω, ℱ, P) and let {ℱ_t^X} be the natural filtration of X; ℱ_t^X, for t ≥ 0, is the smallest σ-field on Ω containing all P-null sets and with respect to which all X(s), s ≤ t, are measurable. This filtration is always right-continuous, so that it satisfies the usual conditions. The spaces M_{2,loc} and M₂ of Section 1.2 with respect to the filtration {ℱ_t^X} are denoted by M_{2,loc}({ℱ_t^X}) and M₂({ℱ_t^X}), respectively. Set

  B^i(t) = X^i(t) − X^i(0),  i = 1, …, d   (42)

so that B^i = (B^i(t)) ∈ M₂({ℱ_t^X}).

THEOREM 13. (Kunita–Watanabe, 1967; Dellacherie, 1974) Let M ∈ M₂({ℱ_t^X}) (M_{2,loc}({ℱ_t^X})). Then there exist {ℱ_t^X}-predictable processes φ_i with E[∫₀ᵗ |φ_i(s)|² ds] < ∞ for all t > 0 (resp. ∫₀ᵗ |φ_i(s)|² ds < ∞ for all t ≥ 0, a.s.), i = 1, …, d, such that

  M(t) = Σ_{i=1}^d ∫₀ᵗ φ_i(s)dB^i(s) .   (43)

That is, every martingale with respect to the natural filtration {ℱ_t^X} of X can be represented as a sum of stochastic integrals by the basic martingales {B^i} of (42).

COROLLARY 1. M₂({ℱ_t^X}) = M₂^c({ℱ_t^X}) and M_{2,loc}({ℱ_t^X}) = M^c({ℱ_t^X}).

From this, we can see that every (not necessarily square-integrable) martingale (M(t)) with respect to the filtration {ℱ_t^X} such that M(0) = 0 is in M^c({ℱ_t^X}) and hence has the representation (43). This is because M can be approximated, compact-uniformly in t in probability, by bounded martingales, which are necessarily continuous, and hence M is continuous. Then note that any continuous (local) martingale is always locally square-integrable.

COROLLARY 2. Let N > 0 and F be an ℱ_N^X-measurable and integrable real random variable. Then there exist {ℱ_t^X}-predictable φ_i with ∫₀^N |φ_i(s)|² ds < ∞, a.s., i = 1, …, d, such that

  F = E(F | ℱ₀^X) + Σ_{i=1}^d ∫₀^N φ_i(s)dB^i(s) .   (44)

E[∫₀^N |φ_i(s)|² ds] < ∞ for all i = 1, …, d if and only if F − E(F | ℱ₀^X) is square-integrable.

(2) The Poissonian filtration

Let p be a σ-finite Poisson point process on a complete probability space (Ω, ℱ, P) with values in (Ξ, ℬ_Ξ) and, for each t ≥ 0, let ℱ_t^p be the σ-field on Ω generated by the P-null sets and the random variables N_p((0, s] × U), s ≤ t and U ∈ ℬ_Ξ.

THEOREM 14. (Dellacherie, 1974; Ikeda–Watanabe, 1989) Every M ∈ M₂({ℱ_t^p}) (M_{2,loc}({ℱ_t^p})) can be represented in the form

  M(t) = ∫₀^{t+}∫_Ξ f(s, ξ, ω) Ñ_p(ds, dξ)   (45)

for some f ∈ F_p²({ℱ_t^p}) (resp. F_p^{2,loc}({ℱ_t^p})).

2.8. Applications to mathematical finance


Consider a simple financial market in which two kinds of assets, a riskless cash bond and a risky stock, are traded. As a mathematical model, we consider a probability space (Ω, ℱ, P) with a filtration {ℱ_t}_{t≥0} satisfying the usual conditions and denote the price per unit of the cash bond and the stock at time t by B(t) and S(t), respectively. (B(t)) and (S(t)) are {ℱ_t}-adapted stochastic processes. In the following, we only consider times t in the finite interval [0, T], where T is a positive constant called the maturity. We consider a model introduced by F. Black and M. Scholes (Black–Scholes, 1973), now well known as the Black–Scholes model: B(t) and S(t) are given by

B(t) = \exp(rt) ,   S(t) = S_0 \cdot \exp\left(σW(t) + \left[μ − \frac{σ^2}{2}\right]t\right) .   (46)

Here, W(t) is a one-dimensional {ℱ_t}-Wiener process with W(0) = 0, and r ≥ 0, σ > 0 and μ ∈ R are constants called the riskless interest rate, the stock volatility and the stock drift, respectively. S_0 is a positive constant. B(t) and S(t) are uniquely determined by the following equations on stochastic differentials:

dB(t) = rB(t)dt ,   B(0) = 1 ,   (47)

dS(t) = σS(t)dW(t) + μS(t)dt ,   S(0) = S_0 .   (48)

Itô's stochastic calculus and its applications

Indeed, it is obvious for B(t), and (48) is the case of Eq. (31) of Section 2.5 with X(t) = σW(t) + μt; since dX(t)·dX(t) = σ^2 dt, the solution Y(t) in (32) coincides with S(t) in (46). In the Black–Scholes model, we always assume that the filtration {ℱ_t} is the natural filtration of W = (W(t)).

By a strategy or portfolio, we mean a pair π = (φ, ψ) of {ℱ_t}-predictable processes φ = (φ(t)) and ψ = (ψ(t)) such that

\int_0^T |φ(s)|^2 ds + \int_0^T |ψ(s)| ds < ∞ ,   a.s.

φ(t) and ψ(t) denote the amounts (i.e., the numbers of units) of the stock and the cash bond we hold at time t, respectively. They may take negative values, so that we allow unlimited short selling of the stock and the cash bond. The value V(t) = V_π(t) of the portfolio π = (φ, ψ) at time t is given, therefore, by

V_π(t) = φ(t)S(t) + ψ(t)B(t) .

We identify two portfolios π = (φ, ψ) and π′ = (φ′, ψ′) and write π = π′ if \int_0^t φ(s)ds = \int_0^t φ′(s)ds and \int_0^t ψ(s)ds = \int_0^t ψ′(s)ds for all t ∈ [0, T], a.s. A portfolio π = (φ, ψ) is called self-financing if it satisfies V_π ∈ S and

dV_π(t) = φ(t)dS(t) + ψ(t)dB(t) .   (49)

An intuitive meaning of a self-financing portfolio is that the change in its value depends only on the change of the asset prices. Let S̃(t) = e^{−rt}S(t) and Ṽ_π(t) = e^{−rt}V_π(t), so that S̃(t) and Ṽ_π(t) are the discounted stock price and the discounted value of the portfolio at time t, respectively. It is easy to see by Itô's formula that π = (φ, ψ) is self-financing if and only if Ṽ_π ∈ S and

dṼ_π(t) = φ(t)dS̃(t) .   (50)

Let X be a nonnegative ℱ_T-measurable random variable. We call it a payoff or a claim; we consider it as a payoff or a claim which we can receive at the time T of maturity. For example, consider a contract that, at maturity T, gives us the right (but not the obligation) to buy a unit of the stock by paying a predetermined price K. K is called the strike price or the exercise price. The payoff of this contract is obviously (S(T) − K)^+ = (S(T) − K) ∨ 0. This contract is called a European call option. For a given payoff X, we say that a portfolio π = (φ, ψ) replicates X, or that π is a replicating strategy for X, if it satisfies the following:
(i) π is self-financing,
(ii) V_π(t) ≥ 0, t ∈ [0, T], a.s.,
(iii) V_π(T) = X, a.s.

If a replicating strategy for X exists, we say that the payoff X is replicable. By Theorems 10 and 11 of Section 2.6, we can see that there exists a unique probability measure P* on (Ω, ℱ_T) such that it is equivalent to P and S̃(t) = e^{−rt}S(t) is an {ℱ_t}-martingale under P*. It is given explicitly by

\frac{dP^*}{dP} = \exp\left[−\frac{μ − r}{σ}W(T) − \frac{(μ − r)^2}{2σ^2}T\right]   on ℱ_T .   (51)

THEOREM 15. Let X be a payoff and assume that it is integrable under P*. Then it is replicable, and a replicating strategy π = (φ, ψ) for X is unique. Its value V_π(t) at time t is given by

V_π(t) = E^*[e^{−r(T−t)}·X | ℱ_t] ,   in particular,   V_π(0) = E^*[e^{−rT}·X] .   (52)

V_π(t) in this theorem is called the arbitrage price at time t of the payoff X at maturity, and V_π(0) the arbitrage price of the payoff X at maturity. The reason is as follows: suppose we make a contract to receive the payoff X at maturity T and we buy this contract by paying a price v at time 0. If v > V_π(0), the seller of this contract uses the replicating strategy π for X with the initial investment V_π(0) to attain X, and thus obtains a gain v − V_π(0). If v < V_π(0), then the buyer of this contract makes a short selling of the replicating portfolio for X, thus gets a gain V_π(0) − v at time 0, and clears the short selling at maturity T by the payoff X he receives. In either case, there is an opportunity of riskless gain (an arbitrage opportunity) for the seller or the buyer. Thus, the only possible price we can pay at time 0 to buy the contract without an arbitrage opportunity is V_π(0). In particular, for the European call option with the exercise price K, the payoff is (S(T) − K)^+, and hence the arbitrage price of the option given by (52) can be computed explicitly from (46) and (51); it is given by

E^*[e^{−rT}(S(T) − K)^+] = Ψ_T(S_0)   (53)

and, more generally, the arbitrage price of the option at time t is given by

E^*[e^{−r(T−t)}(S(T) − K)^+ | ℱ_t] = Ψ_{T−t}(S(t)) .   (54)

Here, the function Ψ_θ(x), θ > 0, x > 0, is given by

Ψ_θ(x) = xN(d_1) − Ke^{−rθ}N(d_2) ,   (55)

where

d_1 = \frac{\log(x/K) + (r + σ^2/2)θ}{σ\sqrt{θ}} ,   d_2 = d_1 − σ\sqrt{θ} ,   N(d) = \frac{1}{\sqrt{2π}}\int_{−∞}^{d} e^{−t^2/2} dt .
These formulas are known as the Black–Scholes option pricing formula. As for the proof of Theorem 15, we remark the following: if π is a replicating strategy for X, then (52) is obvious because e^{−rt}V_π(t) is a martingale under P*. Conversely, for an integrable payoff X, the existence of a replicating strategy is

Itd's stochastic calculus and its applications

901

essentially a consequence of Corollary 2 to Theorem 13, cf. Lamberton-Lapeyre, 1996 and Baxter-Rennie, 1996, for details.
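Formulas (53)–(55) can be checked numerically against the risk-neutral expectation (52). The sketch below (the function names are ours; the Monte Carlo sampler uses the fact that under P* the stock satisfies S(T) = x·exp(σW(T) + (r − σ²/2)T)) evaluates Ψ_θ(x) via the error function and compares it with a simulated E*[e^{−rT}(S(T) − K)^+].

```python
import math
import random

def bs_call(x, K, r, sigma, theta):
    """Psi_theta(x) = x*N(d1) - K*exp(-r*theta)*N(d2), as in formula (55)."""
    d1 = (math.log(x / K) + (r + 0.5 * sigma ** 2) * theta) / (sigma * math.sqrt(theta))
    d2 = d1 - sigma * math.sqrt(theta)
    N = lambda d: 0.5 * (1.0 + math.erf(d / math.sqrt(2.0)))  # standard normal cdf
    return x * N(d1) - K * math.exp(-r * theta) * N(d2)

def mc_call(x, K, r, sigma, T, n=200_000, seed=1):
    """Monte Carlo estimate of E*[exp(-rT)(S(T) - K)^+]; under P* the stock is
    S(T) = x * exp(sigma*W(T) + (r - sigma^2/2)*T)."""
    rng = random.Random(seed)
    sqrt_T = math.sqrt(T)
    total = 0.0
    for _ in range(n):
        s_T = x * math.exp(sigma * rng.gauss(0.0, sqrt_T) + (r - 0.5 * sigma ** 2) * T)
        total += max(s_T - K, 0.0)
    return math.exp(-r * T) * total / n

price = bs_call(100.0, 100.0, 0.05, 0.2, 1.0)     # formula (55), about 10.45
estimate = mc_call(100.0, 100.0, 0.05, 0.2, 1.0)  # simulated risk-neutral price
```

With x = K = 100, r = 0.05, σ = 0.2 and θ = T = 1, the two values agree to within the Monte Carlo sampling error.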

3. Stochastic differential equations

Stochastic differential equations (SDE's) were first rigorously formulated by K. Itô in 1942 (cf. the paper [2] in K. Itô, Selected Papers, 1987) to construct diffusion processes corresponding to Kolmogorov's differential equations. Given a time-dependent differential operator

L_t = \frac{1}{2}\sum_{i,j=1}^{d} a^{ij}(t,x)\frac{\partial^2}{\partial x^i \partial x^j} + \sum_{i=1}^{d} b^i(t,x)\frac{\partial}{\partial x^i}

on R^d, where (a^{ij}(t,x)) is symmetric and nonnegative definite, the problem is to construct and analyze a probability model associated to this operator: the model is a diffusion process (a Markov process with continuous paths) whose system of transition probability densities is given by the fundamental solution of the heat equation ∂u/∂t = L_t u. A more convenient probabilistic formulation of the model was given by D.W. Stroock and S.R.S. Varadhan (Stroock–Varadhan, 1979); it is an R^d-valued continuous {ℱ_t}-adapted process X = (X(t)), defined on a probability space with a filtration {ℱ_t}, such that, for every smooth function f(t,x) on [0, ∞) × R^d,

f(t, X(t)) − f(0, X(0)) = M(t) + V(t)

where M = (M(t)) ∈ M^c and V = (V(t)) ∈ V^c with V(t) = \int_0^t (\partial_s f + L_s f)(s, X(s)) ds.

This is equivalent to saying that X = (X(t)), X(t) = (X^1(t), …, X^d(t)), satisfies: X^i = (X^i(t)) ∈ S, i.e., each X^i is a continuous semimartingale, and

dX^i · dX^j = a^{ij}(t, X(t))dt ,   dV_{X^i}(t) = b^i(t, X(t))dt ,   i, j = 1, …, d .

If (σ^i_k(t,x))_{k=1}^{r} is any system of functions satisfying

\sum_{k=1}^{r} σ^i_k(t,x) σ^j_k(t,x) = a^{ij}(t,x) ,

then such an X can be given in the form

dX^i(t) = \sum_{k=1}^{r} σ^i_k(t, X(t)) dW^k(t) + b^i(t, X(t)) dt ,   i = 1, …, d ,


where W(t) = (W^1(t), …, W^r(t)) is an r-dimensional Wiener martingale. The equation we thus found for the stochastic differentials of the process X is exactly what we call Itô's SDE. This equation is said to be of Markovian type because the coefficients σ^i_k(t, X(t)) and b^i(t, X(t)) depend only on the present position X(t) of the system X = (X(t)).

3.1. SDE's based on Gaussian white noise (Wiener noise): The basic definitions
We start by giving a general formulation of SDE's in which the coefficients may depend on the past history of the system. Let W^d be the space of all d-dimensional continuous paths, W^d = C([0, ∞) → R^d), endowed with the topology of uniform convergence on every finite interval (which is a Polish topology), and let 𝓑(W^d) be the Borel σ-field (i.e., the topological σ-field). For each t ≥ 0, define ρ_t : W^d → W^d by

(ρ_t w)(s) = w(t ∧ s) ,   s ≥ 0 ,

and let 𝓑_t(W^d) = ρ_t^{−1}(𝓑(W^d)). Generally, a function F = (F(t, w)) defined on [0, ∞) × W^d, with values in a Polish space S endowed with the Borel σ-field 𝓢, is called non-anticipating if, for each t ≥ 0, the map W^d ∋ w ↦ F(t, w) ∈ S is 𝓑_t(W^d)/𝓢-measurable. Let 𝓐^{d,r} be the totality of functions α(t, w) = (α^i_k(t, w)) : [0, ∞) × W^d → R^d ⊗ R^r (:= the totality of d × r real matrices) which are 𝓑([0, ∞)) × 𝓑(W^d)-measurable and non-anticipating. An important case of α ∈ 𝓐^{d,r} is when it is given as α(t, w) = a(t, w(t)) by a Borel function a(t,x) : [0, ∞) × R^d → R^d ⊗ R^r. In this case, α is said to be independent of the past history, or of Markovian type. For given α ∈ 𝓐^{d,r} and β ∈ 𝓐^{d,1}, we consider the following SDE:

dX^i(t) = \sum_{k=1}^{r} α^i_k(t, X)dW^k(t) + β^i(t, X)dt ,   i = 1, …, d ,   (56)

or, simply in matrix notation,

dX(t) = α(t, X)dW(t) + β(t, X)dt .   (56)′

Here, X = (X(t)), X(t) = (X^1(t), …, X^d(t)), is a continuous R^d-valued process and W = (W(t)), W(t) = (W^1(t), …, W^r(t)), is an r-dimensional Wiener process with W(0) = 0. A precise formulation is as follows: we say that X = (X(t)) is a solution to the SDE (56) if there exists a probability space (Ω, ℱ, P) with a filtration {ℱ_t} such that
(i) X^i = (X^i(t)) ∈ S for i = 1, …, d,
(ii) W = (W(t)) is an r-dimensional {ℱ_t}-Wiener martingale,
(iii) α^i_k(t, X) ∈ 𝓛_{2,loc}((W^k(t))) and β^i(t, X) ∈ 𝓛_{1,loc} for i = 1, …, d, k = 1, …, r, where 𝓛_{1,loc} is the totality of {ℱ_t}-predictable processes f = (f(t)) satisfying \int_0^t |f(s)|ds < ∞ for all t ≥ 0, a.s.,
(iv) the stochastic differentials dX^i(t), i = 1, …, d, satisfy the relation (56), or equivalently,


X(t) − X(0) = \int_0^t α(s, X)dW(s) + \int_0^t β(s, X)ds ,   t ≥ 0 ,

holds a.s.

Thus, a solution X is always accompanied by a Wiener process W. To emphasize this, we often call X a solution with the Brownian motion W, or simply call the pair (X, W) itself a solution of (56). Since W is an {ℱ_t}-Wiener process and X is {ℱ_t}-adapted, we see that, for each t ≥ 0, the family of random variables W(u) − W(v), u ≥ v ≥ t, and the family of random variables (X(s), W(s)), s ≤ t, are mutually independent. Conversely, if this independence property holds, then, taking {ℱ_t} to be the natural filtration of the pair (X(t), W(t)), W = (W(t)) is an {ℱ_t}-Wiener process. When α and β are of Markovian type, α(t, w) = σ(t, w(t)) and β(t, w) = b(t, w(t)), the equation

dX(t) = σ(t, X(t))dW(t) + b(t, X(t))dt   (57)

is called a SDE of Markovian type. Furthermore, if σ(t,x) and b(t,x) are independent of t, i.e., σ(t,x) = σ(x) and b(t,x) = b(x), the equation

dX(t) = σ(X(t))dW(t) + b(X(t))dt   (58)

is called a SDE of time-independent (or time-homogeneous) Markovian type. Note that the time-dependent Eq. (57) can be made time-independent by adding one more component X^0(t) = t to the solution X(t); we can regard X̃(t) = (t, X(t)) as a solution of a time-independent SDE with coefficients σ̃(X̃(t)) = σ(t, X(t)) and b̃(X̃(t)) = b(t, X(t)). Note also that when the coefficient σ(x) is sufficiently smooth, the Eq. (58) can be transformed into an equivalent equation in the form of Stratonovich differentials:

dX(t) = σ(X(t)) ∘ dW(t) + b̃(X(t))dt   (58)′

where

b̃^i(x) = b^i(x) − \frac{1}{2}\sum_{k=1}^{r}\sum_{j=1}^{d} \frac{\partial σ^i_k}{\partial x^j}(x)\, σ^j_k(x) ,   i = 1, …, d .   (59)

Next, we define the notion of uniqueness of solutions. There are two kinds of uniqueness: uniqueness in the sense of law (uniqueness in law) and pathwise uniqueness. When we consider a SDE as a tool to construct a stochastic process, the uniqueness in law is sufficient. If we consider a SDE as a machine which produces the solution as an output when we input a Wiener process, in other words, if we would like to obtain the solution as a functional of the Wiener process, the notion of pathwise uniqueness is more natural and more important; as we shall see, this notion is closely connected with the notion of strong solutions. Here are the basic definitions. For a solution X = (X(t)) of SDE (56), we call X(0) the initial value and its law on R^d the initial law or the initial distribution. The law on

904

s. Watanabe

W^d of X is called the law of the solution X. We say that the uniqueness in the sense of law of solutions holds if the law of a solution is uniquely determined by its initial law; that is, if X and X′ are any two solutions of (56) whose initial laws coincide, then the laws of X and X′ coincide. This definition is equivalent to a seemingly less restrictive definition in which we restrict ourselves to solutions whose initial values are constants, i.e., the initial laws are δ-distributions at points of R^d. Next, we say that the pathwise uniqueness of solutions holds if, whenever X and X′ are solutions of (56) defined on the same probability space (Ω, ℱ, P) with the same filtration {ℱ_t} and with the same {ℱ_t}-Wiener process W = (W(t)) such that X(0) = X′(0), a.s., it holds that X(t) = X′(t) for all t ≥ 0, a.s. In this case, also, the definition is equivalent to that in which we restrict solutions to those having nonrandom initial values. A solution X = (X(t)) of SDE (56) is called a strong solution if X is adapted to the natural filtration {ℱ_t^W} of the accompanying Wiener process W = (W(t)). For a strong solution X, X(0) must be a constant a.s., because of the Blumenthal 0–1 law for the Wiener process. We give a more refined notion of strong solutions as follows. We say that SDE (56) has a unique strong solution if there exists a function
F : R^d × W_0^r ∋ (x, w) ↦ F(x, w) ∈ W^d ,

where W_0^r = {w ∈ W^r | w(0) = 0}, with the following properties:
(i) F is Borel-measurable and, for each fixed t ≥ 0 and x ∈ R^d, the map W_0^r ∋ w ↦ F(x, w)(t) ∈ R^d is 𝓑_t(W_0^r)/𝓑(R^d)-measurable, where 𝓑_t(W_0^r) is defined from W_0^r just as 𝓑_t(W^d) was from W^d.
(ii) For any R^d-valued random variable X(0) and any r-dimensional Wiener process W = (W(t)) with W(0) = 0 which are mutually independent, the continuous d-dimensional process X = (X(t)) defined by X = F(X(0), W) is a solution of SDE (56) with the Brownian motion W.
(iii) Conversely, for any solution (X, W) of SDE (56), it holds that X = F(X(0), W) a.s.

If a unique strong solution F(x, w) exists, then X = F(x, w) itself is a strong solution of SDE (56) realized on the r-dimensional Wiener space (W_0^r, 𝓑(W_0^r), P^W) (P^W: the r-dimensional standard Wiener measure) with the initial value x and with the canonical Wiener process w = (w(t)) ∈ W_0^r. It is obvious that the existence of a unique strong solution implies the pathwise uniqueness of solutions. The converse is also true. Namely, we have the following general facts, due to T. Yamada and S. Watanabe and refined by O. Kallenberg (cf. Ikeda–Watanabe, 1989 and Kallenberg, 1996):

THEOREM 16. (1) The pathwise uniqueness of solutions implies the uniqueness in law of solutions. (2) If the pathwise uniqueness for (56) holds and if a solution of (56) exists for any given initial law, then there exists a unique strong solution for (56).
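Before turning to existence and uniqueness theorems, it may help to see what a solution path of a Markovian equation like (57) looks like numerically. The Euler–Maruyama discretization below is our own illustration (it plays no role in the text's constructions): it approximates dX = σ(t,X)dW + b(t,X)dt with Gaussian increments over a grid.

```python
import math
import random

def euler_maruyama(sigma, b, x0, T, n, seed=0):
    """One approximate sample path of dX = sigma(t,X)dW + b(t,X)dt on [0,T],
    computed with n Euler steps from Gaussian increments dW ~ N(0, dt)."""
    rng = random.Random(seed)
    dt = T / n
    xs = [x0]
    x, t = x0, 0.0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))   # Wiener increment over one step
        x = x + sigma(t, x) * dW + b(t, x) * dt
        t += dt
        xs.append(x)
    return xs

# Ornstein-Uhlenbeck-type example: dX = dW - X dt, X(0) = 1
path = euler_maruyama(lambda t, x: 1.0, lambda t, x: -x, 1.0, 1.0, 1000)
```

When σ ≡ 0 the scheme reduces to the Euler method for the ODE dX = b(t,X)dt, which is a quick sanity check on the implementation.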

3.2. Existence and uniqueness results for solutions

The first result for the existence and the uniqueness of solutions for SDE's was obtained by K. Itô in the case of SDE's of Markovian type under the condition of Lipschitz continuity of the coefficients (cf. papers [2] and [12] in K. Itô, Selected Papers, 1987). He applied Picard's method of successive approximation to construct a solution; the solution is pathwise unique, and the unique strong solution in the sense of the previous subsection can be directly constructed in this case. In more general cases in which the coefficients are not necessarily Lipschitz continuous, however, such a direct construction of solutions cannot be applied, and it is more convenient to discuss the existence and the uniqueness of solutions separately and appeal to Theorem 16 above to obtain the unique strong solution. Also, there are several equations which have unique solutions in law but no strong solutions. The existence of solutions was discussed by A.V. Skorokhod: if the coefficients α(t, w) and β(t, w) are bounded and continuous on [0, ∞) × W^d, then a solution of (56) exists for any given initial law. A standard proof of this fact is to show the tightness, on the path space W^d, of the probability laws of approximate solutions obtained by Cauchy's polygonal method; the limit process in law can then be shown to be a solution. The boundedness assumption can be relaxed to, e.g., the condition that, for every T > 0, a constant K_T > 0 exists such that

|α(t, w)| + |β(t, w)| ≤ K_T(1 + ‖w‖_t) ,   t ∈ [0, T], w ∈ W^d ,   (60)

where ‖w‖_t = max_{0≤s≤t} |w(s)|. In the case of the Markovian equation (57), it is sufficient to assume that σ(t,x) and b(t,x) are continuous and satisfy

|σ(t,x)| + |b(t,x)| ≤ K_T(1 + |x|) ,   t ∈ [0, T], x ∈ R^d .   (61)

If these conditions are violated, a solution does not exist globally in time, in general, but exists only up to a time e, called the explosion time, at which \lim_{t↑e} |X(t)| = ∞ if e < ∞. We can extend the notion of solutions to include such a case of explosions; we replace the path space W^d by the space Ŵ^d, which consists of all continuous paths w : [0, ∞) → R^d ∪ {Δ} (the one-point compactification of R^d) such that w(t) = Δ for t ≥ e(w) := inf{t | w(t) = Δ}. N.V. Krylov discussed a class of SDE's of Markovian type with not necessarily continuous but Lebesgue measurable coefficients. Such equations are necessary in stochastic control problems (cf. Krylov, 1980).

Now we consider conditions for the uniqueness of solutions. A well-known sufficient condition for the pathwise uniqueness is the local Lipschitz condition. For the general non-Markovian equation (56) it is formulated as follows: set W(T) = {w ∈ W^d | ‖w‖_T ≤ T}; for each T > 0, there exists a constant K_T > 0 such that

|α(t, w) − α(t, w′)| + |β(t, w) − β(t, w′)| ≤ K_T ‖w − w′‖_t ,   t ∈ [0, T], w, w′ ∈ W(T) .   (62)


In the case of the Markovian equation, it is sufficient to assume that

|σ(t, x) − σ(t, x′)| + |b(t, x) − b(t, x′)| ≤ K_T |x − x′| ,   t ∈ [0, T], x, x′ ∈ B(T) ,   (63)

where B(T) = {x ∈ R^d | |x| ≤ T}.

THEOREM 17. Under the Lipschitz conditions (62) and (63), the pathwise uniqueness of solutions holds for Eqs. (56) and (57), respectively. If we assume further the conditions (60) and (61), then the unique strong solutions can be constructed directly by Picard's successive approximation: if we define, for each x, n = 0, 1, 2, … and a given Wiener process W = (W(t)) with W(0) = 0, a sequence of d-dimensional continuous processes X_n = (X_n(t)), each adapted to the natural filtration of W, successively by

X_0(t) = x ,   X_n(t) = x + \int_0^t α(s, X_{n−1})dW(s) + \int_0^t β(s, X_{n−1})ds ,

then X_n converges uniformly in t ∈ [0, T] for every T > 0, a.s. as n → ∞, to X = (X(t)), which is a strong solution of (56) with the initial value x. If this construction is carried out on the r-dimensional Wiener space (W_0^r, 𝓑(W_0^r), P^W) with respect to the canonical Wiener process w = (w(t)), then X = F(x, w) is the unique strong solution of (56).

Consider the case d = 1 and the equation of Markovian type (57).

THEOREM 18. (Yamada–Watanabe, cf. Ikeda–Watanabe, 1989) Suppose that σ is Hölder continuous of order 1/2 and b is Lipschitz continuous, i.e., for every T > 0, there exists a constant K_T > 0 such that

|σ(t, x) − σ(t, x′)|^2 + |b(t, x) − b(t, x′)| ≤ K_T |x − x′| ,   t ∈ [0, T], x, x′ ∈ B(T) .   (64)

Then the pathwise uniqueness of solutions holds. Hence, if furthermore the condition (61) is satisfied, a unique strong solution exists for (57). In this theorem, we may replace the drift coefficient b(t,x) by a non-Markovian drift coefficient β(t, w): if β(t, w) satisfies the same Lipschitz and growth conditions as (62) and (60), respectively, we have the same conclusion as in the theorem for the equation

dX(t) = σ(t, X(t))dW(t) + β(t, X)dt .

Typical examples of such σ are σ(t,x) = \sqrt{x ∨ 0}, σ(t,x) = \sqrt{x(1 − x)}·1_{\{0≤x≤1\}}, etc.

THEOREM 19. (Stroock–Varadhan, 1979) Consider a Markovian SDE (57) and assume that the coefficients σ(t,x) and b(t,x) are continuous and that the d × d-matrix

a(t,x) = \left(\sum_{k=1}^{r} σ^i_k(t,x) σ^j_k(t,x)\right)_{i,j=1}^{d}

is uniformly positive definite. Then the uniqueness in law of solutions holds. However, the pathwise uniqueness does not hold, in general; a counterexample was given by M. Barlow (Barlow, 1982).

The change of drift (Maruyama–Girsanov transformation) of Section 2.6 can be used to show the existence and the uniqueness in law of SDE's of the form

dX(t) = dW(t) + β(t, X)dt ,   (65)

i.e., the case d = r, σ(t, w) ≡ I, the d × d identity matrix.

THEOREM 20. Assume that β(t, w) ∈ 𝓐^{d,1} is bounded. Then a solution of (65) exists for any given initial distribution, and the uniqueness in the sense of law holds.

This theorem can be shown in the following way: given a d-dimensional distribution μ, we set up, on a suitable probability space (Ω, ℱ, P) with a filtration {ℱ_t} such that ℱ = ⋁_{t>0} ℱ_t, a d-dimensional {ℱ_t}-Wiener process X = (X(t)) with the initial distribution μ. Set

W̃(t) = X(t) − X(0)

and

M(t) = \exp\left[\int_0^t β(s, X)·dW̃(s) − \frac{1}{2}\int_0^t |β(s, X)|^2 ds\right] .

As we saw in Section 2.5, M(t) is an {ℱ_t}-martingale, and we can obtain a new probability P̃ on (Ω, ℱ) by the Maruyama–Girsanov transformation. Then, on the probability space (Ω, ℱ, P̃), we see by Theorem 10 that

W(t) = W̃(t) − \Big\langle W̃, \int β(s, X)dW̃(s) \Big\rangle(t) = W̃(t) − \int_0^t β(s, X)ds = X(t) − X(0) − \int_0^t β(s, X)ds

is an {ℱ_t}-Wiener martingale, so that (X, W) is a solution to Eq. (65). Any solution can be obtained in this way, and hence the uniqueness in law holds. More generally, we can deduce by the same method the existence and the uniqueness in law of the equation

dX(t) = α(t, X)dW(t) + [β(t, X) + α(t, X)γ(t, X)]dt

for α(t, w) ∈ 𝓐^{d,r}, β(t, w) ∈ 𝓐^{d,1} and bounded γ(t, w) ∈ 𝓐^{r,1}, if we can show beforehand the existence and uniqueness in law of the equation

dX(t) = α(t, X)dW(t) + β(t, X)dt .


Cf. Ikeda–Watanabe, 1989. The pathwise uniqueness of solutions for the Eq. (65) does not hold in general: indeed, B. Tsirel'son gave such an example of β(t, w):

β(t, w) = 0 for t ≥ t_0 and for t = 0, and β(t, w) = θ\left(\frac{w(t_{i+1}) − w(t_{i+2})}{t_{i+1} − t_{i+2}}\right) for t ∈ [t_{i+1}, t_i), i = 0, 1, 2, … ,

where {t_n} is a sequence such that 0 < ⋯ < t_n < t_{n−1} < ⋯ < t_0 = 1 with \lim_{n→∞} t_n = 0, and θ(x) = x − [x] is the fractional part of x ∈ R.

The time change of Section 2.4 can be applied to solve a class of SDE's. Let d = 1 and consider an equation of non-Markovian type:

dX(t) = α(t, X)dW(t)   (66)

where α(t, w) satisfies C_1 ≤ α(t, w)α(t, w)^* = |α(t, w)|^2 ≤ C_2 for all (t, w), for some constants 0 < C_1 < C_2. Given a one-dimensional Brownian motion B = (B(t)) with a prescribed initial law μ on R, we consider the following equation for a continuous, strictly increasing function A = (A(t)) with A(0) = 0:

A(t) = \int_0^t \frac{1}{|α(A(s), B^A)|^2} ds ,   t ≥ 0 ,   (67)

where B^A = (B^A(t)) is defined, as in Section 2.5, by B^A(t) = B(A^{−1}(t)), t ↦ A^{−1}(t) being the inverse function of t ↦ A(t).
where B A = (BA(t)) is defined, as in Section 2.5, by BA(t) = B ( A - I ( t ) ) , t H A-l(t) being the inverse function of t ~ A(t). THEOREM 21. (Ikeda-Watanabe, 1989) If, for almost all sample paths of B = (B(t)), there exists one and only one continuous function A = (A(t)) which satisfies the Eq. (67), then X = B A = (BA(t)) is a solution to SDE (66) with a certain Wiener process W. Furthermore, the uniqueness in the sense of law holds for SDE (66). Thus, the problem of the unique existence of solutions for (66) is reduced to the problem of the unique existence of solutions for the Eq. (67) along each sample path of B = (B(t)). For a simple example, if e(t, w) = a(w(t)) by a function a(x) on R, then c~(A(s),B A) = a ( B A ( A ( t ) ) ) = a(B(t)) and hence the Eq. (67) has the unique solution A ( t ) = f~ la(B(s))l-2ds. Consequently, the existence and the uniqueness in law for the SDE

dX(t) = a(X(t) )dW(t)


holds and a solution is given by X ( t ) = B(A -1 (t) ) with A(t) = .1~ la(B(s) )l-Z ds. More generally, if a(t,x) is Lipschitz continuous in t, then the equation

A(t) =

2 ds,

t _> 0

has the unique solution A for every sample path of B and hence the SDE

dX(t) = a ( t , X ( t ) )dW(t)


has a unique solution in law given by X(t) = B(A^{−1}(t)). This fact was first remarked by M.P. Yershov.

A more interesting and nontrivial example is the case of α(t, w) given by

α(t, w) = a\left(y + \int_0^t f(w(s))ds\right) ,

where y ∈ R is a constant, f(x) is a locally bounded Borel function on R, and a(x) is a bounded Borel function on R such that a(x) ≥ c for all x, for some constant c > 0. The Eq. (67) is now

A(t) = \int_0^t a\left(y + \int_0^{A(s)} f(B^A(u))du\right)^{−2} ds ,

which can be solved uniquely, for each given sample path of B, as

A(t) = \int_0^t a\left(y + φ^{−1}\left[\int_0^s f(B(u))du\right]\right)^{−2} ds ,

where x ↦ φ^{−1}(x) is the inverse function of x ↦ φ(x) = \int_0^x a(y + z)^2 dz. In this way, the equation

dX(t) = a\left[y + \int_0^t f(X(s))ds\right] dW(t)

can be solved uniquely in law, and a solution is given by X(t) = B(A^{−1}(t)). This example was given by M. Nisio.
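The recipe of Theorem 21 can be carried out path by path on a grid. In the sketch below (our own discretization; the function name and the left-point quadrature rule are our choices), we sample a Brownian path B, accumulate A(t) = ∫_0^t |a(B(s))|^{−2} ds for a Markovian coefficient a(x), and read off X(t) = B(A^{−1}(t)) by inverting the strictly increasing function A numerically.

```python
import bisect
import math
import random

def time_change_solution(a, T, n=100_000, seed=3):
    """Follow Theorem 21 for the Markovian case alpha(t,w) = a(w(t)): sample a
    Brownian path B on a grid, accumulate A(t) = int_0^t |a(B(s))|^{-2} ds,
    and return A, B and the time-changed path X(t) = B(A^{-1}(t))."""
    rng = random.Random(seed)
    dt = T / n
    B = [0.0]
    A = [0.0]
    for i in range(n):
        B.append(B[-1] + rng.gauss(0.0, math.sqrt(dt)))
        A.append(A[-1] + dt / a(B[i]) ** 2)   # left-point rule for the integral

    def X(t):
        # invert the strictly increasing function A on the grid, then read off B
        j = bisect.bisect_left(A, t)
        return B[min(j, n)]

    return A, B, X

a = lambda x: 2.0 + math.sin(x)   # bounded away from 0 and infinity, so (66) applies
A, B, X = time_change_solution(a, T=1.0)
```

Since 1 ≤ a(x) ≤ 3 here, the increments of A lie between dt/9 and dt, so A is strictly increasing and A(T) lies between T/9 and T, as the discretized path confirms.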

3.3. SDE's based on Gaussian and Poissonian noise

So far we have considered SDE's driven by a Gaussian white noise or, equivalently, by a Wiener process. For such equations, the solutions are necessarily continuous processes. We can consider more general SDE's which are driven by a Poisson random measure as well as a Wiener process. Then the solutions are usually discontinuous processes. Here we formulate a Markovian type SDE only. Let (E, 𝓑_E) be a measurable space and n(dξ) an infinite, σ-finite measure on (E, 𝓑_E). We can consider, on a suitable probability space (Ω, ℱ, P) with a filtration {ℱ_t}, an r-dimensional {ℱ_t}-Wiener martingale W = (W(t)) and an {ℱ_t}-Poisson point process p on (E, 𝓑_E) with compensator N̂_p(ds, dξ) = ds·n(dξ). Then W and p are necessarily mutually independent. We fix E_0 ∈ 𝓑_E such that n(E \setminus E_0) < ∞. Let σ(x) = (σ^i_k(x)) be a Borel function R^d → R^d ⊗ R^r, b(x) = (b^i(x)) a Borel function R^d → R^d, and f(x, ξ) = (f^i(x, ξ)) a 𝓑(R^d) ⊗ 𝓑_E-measurable function R^d × E → R^d such that the following growth condition is satisfied for some constant K > 0:

|σ(x)|^2 + |b(x)|^2 + \int_{E_0} |f(x, ξ)|^2 n(dξ) ≤ K(1 + |x|^2) ,   x ∈ R^d .   (68)


Consider the following SDE:

X^i(t) = X^i(0) + \sum_{k=1}^{r} \int_0^t σ^i_k(X(s−))dW^k(s) + \int_0^t b^i(X(s−))ds
 + \int_0^{t+}\!\int_E f^i(X(s−), ξ)·1_{E_0}(ξ)\, \tilde{N}_p(ds, dξ)
 + \int_0^{t+}\!\int_E f^i(X(s−), ξ)·1_{E \setminus E_0}(ξ)\, N_p(ds, dξ) ,   i = 1, …, d .   (69)

Here, as in Section 1.5,

\tilde{N}_p(ds, dξ) = N_p(ds, dξ) − N̂_p(ds, dξ) = N_p(ds, dξ) − ds·n(dξ) .


By a solution of the Eq. (69), we mean an R^d-valued càdlàg process X = (X(t)), defined on a probability space (Ω, ℱ, P) with a filtration {ℱ_t} on which there exist an r-dimensional {ℱ_t}-Wiener martingale W = (W(t)) and an {ℱ_t}-Poisson point process p on (E, 𝓑_E) with the compensator ds·n(dξ), such that X is {ℱ_t}-adapted and satisfies the Eq. (69). Since \int_0^t |X(s)|^2 ds < ∞ for all t > 0, we see by (68) that (σ^i_k(X(t−))) ∈ 𝓛_{2,loc}((W^k(t))) and f^i(X(s−), ξ) ∈ F_p^{2,loc}, so that the integrals on the right-hand side of (69) are well-defined as càdlàg processes.

THEOREM 22. (Ikeda–Watanabe, 1989) If σ(x), b(x) and f(x, ξ) satisfy, besides the growth condition (68), the following Lipschitz condition for some constant K > 0:

|σ(x) − σ(y)|^2 + |b(x) − b(y)|^2 + \int_{E_0} |f(x, ξ) − f(y, ξ)|^2 n(dξ) ≤ K|x − y|^2 ,   x, y ∈ R^d ,   (70)

then, on any probability space (Ω, ℱ, P) with a filtration {ℱ_t} on which there are given an r-dimensional {ℱ_t}-Wiener martingale W = (W(t)), an {ℱ_t}-Poisson point process p on (E, 𝓑_E) with the compensator ds·n(dξ) and a d-dimensional ℱ_0-measurable random variable X_0, there exists a unique {ℱ_t}-adapted solution X of SDE (69) with initial value X_0.
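A sample path of an equation like (69) can be sketched by combining Euler steps for the diffusion part with the jumps of p. The toy version below is our own simplification, not from the text: the jump measure is finite with total mass λ (so the uncompensated integral suffices and the jump times form a rate-λ Poisson process, realized here via exponential inter-arrival times), and the jump function is taken to be f(x−, ξ) = ξ.

```python
import math
import random

def jump_diffusion_path(sigma, b, lam, jump, x0, T, n=1000, seed=7):
    """Euler scheme for dX = sigma(X)dW + b(X)dt plus jumps: at the points of a
    rate-lam Poisson process on [0,T], X jumps by xi = jump(rng), i.e. the jump
    function is f(x-, xi) = xi.  Returns the terminal value and the jump count."""
    rng = random.Random(seed)
    dt = T / n
    x, t = x0, 0.0
    next_jump = rng.expovariate(lam)   # first jump time
    jumps = 0
    for _ in range(n):
        x += sigma(x) * rng.gauss(0.0, math.sqrt(dt)) + b(x) * dt
        t += dt
        while next_jump <= t:          # apply every jump that fell in this step
            x += jump(rng)
            jumps += 1
            next_jump += rng.expovariate(lam)
    return x, jumps

# mean-reverting diffusion with uniform jumps
x_T, n_jumps = jump_diffusion_path(lambda x: 0.3, lambda x: -0.5 * x, 2.0,
                                   lambda rng: rng.uniform(-1.0, 1.0), 1.0, 1.0)
```

Setting σ ≡ 0 and b ≡ 0 isolates the jump part: the terminal value then equals the sum of the jump sizes, which is a convenient check.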

3.4. SDE's and diffusion processes


Consider a time-independent Markovian type SDE (58) with continuous coefficients on R^d:

dX(t) = σ(X(t))dW(t) + b(X(t))dt .   (58)

Then a solution admitting an explosion always exists. Assume further that the uniqueness in law of solutions holds for (58). Then, denoting by P_x the law on Ŵ^d of a solution X = (X(t)) with X(0) = x, the system {P_x} defines a diffusion process


(a strong Markov process with continuous paths) on R^d with the point at infinity Δ as a terminal point. Set

a^{ij}(x) = \sum_{k=1}^{r} σ^i_k(x) σ^j_k(x) ,   i, j = 1, …, d ,

and define a second-order differential operator A by

A = \frac{1}{2}\sum_{i,j=1}^{d} a^{ij}(x)\frac{\partial^2}{\partial x^i \partial x^j} + \sum_{i=1}^{d} b^i(x)\frac{\partial}{\partial x^i}   (71)

with the domain C_0^2(R^d) (the space of all C^2-functions on R^d with compact supports). We always set f(Δ) = 0 for f ∈ C_0^2(R^d). By Itô's formula applied to a solution X of (58), we have, for f ∈ C_0^2(R^d),

f(X(t)) − f(X(0)) = \sum_{i=1}^{d}\sum_{k=1}^{r} \int_0^t \frac{\partial f}{\partial x^i}(X(s))\, σ^i_k(X(s))\, dW^k(s) + \int_0^t (Af)(X(s))ds .

Hence, f(w(t)) − f(x) − \int_0^t (Af)(w(s))ds is a martingale under P_x for every f ∈ C_0^2(R^d), and this property characterizes the diffusion process {P_x}. The diffusion {P_x} is generated by the differential operator A in this sense. Furthermore, if, for some λ > 0, (λ − A)(C_0^2(R^d)) is dense in C_∞(R^d) (:= the closure of C_0^2(R^d) under the supremum norm), then the transition semigroup of the diffusion is a Feller semigroup (a positive, contraction, strongly continuous semigroup) on C_∞(R^d), and its infinitesimal generator Ā in Hille–Yosida's sense is the closure of (A, C_0^2(R^d)). In particular, u(t, x) = E_x[f(w(t))] for f ∈ C_0^2(R^d) is the unique solution of the evolution equation

du(t)/dt = Āu(t) ,   u(0) = f .

If c(t, x) is continuous and v(t, x) is sufficiently smooth in (t, x) on [0, ∞) × R^d, then, by Itô's formula, we have for a solution X of (58) with X(0) = x,

v(t, X(t)) \exp\left[\int_0^t c(s, X(s))ds\right] − v(0, x)
 = \sum_{i=1}^{d}\sum_{k=1}^{r} \int_0^t \exp\left[\int_0^s c(τ, X(τ))dτ\right] \frac{\partial v}{\partial x^i}(s, X(s))\, σ^i_k(X(s))\, dW^k(s)
 + \int_0^t \exp\left[\int_0^s c(τ, X(τ))dτ\right] \left(\frac{\partial v}{\partial s} + (A + c)v\right)(s, X(s))\, ds .

The first term (denote it by M(t)) on the right-hand side of this equation is a sum of stochastic integrals with respect to Wiener processes, so that it is a local martingale. Hence, if an {ℱ_t}-stopping time T satisfies E(⟨M⟩(T)) < ∞, then we have E[M(T)] = 0


by Doob's optional stopping theorem. If, for example, v satisfies the heat equation

\frac{\partial v}{\partial t} + (A + c)v = 0 ,

we have from this that

v(0, x) = E\left\{ v(T, X(T)) \exp\left[\int_0^T c(s, X(s))ds\right] \right\} .

This formula can be used to obtain probabilistic representations, in terms of the diffusion X, of solutions of initial or boundary value problems related to the operator A. On these topics, cf. e.g., Friedman, 1975.
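For the simplest case X = x + W (one-dimensional Brownian motion, A = ½ d²/dx²), a constant potential c and the deterministic stopping time T, the representation reads v(0, x) = e^{cT} E[f(x + W(T))] for terminal data v(T, ·) = f. The Monte Carlo sketch below (our illustration; the function name is ours) checks this for f(x) = x², where E[f(x + W(T))] = x² + T exactly.

```python
import math
import random

def feynman_kac_mc(f, c, x, T, n=200_000, seed=11):
    """Monte Carlo value of E[ f(X(T)) * exp(int_0^T c ds) ] for X = x + W a
    one-dimensional Brownian motion and a constant potential c; this equals
    exp(c*T) * E[f(x + W(T))], the right-hand side of the representation with
    A = (1/2) d^2/dx^2 and the deterministic stopping time T."""
    rng = random.Random(seed)
    sqrt_T = math.sqrt(T)
    total = 0.0
    for _ in range(n):
        total += f(x + rng.gauss(0.0, sqrt_T))
    return math.exp(c * T) * total / n

# For f(x) = x^2 we know E[f(x + W(T))] = x^2 + T exactly.
est = feynman_kac_mc(lambda y: y * y, 0.5, 1.0, 1.0)
exact = math.exp(0.5) * (1.0 ** 2 + 1.0)
```

The estimate agrees with the closed-form value to within the Monte Carlo sampling error.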

3.5. SDE's and stochastic flows of diffeomorphisms


Consider again the SDE (58):

dX(t) = σ(X(t))dW(t) + b(X(t))dt .   (58)

We assume that the coefficients σ and b are smooth; for simplicity, we assume here that they are C^∞ with bounded derivatives of all orders ≥ 1, although a more refined result may be stated which depends on the order of regularity of the coefficients. Under this condition, the unique strong solution for (58) exists; that is, on the r-dimensional Wiener space (W_0^r, 𝓑(W_0^r), P^W), there exists a family {X^x = (X(t, x; w))}_{x∈R^d} of R^d-valued continuous processes, where X^x is the unique solution of the SDE (58) with initial value x and with respect to the canonical Wiener process w ∈ W_0^r. Thus we obtain, for almost all w ∈ W_0^r, the map

[0, ∞) × R^d ∋ (t, x) ↦ X(t, x; w) ∈ R^d .

If we consider this as a map

R^d ∋ x ↦ [t ↦ X(t, x; ·)] ∈ W^d ,

then it induces the law P_x on W^d. As we saw in Section 3.4, the system {P_x} is the diffusion process generated by the operator A. If we regard it as a map

[0, ∞) ∋ t ↦ [x ↦ X(t, x; ·)] ,

then we have the following important result due to J.-M. Bismut and H. Kunita (cf. Kunita, 1990):

THEOREM 23. With probability one, for all t ≥ 0, the map [x ↦ X(t, x; ·)] is a diffeomorphism of R^d. Furthermore, introducing a natural topology on the group G formed of all diffeomorphisms of R^d, the map

[0, ∞) ∋ t ↦ [x ↦ X(t, x; ·)] ∈ G

is continuous, a.s.


The continuous process with values in the group G, thus obtained from the solutions of SDE (58), is called the stochastic flow of diffeomorphisms associated to SDE (58). The above theorem may be regarded as one of the most refined results concerning the dependence of solutions of a SDE on their initial values. The dependence on initial values of solutions has been studied from the beginning of SDE theory, mainly by the Russian and Ukrainian schools: by Yu. N. Blagovescenskii, M.I. Freidlin, I.I. Gikhman, A.V. Skorokhod, among others. As an application, we can show that the function u(t, x), t > 0, x ∈ R^d, defined by u(t, x) = E^W[f(X(t, x; w))], f ∈ C_0^2(R^d), is C^2 in x and satisfies ∂u/∂t = Au, where A is the differential operator (71).

3.6. SDE's and the Malliavin calculus

We consider the same SDE (58) under the same conditions on the coefficients as in Section 3.5: the coefficients are C^∞ with bounded derivatives of all orders ≥ 1. Also, we consider the solution X(t, x; w) of (58) with initial value x realized on the Wiener space (W_0^r, ℱ, P^W), where ℱ is the completion of 𝓑(W_0^r) with respect to P^W. Since t ↦ f(X(t, x; w)) − f(x) − \int_0^t (Af)(X(s, x; w))ds is a martingale under P^W for f ∈ C_0^∞(R^d) (the space of all C^∞-functions on R^d with compact supports), we have

E^{P^W}[f(X(t, x; w))] = f(x) + \int_0^t E^{P^W}[(Af)(X(s, x; w))] ds .

This relation implies that the transition probability P(t,x, dy) -p W ( x ( t , x ; w ) E dy) is, for each x, a distribution solution of ( 8 / t - A * ) p = 0 where A* is the adjoint operator of A. Hence if the operator ~ / ~ t - A* is hypoelliptic, we can conclude that P(t, x, dy) possesses a smooth density p(t, x, y) by appealing to the theory of partial differential equations. We can also discuss this problem probabilistically by applying the Malliavin calculus to the solution

x(t, x; w).
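The martingale identity above is easy to test by Monte Carlo in the simplest case X(t,x;w) = x + w(t), for which A = (1/2) d²/dx²: with f(x) = x² we have Af ≡ 1, so the identity reduces to E[X(t)²] = x² + t. The following sketch is our own illustration (all names, seeds and tolerances are ours, not the text's):

```python
import numpy as np

# Check E[f(X(t,x;w))] = f(x) + ∫_0^t E[(Af)(X(s,x;w))] ds for X(t,x;w) = x + w(t),
# the diffusion generated by A = (1/2) d^2/dx^2.  With f(x) = x^2, Af ≡ 1, so the
# right-hand side is x^2 + t.
rng = np.random.default_rng(0)
x, t, n = 1.5, 2.0, 200_000
X_t = x + np.sqrt(t) * rng.standard_normal(n)   # exact samples of X(t,x;w)
lhs = np.mean(X_t ** 2)                         # Monte Carlo estimate of E[f(X(t))]
rhs = x ** 2 + t                                # f(x) + ∫_0^t 1 ds
print(lhs, rhs)
```

The agreement is within Monte Carlo error; of course this only probes the identity for one f, not the full distributional statement.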
The Malliavin calculus is an infinite-dimensional differential calculus for P^W-measurable functions, i.e., Wiener functionals, on the Wiener space (W_0^r, F, P^W). Since the space W_0^r is a linear topological space under the topology of uniform convergence on finite intervals, we can consider the (Fréchet) derivative, in the direction of an element of W_0^r, for functions defined on W_0^r. However, the R^d-valued Wiener functional w ↦ X(t,x;w), for each fixed t ≥ 0 and x ∈ R^d, is not differentiable in this sense, in general; it is not even continuous, in general, or, what is worse, it is not a function on W_0^r in a naive sense but an equivalence class of functions on W_0^r coinciding with each other P^W-almost everywhere, so that we cannot even define its value at a specified point of W_0^r. An important discovery of P. Malliavin is that, nevertheless, these Wiener functionals can be differentiated as many times as we want, so that they may be called smooth, provided the notion of derivative is modified suitably. Firstly, we restrict the directions of differentiation to lie in the Cameron-Martin subspace H of the Wiener space W_0^r.

914

S. Watanabe

Here, the Cameron-Martin subspace H is a Hilbertian subspace of W_0^r defined by

H = { h ∈ W_0^r | there exists ḣ ∈ L²([0,∞) → R^r) such that h(t) = ∫_0^t ḣ(s) ds }

with the norm

||h||_H = ||ḣ||_{L²([0,∞)→R^r)} = ( ∫_0^∞ |ḣ(s)|² ds )^{1/2} .
Secondly, the derivative is defined in a way similar to the Sobolev weak derivative or, equivalently, the Schwartz distributional derivative of ordinary analysis for functions on a Euclidean space. The class of real-valued Wiener functionals having derivatives of all orders in this sense, these derivatives furthermore having moments of all orders with respect to the Wiener measure, is denoted by D^∞. We call D^∞ the space of test Wiener functionals on the Wiener space. It should be remarked that a test Wiener functional is not necessarily continuous; indeed, there exists a test functional which cannot be made continuous on W_0^r by any modification of its values on a P^W-null set. In other words, the Sobolev imbedding theorem on a Euclidean space no longer holds on Wiener space. Malliavin's first important remark is that the R^d-valued Wiener functional X(t,x;w), for each fixed t ≥ 0 and x ∈ R^d, is in the space (D^∞)^d, i.e., each of its d components is in the test functional space.

We can define, in the same way as in the Schwartz theory of distributions on a Euclidean space, the notion of generalized Wiener functionals, or distributions on the Wiener space: We introduce a natural topology on the space D^∞ (by means of a family of Sobolev-type norms defined on it) and then consider the topological dual, i.e., the space of all continuous linear functionals on D^∞. We denote it by D^{−∞} and call its elements generalized Wiener functionals. A typical example of a generalized Wiener functional is given by the formal expression δ_y(F(w)), where δ_y(·) is the Dirac delta-function on R^d with pole at y ∈ R^d and F is an R^d-valued Wiener functional. In the frame of the Malliavin calculus, this formal expression can be rigorously defined as an element of D^{−∞} if F satisfies the following conditions: (i) F ∈ (D^∞)^d, i.e., F = (F¹, …, F^d) with F^i ∈ D^∞ for i = 1, …, d; (ii) F is non-degenerate in the sense of Malliavin, that is, [det σ_F(w)]^{−1} has moments of all orders with respect to P^W. Here σ_F(w) = ((DF^i(w), DF^j(w))_H), where DF^i(w) is an H-valued Wiener functional uniquely determined by DF^i(w)[h] = (DF^i(w), h)_H, h ∈ H, DF^i(w)[h] being the derivative of F^i in the direction of h ∈ H and (·,·)_H the H-inner product. Under the condition (i) above, (DF^i(w), DF^j(w))_H ∈ D^∞ for all i, j, and σ_F(w) is nonnegative definite for P^W-almost all w. σ_F(w) is called the Malliavin covariance of F.


Suppose that an R^d-valued Wiener functional F satisfies the above conditions (i) and (ii), so that δ_y(F(w)) is well-defined as an element of D^{−∞}. Then the coupling ⟨δ_y(F(w)), Φ⟩ (i.e., the value of the linear functional at Φ) is well-defined for Φ ∈ D^∞. This coupling is called a generalized expectation of δ_y(F(w))·Φ and is denoted by

E^W[δ_y(F(w))·Φ] .

The notion of generalized expectation indeed generalizes the notion of Wiener functional expectation: when a Wiener functional Ψ is pth power integrable for some p > 1, then the coupling of Ψ ∈ L_p ⊂ D^{−∞} against Φ ∈ D^∞ is exactly the expectation, under the Wiener measure P^W, of the product Ψ·Φ, which is then an integrable Wiener functional. The Wiener functional taking the constant value 1, denoted by 1, is a test Wiener functional and the generalized expectation

p_F(y) := ⟨δ_y(F(w)), 1⟩ = E^W[δ_y(F(w))]

is well-defined. By the Malliavin calculus, we can show that p_F is C^∞ and coincides with the density of the law of the R^d-valued Wiener functional F. A generalized expectation E^W[δ_y(F(w))·Φ] for a test Wiener functional Φ is also C^∞ in y and coincides with E^W[Φ(w) | F(w) = y]·p_F(y). In this way, the Malliavin calculus provides us with an efficient probabilistic method for studying the existence and smoothness of densities of laws of Wiener functionals, and also the smoothness of conditional expectations. We know that the solution X(t,x;w) of SDE (58) satisfies condition (i) above for all t ≥ 0 and x ∈ R^d. If we can show that it also satisfies condition (ii) for t > 0 and x ∈ R^d, then the generalized expectation

p(t,x,y) = E^W[δ_y(X(t,x;w))] ,    t > 0, x ∈ R^d ,

is well-defined and defines a C^∞-function of (t,x,y) ∈ (0,∞) × R^d × R^d. The p(t,x,y) thus obtained is the density of the transition probability P(t,x,dy) and, at the same time, a fundamental solution of the parabolic partial differential equation

∂u/∂t = Au .
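In the simplest case X(t,x;w) = x + w(t) (so A = (1/2) d²/dx² and p(t,x,y) is the Gaussian heat kernel), the generalized expectation E^W[δ_y(X(t,x;w))] can be caricatured numerically by a windowed Monte Carlo average. This is only our own illustrative stand-in for the delta function, not the rigorous pairing:

```python
import numpy as np

# For X(t,x;w) = x + w(t), p(t,x,y) is the Gaussian heat kernel.  We mimic
# E^W[δ_y(X(t,x;w))] by the fraction of samples in a small window around y,
# divided by the window width.
rng = np.random.default_rng(1)
x, t, y, n = 0.0, 1.0, 1.0, 1_000_000
X_t = x + np.sqrt(t) * rng.standard_normal(n)
h = 0.05
p_est = np.mean(np.abs(X_t - y) < h) / (2 * h)
p_exact = np.exp(-(y - x) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)
print(p_est, p_exact)
```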
A simple sufficient condition for the nondegeneracy condition (ii) of the Malliavin covariance of X(t,x;w) is the ellipticity of the operator A, i.e., det[a^{ij}(x)] > 0. A more delicate condition can be given in terms of Lie brackets of the vector fields

V_k(x) = Σ_{i=1}^d a_k^i(x) ∂/∂x^i ,    k = 1, …, r,

V_0(x) = Σ_{i=1}^d [ b^i(x) − (1/2) Σ_{k=1}^r Σ_{j=1}^d a_k^j(x) (∂a_k^i/∂x^j)(x) ] ∂/∂x^i ,

which corresponds to the Hörmander condition for hypoellipticity in the theory of partial differential equations. Cf. Kusuoka-Stroock, 1985, on these topics.


For a general account of the Malliavin calculus, cf. e.g., Malliavin, 1997, Ikeda-Watanabe, 1989 and Watanabe, 1983.

3.7. Skorokhod equations and reflecting barrier diffusions


Let X = (X(t)) be a one-dimensional Brownian motion and define a continuous process X⁺ = (X⁺(t)) by X⁺(t) = |X(t)|. Then X⁺ is a diffusion process on the positive half line [0,∞) with the transition probability

P(t,x,dy) = (2πt)^{−1/2} { exp[−(y−x)²/(2t)] + exp[−(y+x)²/(2t)] } dy ,    x, y ∈ [0,∞) .

It is called the reflecting Brownian motion on [0,∞). This diffusion can also be obtained as the unique solution of the following equation, introduced by A. V. Skorokhod. Consider the equation

X(t) = X(0) + W(t) + φ(t)     (72)

where
(i) X = (X(t)) is a continuous process on [0,∞),
(ii) W = (W(t)) is a one-dimensional Brownian motion with W(0) = 0 which is independent of X(0),
(iii) φ = (φ(t)) is a continuous, increasing process such that, with probability one, φ(0) = 0 and t ↦ φ(t) increases only at times t for which X(t) = 0, i.e.,

∫_0^t 1_{{0}}(X(s)) dφ(s) = φ(t)    for all t ≥ 0 .

THEOREM 24. Given a [0,∞)-valued random variable X(0) and a one-dimensional Brownian motion W = (W(t)) with W(0) = 0 which are mutually independent, there exists a unique pair (X, φ) of a continuous process X = (X(t)) on [0,∞) and a continuous increasing process φ = (φ(t)) which satisfies the property (iii) above and the Skorokhod equation (72). X = (X(t)) is a reflecting Brownian motion on [0,∞) and φ = (φ(t)) is recovered from X by

φ(t) = lim_{ε↓0} (1/2ε) ∫_0^t 1_{[0,ε]}(X(s)) ds ,    t ≥ 0 .

φ = (φ(t)) is called the local time at x = 0 of the reflecting Brownian motion X. It is easy to find explicit forms of X and φ:

X(t) = X(0) + W(t) − min_{0≤s≤t} { (X(0) + W(s)) ∧ 0 } ,

φ(t) = − min_{0≤s≤t} { (X(0) + W(s)) ∧ 0 } .

These formulas for the reflecting Brownian motion and its local time were first obtained by P. Lévy (Lévy, 1948).
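Lévy's explicit formulas make the Skorokhod conditions easy to verify on a simulated path: building (X, φ) from a discretized Brownian path, one can check that X stays nonnegative, φ is increasing, and φ increases only where X = 0. The discretization and seed below are our own arbitrary choices:

```python
import numpy as np

# Build (X, φ) from a simulated Brownian path via Lévy's formulas and verify
# the Skorokhod conditions numerically.
rng = np.random.default_rng(2)
n, T = 100_000, 1.0
dt = T / n
W = np.concatenate([[0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n))])
x0 = 0.0
Y = x0 + W
phi = -np.minimum.accumulate(np.minimum(Y, 0.0))  # φ(t) = -min_{s≤t} ((x0+W(s)) ∧ 0)
X = Y + phi                                       # X(t) = x0 + W(t) + φ(t)
grows = np.diff(phi) > 0                          # time steps where φ increases
print(X.min(), phi[-1])
```

At every step where φ increases, X sits exactly at 0, reproducing condition (iii) on the grid.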


The Skorokhod equation can also be considered on more general domains. First, consider the case of a finite interval I = [a,b], a < b. Then it is given by

X(t) = X(0) + W(t) + φ(t)     (73)

where X(0) is a random variable with values in I, W = (W(t)) is a one-dimensional Brownian motion with W(0) = 0 which is independent of X(0), X = (X(t)) is a continuous process on I, and φ = (φ(t)) is a continuous increasing process with φ(0) = 0 and ∫_0^t 1_{{a,b}}(X(s)) dφ(s) = φ(t) for all t ≥ 0, a.s., i.e., φ(t) increases only at times t for which X(t) = a or X(t) = b. In the same way as in Theorem 24, X and φ exist uniquely for given X(0) and W. X is called a reflecting Brownian motion on I.

Next, we consider the Skorokhod equation in the multidimensional case. Let D be a domain in R^d with smooth boundary ∂D, and let ∂D ∋ x ↦ n(x) ∈ R^d be given such that it is smooth and, for x ∈ ∂D, n(x) is a non-tangential, inward-pointing unit vector. Consider the Skorokhod equation on D̄ = D ∪ ∂D:

X(t) = X(0) + W(t) + ∫_0^t n(X(s)) dφ(s)     (74)

where X(0) is a D̄-valued random variable and W = (W(t)) is a d-dimensional Wiener process with W(0) = 0 such that they are mutually independent, and X = (X(t)) and φ = (φ(t)) are such that X is a D̄-valued continuous process and φ is a continuous increasing process with φ(0) = 0 which satisfies ∫_0^t 1_{∂D}(X(s)) dφ(s) = φ(t). Again, for given X(0) and W, X and φ exist uniquely. X is called an obliquely reflecting Brownian motion in D̄ with direction of reflection n(x) at the boundary. If n(x) is the normal vector at each boundary point x ∈ ∂D, then X is called a reflecting Brownian motion. If P_x is the law of the solution of (74) with X(0) = x, then the system {P_x} defines a diffusion process on D̄ generated by the differential operator Δ/2 (Δ: Laplacian) with the boundary condition ∂u/∂n = 0, in the sense that the transition expectation u(t,x) = E_x[f(w(t))] is characterized as the solution to the initial and boundary value problem:

∂u/∂t = (1/2)Δu in D,    ∂u/∂n |_{∂D} = 0,    u|_{t=0} = f .

The Eq. (74) can be further extended to the following SDE, written in the equivalent form using stochastic differentials:

dX(t) = a(X(t)) dW(t) + b(X(t)) dt + n(X(t)) dφ(t)     (75)

where a(x): D̄ → R^d ⊗ R^r and b(x): D̄ → R^d are continuous functions. The increasing process φ has the same meaning as above. We can show the existence and uniqueness of solutions for a given independent pair of X(0) and an r-dimensional Wiener process W if a and b satisfy a Lipschitz condition on D̄. The solutions of (75) define a diffusion process on D̄ which is generated by the differential operator A given by (71) with the boundary condition

∂u/∂n = 0 .
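The reflecting construction can be caricatured numerically. The crude scheme below (our own sketch, not Tanaka's construction) takes Euler steps for a standard Brownian motion in the closed unit disc and pushes any step that exits back to the boundary along the inward normal n(x) = −x/|x|; the accumulated push-back plays the role of the boundary process φ:

```python
import numpy as np

# Crude projection scheme for reflecting Brownian motion in the unit disc:
# whenever an Euler step leaves the disc, project back to the boundary along
# the inward normal and record the push-back distance in phi.
rng = np.random.default_rng(3)
n_steps, dt = 20_000, 1e-4
X = np.zeros(2)            # X(0) = 0, the centre of the disc
phi = 0.0
for _ in range(n_steps):
    X = X + np.sqrt(dt) * rng.standard_normal(2)
    r = np.linalg.norm(X)
    if r > 1.0:            # step exited: push back along n(x) = -x/|x|
        phi += r - 1.0
        X = X / r
print(np.linalg.norm(X), phi)
```

The path never leaves the closed disc and φ is nondecreasing, growing only when the boundary is hit, mimicking conditions on (74).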


The Skorokhod equation in a multidimensional region was first studied by H. Tanaka (Tanaka, 1979; cf. e.g., Lions-Sznitman, 1984 and Saisho, 1987 for extensions and refinements). There are many works on SDE's for diffusion processes in domains with more general boundary conditions than reflection (cf. e.g., Ikeda-Watanabe, 1989, Takanobu-Watanabe, 1988).

3.8. Examples

(1) Linear and exponential diffusions


Consider the following one-dimensional SDE:

dX(t) = [a(t)X(t) + b(t)] dW(t) + [c(t)X(t) + d(t)] dt


where a(t), b(t), c(t), d(t) are deterministic continuous functions of t ∈ [0, t_0) (0 < t_0 ≤ ∞) and W(t) is a one-dimensional Wiener process. The equation is equivalent to the following in the form of Stratonovich differentials:

dX(t) = a(t)X(t) ∘ dW(t) + [c(t) − (1/2)a²(t)]X(t) dt + b(t) dW(t) + [d(t) − (1/2)a(t)b(t)] dt .

Set

M(t) = exp{ − ∫_0^t a(s) dW(s) − ∫_0^t [c(s) − (1/2)a²(s)] ds } ,    t ∈ [0, t_0) .

Then

M(t)^{−1} ∘ dM(t) = −{ a(t) dW(t) + [c(t) − (1/2)a²(t)] dt }


and hence

dX(t) = −X(t)M(t)^{−1} ∘ dM(t) + b(t) dW(t) + [d(t) − (1/2)a(t)b(t)] dt ,

i.e.,

d(M(t)X(t)) = b(t)M(t) ∘ dW(t) + [d(t) − (1/2)a(t)b(t)]M(t) dt .

From this, the solution X(t), t ∈ [0, t_0), with initial value X(0) is uniquely obtained as

X(t) = M(t)^{−1} ( X(0) + ∫_0^t b(s)M(s) ∘ dW(s) + ∫_0^t [d(s) − (1/2)a(s)b(s)]M(s) ds ) .


If, in particular, a(t) ≡ 0 and X(0) is a Gaussian random variable independent of W = (W(t)), then the solution X = (X(t)) is a Gaussian process. The case of


Langevin's equation, a(t) ≡ d(t) ≡ 0, b(t) ≡ 1, c(t) ≡ −γ (γ > 0), is a typical example, and the solution

X(t) = e^{−γt} ( X(0) + ∫_0^t e^{γs} dW(s) ) ,    t ∈ [0, ∞) ,

is well-known as the Ornstein-Uhlenbeck Brownian motion. The case of a(t) ≡ 0, b(t) ≡ 1, c(t) = −1/(t_0 − t) and d(t) = y/(t_0 − t), for 0 < t_0 < ∞ and y ∈ R, is the SDE for the pinned Brownian motion; the solution X^{x,y}(t), 0 ≤ t < t_0, with initial value x is a Brownian motion B(t) conditioned by B(0) = x and B(t_0) = y. Note also that the Eq. (48) for the Black-Scholes model of Section 2.8 is the case of a(t) ≡ σ, b(t) ≡ 0, c(t) ≡ μ and d(t) ≡ 0.
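The Langevin case can be sampled exactly from the formula above, since ∫_0^t e^{γs} dW(s) is a Gaussian variable with variance (e^{2γt} − 1)/(2γ), so that Var X(t) = (1 − e^{−2γt})/(2γ) when X(0) is deterministic. A quick check with illustrative parameters of our own choosing:

```python
import numpy as np

# Exact sampling of the Ornstein-Uhlenbeck solution
#   X(t) = e^{-γt} (X(0) + ∫_0^t e^{γs} dW(s)),
# using that the stochastic integral is N(0, (e^{2γt} - 1)/(2γ)).
rng = np.random.default_rng(4)
gamma, t, n = 1.0, 3.0, 400_000
x0 = 2.0
integ = np.sqrt((np.exp(2 * gamma * t) - 1) / (2 * gamma)) * rng.standard_normal(n)
X_t = np.exp(-gamma * t) * (x0 + integ)
mean_exact = x0 * np.exp(-gamma * t)
var_exact = (1 - np.exp(-2 * gamma * t)) / (2 * gamma)
print(np.mean(X_t), mean_exact, np.var(X_t), var_exact)
```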

(2) The square of Bessel processes


Let a, c, d be real constants such that a > 0 and d ≥ 0, and consider the following one-dimensional SDE:

dX(t) = (2aX(t) ∨ 0)^{1/2} dW(t) + (cX(t) + d) dt .


By Theorem 18, the unique solution X(t) exists for a given initial value and, if X(0) ≥ 0 a.s., then X(t) ≥ 0 for all t ≥ 0, a.s. The solutions define a diffusion process on [0,∞) generated by the differential operator

A = ax d²/dx² + (cx + d) d/dx .

The case c = d = 0 was considered by W. Feller (Feller, 1951) in connection with a scaling limit of critical Galton-Watson branching processes. Let a = 2 and c = 0. If X = (X(t)) is the corresponding diffusion on [0,∞), then X̃ = (X̃(t)) defined by X̃(t) = √X(t) is a diffusion process on [0,∞) corresponding to the differential operator

(1/2) [ d²/dx² + ((d−1)/x) d/dx ] .

This diffusion on [0,∞) is known as the Bessel diffusion of dimension d. It is well-known that, when d is a positive integer, the radial motion of a Brownian motion on R^d is a Bessel diffusion of dimension d.
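The identification with the radial motion can be probed with a crude Euler scheme in the case a = 2, c = 0: since the drift is then the constant d, E[X(t)] = X(0) + d·t, matching E[|B(t)|²] = d·t for a d-dimensional Brownian motion B started at 0. Step sizes and tolerances below are our own rough choices:

```python
import numpy as np

# Euler scheme for the squared Bessel SDE dX = 2√(X ∨ 0) dW + d dt (a = 2, c = 0).
# The drift is constant, so E[X(T)] = X(0) + d·T, in agreement with
# E[|B(T)|²] = d·T for d-dimensional Brownian motion from the origin.
rng = np.random.default_rng(5)
d, T, n_steps, n_paths = 3, 1.0, 400, 20_000
dt = T / n_steps
X = np.zeros(n_paths)
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    X = X + 2.0 * np.sqrt(np.maximum(X, 0.0)) * dW + d * dt
    X = np.maximum(X, 0.0)   # keep the scheme nonnegative, as the SDE guarantees
print(np.mean(X), d * T)
```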

(3) SDE's related to genetic models


Consider the following one-dimensional SDE:

dX(t) = √( X(t)(1 − X(t)) ∨ 0 ) dW(t) + c(X(t)) dt

where c(x) is Lipschitz-continuous and satisfies c(0) ≥ 0 and c(1) ≤ 0. By Theorem 18, the unique solution exists for a given initial value and, if X(0) ∈ [0,1] a.s., then X(t) ∈ [0,1] for all t ≥ 0, a.s. Thus the solution defines a diffusion process on [0,1]. This class of diffusions appears as scaling limits of Markov chain models for gene frequencies in population genetics; cf. e.g., Sato, 1976, for examples of such SDE's and their multi-dimensional extensions.
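A clipped Euler scheme illustrates the confinement to [0,1]. Here we take the hypothetical mean-reverting drift c(x) = β(1/2 − x), which satisfies c(0) ≥ 0 and c(1) ≤ 0; all parameters are our own. Since the drift is linear, E[X(t)] solves dm/dt = β(1/2 − m) exactly, giving a second check:

```python
import numpy as np

# Euler scheme for dX = √(X(1-X) ∨ 0) dW + c(X) dt with the assumed drift
# c(x) = β(1/2 - x), clipping each step to [0, 1] as the SDE itself guarantees.
rng = np.random.default_rng(6)
beta, T, n_steps, n_paths = 2.0, 1.0, 400, 5_000
dt = T / n_steps
X = np.full(n_paths, 0.1)
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal(n_paths)
    X = X + np.sqrt(np.maximum(X * (1 - X), 0.0)) * dW + beta * (0.5 - X) * dt
    X = np.clip(X, 0.0, 1.0)
m_exact = 0.5 + (0.1 - 0.5) * np.exp(-beta * T)   # solution of dm/dt = β(1/2 - m)
print(X.min(), X.max(), np.mean(X), m_exact)
```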


4. Stochastic differential equations on manifolds

So far, we have considered SDE's on Euclidean spaces. Just as in the case of dynamical systems, however, the natural state spaces on which SDE's are to be considered are manifolds: SDE's on manifolds provide us with basic tools in the probabilistic study of problems in analysis and geometry on manifolds. Throughout this section, we only consider strong solutions of SDE's, so that we always take as our basic probability space the r-dimensional Wiener space (W_0^r, F, P^W), where F is the completion of B(W_0^r) with respect to P^W. We always consider SDE's based on the canonical Wiener process w = (w(t)) ∈ W_0^r; w(t) = (w¹(t), …, w^r(t)). When we speak of a filtration {F_t} on the Wiener space, it is always the natural filtration of the canonical Wiener process.

4.1. SDE's on manifolds: The basic theory


Let M be a connected, σ-compact C^∞-manifold of dimension d and let W_M = C([0,∞) → M) be the space of all continuous paths in M. When M is not compact, let M̂ = M ∪ {Δ} be the one-point compactification of M and W_M̂ be the space of all continuous paths in M̂ with Δ as a trap, i.e., w(t) = Δ for all t ≥ s if w(s) = Δ. W_M and W_M̂ are endowed with the topology of uniform convergence on finite intervals, and we let B(W_M) and B(W_M̂) be the topological σ-fields on W_M and W_M̂, respectively. Suppose that we are given a system of C^∞-vector fields A_0, A_1, …, A_r on M. We write a SDE on M in the following form:

dX(t) = A_k(X(t)) ∘ dw^k(t) + A_0(X(t)) dt .     (76)

Here, we adopt the usual convention of omitting the summation sign. The precise meaning of the SDE (76) is as follows. We say that X = (X(t)) is a solution to SDE (76) if X is an {F_t}-adapted continuous process on M̂ with Δ as a trap such that, for any f ∈ C_0^∞(M) (:= the space of all C^∞-functions on M with compact support; we set f(Δ) = 0 when we work on M̂), f(X) = (f(X(t))) ∈ S (i.e., is a continuous semimartingale) and satisfies

df(X(t)) = (A_k f)(X(t)) ∘ dw^k(t) + (A_0 f)(X(t)) dt ,     (77)

where ∘ denotes the Stratonovich differential defined in Section 2.2. If

A_k(x) = a_k^i(x) ∂/∂x^i ,    k = 1, …, r,    and    A_0(x) = b^i(x) ∂/∂x^i

in a local coordinate, then X = (X(t)) is a solution of (76) if and only if X(t) = (X¹(t), …, X^d(t)), in each local coordinate, is a d-dimensional continuous semimartingale and satisfies

dX^i(t) = a_k^i(X(t)) ∘ dw^k(t) + b^i(X(t)) dt
        = a_k^i(X(t)) dw^k(t) + (1/2) Σ_{k=1}^r a_k^j(X(t)) (∂a_k^i/∂x^j)(X(t)) dt + b^i(X(t)) dt .     (78)
Note that the Eq. (78) has an intrinsic (i.e., coordinate-free) meaning because Stratonovich differentials obey the usual chain rule.

THEOREM 25. For each x ∈ M, there exists a unique strong solution X^x = (X^x(t)) of (76) with X^x(0) = x. Setting X^x(t;w) = X(t,x;w), X(t,x;w) is an M̂-valued Wiener functional for each (t,x), and the system {P_x}_{x∈M}, where P_x is the law on W_M̂ of [t ↦ X(t,x;·)], defines a diffusion process on M generated by the second order differential operator A of Hörmander type:

A = (1/2) Σ_{k=1}^r A_k² + A_0 .

Furthermore, except on a set of P^W-measure 0, the following holds: for each fixed (t, x_0, w) such that X(t,x_0;w) ∈ M, the map x ↦ X(t,x;w) is a diffeomorphism between a neighborhood of x_0 and a neighborhood of X(t,x_0;w).

The unique existence of solutions can be deduced by solving the Eq. (78) in each local coordinate and then piecing these solutions together. We can also imbed the manifold M in a higher dimensional Euclidean space by appealing to Whitney's imbedding theorem (Whitney, 1957) and apply the result in the Euclidean case. The form of the differential operator A can be deduced from Itô's formula. Also, the diffeomorphic property of solutions as stated in the theorem can be deduced from Theorem 23 of Section 3.5.
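That Stratonovich differentials obey the usual chain rule can be seen numerically: for the scalar equation dX = X ∘ dW, the chain rule gives X(t) = X(0) e^{w(t)}, and the Heun predictor-corrector scheme, which approximates Stratonovich integrals, reproduces this pathwise. This is our own illustrative sketch, not taken from the text:

```python
import numpy as np

# Heun scheme for the Stratonovich SDE dX = X ∘ dW.  By the chain rule the
# exact solution is X(t) = X(0) e^{w(t)}; an Itô-Euler scheme would instead
# converge to X(0) e^{w(t) - t/2}.
rng = np.random.default_rng(9)
n_steps, T = 2_000, 1.0
dt = T / n_steps
W, X = 0.0, 1.0
for _ in range(n_steps):
    dW = np.sqrt(dt) * rng.standard_normal()
    pred = X + X * dW                  # Euler predictor
    X = X + 0.5 * (X + pred) * dW      # Heun (midpoint) corrector
    W += dW
print(X, np.exp(W))
```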

4.2. Brownian motions on Riemannian manifolds


Let M be a Riemannian manifold. A diffusion process on M (more precisely, a diffusion process on M ∪ {Δ} with Δ as a trap) is called a (minimal) Brownian motion on M if it is generated by Δ_M/2, where Δ_M is the Laplacian, i.e., the Laplace-Beltrami operator, on M. A Brownian motion on M exists uniquely: it is a family {P_x} of probability measures on W_M̂ such that P_x(w(0) = x) = 1 and, for every f ∈ C_0^∞(M),

f(w(t)) − f(w(0)) − (1/2) ∫_0^t (Δ_M f)(w(s)) ds

is a martingale under P_x. There are two general methods to construct the Brownian motion on M using SDE's. The first method is to obtain it as the trace (i.e., projection) on M of the stochastic moving frame over M, which will be discussed in Section 4.3. The second method is based on an imbedding of the manifold: we imbed M into a higher dimensional Euclidean space R^r so that the Riemannian metric on M is


induced from the Euclidean metric of R^r. If T_xM is the tangent space to M at x ∈ M, then we may consider T_xM ⊂ R^r as a linear subspace, the Riemannian norm on T_xM being the restriction of the Euclidean norm of R^r. Let

P(x): R^r ∋ ξ ↦ P(x)[ξ] ∈ T_xM

be the orthogonal projection of R^r onto T_xM. Define A_k(x) ∈ T_xM, x ∈ M, k = 1, …, r, by A_k(x) = P(x)[δ_k], where [δ_1, …, δ_r] is the canonical base of R^r:

δ_k = (0, …, 0, 1, 0, …, 0)  (1 in the k-th place),    k = 1, …, r .
Then A_1, …, A_r are smooth vector fields on M and we can consider the SDE

dX(t) = A_k(X(t)) ∘ dw^k(t)     (79)

on the r-dimensional Wiener space. If we denote by P_x the law on W_M̂ of the solution X = (X(t)) of (79) with initial value x ∈ M, then the family {P_x} defines the Brownian motion on M. Note that, using the imbedding M ⊂ R^r, SDE (79) can be solved as an SDE on R^r: we extend the vector fields A_1, …, A_r to smooth vector fields Ã_1, …, Ã_r on R^r and consider the SDE

dX(t) = Ã_k(X(t)) ∘ dw^k(t)

on R^r. If the initial value x is in M, the solution stays in M for all t ≥ 0 and is irrelevant to the way the vector fields are extended. As an example, consider the case of the unit sphere S^d imbedded in R^{d+1} as
S^d = { x = (x¹, …, x^{d+1}) ∈ R^{d+1} | Σ_{k=1}^{d+1} (x^k)² = 1 } .

Then

A_k(x) = Σ_{i=1}^{d+1} (δ_k^i − x^k x^i) ∂/∂x^i ,    k = 1, …, d+1 .

Hence, if we consider the following SDE on R^{d+1}, defined on the (d+1)-dimensional Wiener space:
dX^i(t) = dw^i(t) − X^i(t) Σ_{k=1}^{d+1} X^k(t) ∘ dw^k(t)
        = dw^i(t) − X^i(t) Σ_{k=1}^{d+1} X^k(t) dw^k(t) − (d/2) X^i(t) dt ,    i = 1, …, d+1 ,

then the solution X = (X(t)) stays on the sphere S^d for all t ≥ 0 if X(0) = x ∈ S^d, and this solution is a Brownian motion on S^d. This construction of the spherical Brownian motion is due to D. W. Stroock.
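Stroock's construction is easy to probe numerically: take Heun steps for the Stratonovich equation, renormalize to stay exactly on the sphere, and check the decay E[X¹(t)] = e^{−dt/2} x¹(0), which follows because the coordinate functions are eigenfunctions of Δ_{S^d} with eigenvalue −d and the generator is Δ_{S^d}/2. Scheme, seed and tolerance are our own sketch:

```python
import numpy as np

# Heun (Stratonovich) steps for dX = dw - X (X·∘dw) on S^d, with a projection
# back to the sphere each step.  Since Δ_{S^d} x^i = -d x^i, we must have
# E[X^1(t)] = e^{-d t/2} when X(0) = (1, 0, ..., 0).
rng = np.random.default_rng(7)
d, T, n_steps, n_paths = 2, 0.5, 500, 20_000
dt = T / n_steps
X = np.tile([1.0, 0.0, 0.0], (n_paths, 1))
for _ in range(n_steps):
    dw = np.sqrt(dt) * rng.standard_normal((n_paths, d + 1))
    def b(x):
        # tangential Stratonovich increment field dw - x (x·dw)
        return dw - x * np.sum(x * dw, axis=1, keepdims=True)
    pred = X + b(X)
    X = X + 0.5 * (b(X) + b(pred))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # stay exactly on S^d
print(np.mean(X[:, 0]), np.exp(-d * T / 2))
```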


As another example, we consider Brownian motion on the Lobachevskii plane: the Lobachevskii plane, or hyperbolic plane, is a two-dimensional, complete, simply connected Riemannian manifold of constant negative curvature, which we realize, as usual, by the upper half plane H² = {(x,y) | y > 0} with the Riemannian metric ds² = (dx² + dy²)/y². Then the Laplacian is given by Δ_{H²} = y²(∂²/∂x² + ∂²/∂y²). Hence the SDE for the Brownian motion is given, on the two-dimensional Wiener space, by

dX(t) = Y(t) dw¹(t),    dY(t) = Y(t) dw²(t) .

By the result in Section 2.5, we see that the unique solution (X(t), Y(t)) with the given initial value (X(0), Y(0)) = (x,y) ∈ H² is given by

X(t) = x + ∫_0^t Y(s) dw¹(s) = x + y ∫_0^t exp[ w²(s) − s/2 ] dw¹(s) ,

Y(t) = y exp[ w²(t) − t/2 ] .

This (X(t), Y(t)) defines the Brownian motion on H² starting at (x,y). For the Brownian motion on H² and its applications, cf. e.g., the papers by J.-C. Gruet and by L. Alili and J.-C. Gruet in Yor (ed.), 1997, and Ikeda-Matsumoto, 1999.
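The explicit solution invites a direct simulation: Y(t) = y e^{w²(t) − t/2} is a geometric Brownian motion, hence a martingale with E[Y(t)] = y, and X(t) = x + ∫_0^t Y dw¹ likewise has mean x. A grid simulation (our own sketch; grid sizes and seed are arbitrary) confirms both:

```python
import numpy as np

# Simulate hyperbolic-plane Brownian motion (X, Y) on a time grid using the
# explicit solution Y(t) = y exp(w2(t) - t/2), X(t) = x + ∫ Y dw1.  Both Y and
# X are martingales, so E[Y(T)] = y and E[X(T)] = x; also Y stays positive.
rng = np.random.default_rng(8)
T, n_steps, n_paths = 1.0, 200, 20_000
dt = T / n_steps
x, y = 0.0, 1.0
w2 = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)
t_grid = dt * np.arange(1, n_steps + 1)
Y = y * np.exp(w2 - t_grid / 2)
dw1 = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
Y_prev = np.hstack([np.full((n_paths, 1), y), Y[:, :-1]])  # left endpoints (Itô sum)
X_T = x + np.sum(Y_prev * dw1, axis=1)
print(np.mean(Y[:, -1]), np.mean(X_T))
```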

4.3. Stochastic moving frames


Let M be a Riemannian manifold of dimension d and x ∈ M. By a frame at x we mean an orthonormal base (ONB) e = [e_1, …, e_d] of the tangent space T_xM at x. Thus a frame r at x ∈ M can be described as r = (x,e), x ∈ M, e = [e_1, …, e_d]. The totality of frames at all points of M is denoted by O(M). The projection π: O(M) → M is defined by π(r) = x if r = (x,e). O(M) is called the bundle of orthonormal frames over M. We can identify r ∈ O(M) with an isometric isomorphism r̄: R^d → T_xM, x = π(r), defined by sending each member δ_i = (0, …, 0, 1, 0, …, 0) (1 in the i-th place) of the canonical base of R^d to e_i, i = 1, …, d, where r = (x,e), e = [e_1, …, e_d]. r̄ is called the canonical isomorphism associated to the frame r. There is a natural right action of the orthogonal group O(d) on O(M):

R_g: O(M) × O(d) ∋ (r, g) ↦ rg ∈ O(M)

defined by rg = r̄ ∘ g, where O(d) is identified with the group of all isometric isomorphisms of R^d. Now O(M) can be given a natural structure of manifold so that it is a principal fibre bundle over M with structure group O(d). There is an important random motion of frames which we call the stochastic moving frame over M. This notion can be defined precisely by means of an SDE on the bundle O(M) of orthonormal frames. Before giving the formal definition, however, we explain the idea in the simple case where M is the two dimensional sphere S². We take a plane R² and let w = w(t) be a Brownian curve in R²


starting at the origin 0; that is, w ∈ W := W_0², endowed with the standard two-dimensional Wiener measure P^W on W, so that (W, P^W) is the two-dimensional Wiener space. We assign to each point w(t) ∈ R² the canonical ONB δ_1 = (1,0) and δ_2 = (0,1), so that δ = [δ_1, δ_2] forms an ONB of the tangent space T_{w(t)}R² ≅ R² at w(t). Then these bases at different points along the curve w(t) are parallel to each other. Given a sphere S², choose a point x ∈ S² and an ONB e = [e_1, e_2] of the tangent space T_xS² at x. We put the sphere S² on the plane R² so that x touches the plane at the origin 0 and the ONB e coincides with the ONB δ. Now we roll the sphere S² on the plane R² along the Brownian curve w(t) without slipping. Suppose that the Brownian curve w(t) is traced in ink. Then the trace of w(t), together with the ONB δ at w(t), is transferred to a curve x(t) on S² with an ONB e(t) = [e_1(t), e_2(t)] of T_{x(t)}S². The random curve r(t) = (x(t), e(t)) thus transferred is what we call the stochastic moving frame over S². Also, the random curve x(t) thus obtained is a Brownian motion on the sphere S².

A formal definition of the stochastic moving frame is as follows. In differential geometry, there is the notion of the system of canonical horizontal vector fields A_1, …, A_d on the orthonormal frame bundle O(M): for each i = 1, …, d, A_i(r) is a smooth vector field on O(M) uniquely determined by the property that the integral curve (i.e., the solution) r(t) on O(M) of the following ordinary differential equation with initial value r ∈ O(M):

dr(t)/dt = A_i(r(t)),    r(0) = r = (x,e),    e = [e_1, …, e_d] ,

coincides with the curve

r(t) = (x(t), e(t)),    e(t) = [e_1(t), …, e_d(t)] ,

where x(t) is the geodesic on M with x(0) = x and (dx/dt)|_{t=0} = e_i, and e(t) is the parallel translate, in the sense of Levi-Civita, of e = [e_1, …, e_d] along the curve x(t).
Let (W_0^d, F, P^W) be the d-dimensional Wiener space. The stochastic moving frame over M starting at a frame r ∈ O(M) is, by definition, the solution

r(t) = (r(t,r;w)),    r(t,r;w) = (x(t,r;w), e(t,r;w)) ,

of the SDE

dr(t) = A_k(r(t)) ∘ dw^k(t)     (80)

on O(M) with initial value r. As for the action of the orthogonal group O(d) on O(M), we have, for each g ∈ O(d) (identified with an isometric isomorphism of R^d), that

r(t,r;w)g = r(t, rg; g^{−1}w),    t ≥ 0, r ∈ O(M) ,

where g^{−1}w ∈ W_0^d is defined by (g^{−1}w)(t) = g^{−1}[w(t)], t ≥ 0. In particular, writing x(t,r;w) = π[r(t,r;w)],


x(t,r;w) = x(t, rg; g^{−1}w),    t ≥ 0, r ∈ O(M), g ∈ O(d) .     (81)

Since g^{−1}w has the same law as w, by the rotation invariance of the Wiener process, we deduce that

{ x(t, rg; w), t ≥ 0 } and { x(t, r; w), t ≥ 0 } have the same law,    r ∈ O(M), g ∈ O(d) .     (82)

In other words, the law P_r on W_M̂ of [t ↦ x(t,r;w)] satisfies P_{rg} = P_r for all g ∈ O(d). This implies that P_r depends only on x = π(r), so that we may write P_r = P_x. Then the system {P_x} defines a diffusion on M̂ with Δ as a trap, and we can show that it is generated by Δ_M/2, that is, it is a Brownian motion on M. In this way, the Brownian motion on M can be obtained by projecting the stochastic moving frame over M. Another important application of the stochastic moving frame will be given in the next section. The notion of stochastic moving frames was first introduced by J. Eells and D. K. Elworthy (Eells-Elworthy, 1976) to realize an idea of K. Itô on the parallel displacement of tensor fields along the Brownian curve (the paper [23] in Itô, 1987). Cf. Ikeda-Watanabe, 1989 for details.

4.4. Probabilistic representations of heat kernels by Wiener functional expectations


In many problems in the analysis and geometry of Riemannian manifolds, an important role is played by heat kernels, i.e., the fundamental solutions of heat equations; for example, there are so-called heat equation methods used effectively in such problems as the asymptotics of the eigenvalues of the Laplacian, the Atiyah-Singer index theorem, Morse inequalities for Morse functions, and so on. Here we discuss a probabilistic approach, based essentially on SDE theory and the Malliavin calculus, to obtain heat kernels by Wiener functional expectations. From now on, we work on a compact Riemannian manifold M, for simplicity. Let r(t) = (r(t,r;w)), t ≥ 0, r ∈ O(M), w ∈ W_0^d, be the stochastic moving frame over M as discussed in the previous section. Note that, since M is compact, it does not explode in finite time, a.s. For each (t,r) ∈ [0,∞) × O(M), r(t,r;w) is a smooth O(M)-valued Wiener functional in the sense that f(r(t,r;w)) ∈ D^∞ for every smooth function f on O(M). Hence the M-valued Wiener functional X(t,r;w) is smooth in the same sense. Also, for fixed t > 0 and r ∈ O(M), it is nondegenerate in the sense of Malliavin (this notion can be defined similarly as in Section 3.6). Then, as we saw in Section 3.6, we can define the formal expression δ_y(X(t,r;w)) as an element of D^{−∞}, where δ_y is the Dirac delta function with pole at y ∈ M. By (82), we have

x(·, r; w) and x(·, rg; w) equal in law,    g ∈ O(d) ,

and hence the generalized expectation E^W[δ_y(X(t,r;w))] depends only on x = π(r), so that we may write


p(t,x,y) = E^W[δ_y(X(t,r;w))] .     (83)

This p(t,x,y) is the heat kernel, i.e., the fundamental solution, of the heat equation

∂u/∂t = (1/2) Δ_M u .     (84)

If, instead of (80), we consider the SDE with a parameter ε > 0:

dr(t) = ε A_k(r(t)) ∘ dw^k(t),    r(0) = r ,     (85)

and denote its solution by r^ε(t,r;w), then we have, by the scaling property of the Wiener process, that

{ r(ε²t, r; w), t ≥ 0 } and { r^ε(t, r; w), t ≥ 0 } have the same law, and hence

p(ε², x, y) = E^W[δ_y(X^ε(1, r; w))] .     (86)

From this expression of the heat kernel p(t,x,y), we can deduce its regularity in (t,x,y) and study its short-time asymptotic properties (cf. e.g., Ikeda-Watanabe, 1988, Watanabe, 1990). Furthermore, if V is a smooth function on M, then the Wiener functional exp{ −ε² ∫_0^t V(X^ε(s,r;w)) ds } of Feynman-Kac type is in the space D^∞, i.e., is a test functional, so that the generalized expectation

E^W[ exp{ −ε² ∫_0^t V(X^ε(s,r;w)) ds } δ_y(X^ε(t,r;w)) ]

is well defined. For the same reason as above, it depends only on x = π(r), and

p(ε²,x,y) = E^W[ exp{ −ε² ∫_0^1 V(X^ε(s,r;w)) ds } δ_y(X^ε(1,r;w)) ]     (87)

defines the heat kernel for the heat equation with a potential:

∂u/∂t = (1/2) Δ_M u − V u .     (88)

We now consider a heat equation on a vector bundle; for simplicity, we treat the case of the exterior product of the cotangent bundle over M, whose sections are differential forms on M. The canonical isomorphism r̄: R^d → T_xM naturally induces an isomorphism r̄: R^d → T_x^*M and an isomorphism r̄: ∧R^d → ∧T_x^*M by sending the bases δ_i and δ_{i_1} ∧ ⋯ ∧ δ_{i_p} to f_i and f_{i_1} ∧ ⋯ ∧ f_{i_p}, for i = 1, …, d and 1 ≤ i_1 < ⋯ < i_p ≤ d, respectively, where f = [f_1, …, f_d] is the ONB of T_x^*M dual to e = [e_1, …, e_d]. Here ∧R^d = ⊕_{p=0}^d ∧^p R^d is the exterior product of R^d, a 2^d-dimensional Euclidean space with the canonical basis δ_{i_1} ∧ ⋯ ∧ δ_{i_p}. Let End(∧R^d) be the algebra of linear transformations of ∧R^d and let a_i^* ∈ End(∧R^d) be defined by

a_i^*(λ) = δ_i ∧ λ,    λ ∈ ∧R^d,    i = 1, …, d .



Let a_i be the dual of a_i^*. Then the system


ai~ai2 . . .aipajla;2 ...ajq ,

where 1 _< il < ... < ip <_ d, 1 <_j~ < ... < jq < d, p , q = O, 1 , . . . , d , forms a basis in End(ARd), cf. Cycon et al., 1987. Let JiJk1(r)= (o}([Ak,Az])r be the scalarization of the Riemann curvature tensor. (co = (co}) is the connection form for the Levi-Civita connection, cf. Kobayashi-Nomizu, 1963, which is a so(d)-valued 1-form on O(M) and {Ak} is the system of canonical horizontal vector fields introduced above.) Let D2(J)(r) E E n d ( A R e) be defined by D2(J)(r) = jijkl(r)a*aja~al. Let r~(t, r; w) be the stochastic moving frame as defined above by the solution to SDE (85) and define an End(fRd)-valued process M~(t, r; w) by the solution to the following differential equation on End(A Re): dM(t) e2 dt - 2 D2(J)(re(t'r;w))M(t)'
M(O) = I .

(89)
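Along a fixed path, (89) is a linear matrix ODE. As a minimal numerical sketch (our simplification, not the text's construction), freeze the curvature term to a constant matrix D, so that the solution is the matrix exponential exp(−(ε²/2)tD); the code checks a Runge-Kutta integration of the frozen equation against a series expansion of that exponential. The matrix D below is an arbitrary stand-in, not a true curvature operator.

```python
import numpy as np

rng = np.random.default_rng(5)

def matrix_exp(A, terms=40):
    """Taylor series for the matrix exponential (adequate for small ||A||)."""
    E = np.eye(A.shape[0])
    T = np.eye(A.shape[0])
    for k in range(1, terms):
        T = T @ A / k
        E = E + T
    return E

def solve_curvature_ode(D, eps, t=1.0, n_steps=2000):
    """RK4 integration of dM/dt = -(eps^2/2) D M, M(0) = I, i.e. Eq. (89)
    with the curvature term frozen to a constant matrix D."""
    dt = t / n_steps
    c = -(eps**2) / 2.0
    M = np.eye(D.shape[0])
    for _ in range(n_steps):
        k1 = c * D @ M
        k2 = c * D @ (M + 0.5 * dt * k1)
        k3 = c * D @ (M + 0.5 * dt * k2)
        k4 = c * D @ (M + dt * k3)
        M = M + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return M

D = rng.normal(size=(4, 4))   # stand-in for D_2(J); not a true curvature term
eps = 0.5
M_ode = solve_curvature_ode(D, eps)
M_exact = matrix_exp(-(eps**2 / 2.0) * D)   # closed form for constant D
```

For the genuine equation (89), D₂(J)(r^ε(t)) varies with the moving frame, so in practice one integrates the ODE step by step along each simulated path of r^ε.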

Let δ_y(X^ε(1, r; w)) be the generalized Wiener functional defined above and consider the following generalized expectation:

\tilde r\, E^W\big[ M^\varepsilon(1,r;w)\, \tilde r(1,r;w)^{-1}\, \delta_y(X^\varepsilon(1,r;w)) \big] .

Here, r̃(1, r; w) : ∧R^d → ∧T^*_{X^ε(1,r;w)}M and r̃ : ∧R^d → ∧T_x^*M

are the canonical isomorphisms defined above. By a refinement of the theory of generalized expectations in the Malliavin calculus (the so-called quasi-sure analysis) due to H. Sugita and H. Airault-P. Malliavin (Sugita, 1988; Airault-Malliavin, 1988), we may assume that X^ε(1, r; w) = y in this expectation. In particular, we can assume that

\tilde r(1,r;w)^{-1} \in \mathrm{Hom}(\wedge T_y^*M, \wedge R^d)

and hence this generalized expectation is well defined and takes values in

\mathrm{Hom}(\wedge T_y^*M, \wedge T_x^*M) .

Here, Hom(V₁, V₂) stands for the space of linear transformations from V₁ to V₂, in general. By considering the action of O(d), we deduce as above that it depends only on x = π(r). Thus we may set
p(\varepsilon^2,x,y) = \tilde r\, E^W\big[ M^\varepsilon(1,r;w)\, \tilde r(1,r;w)^{-1}\, \delta_y(X^\varepsilon(1,r;w)) \big]

(90)

and this kernel defines the heat kernel for the heat equation on the space Γ(∧T^*M) of sections of the exterior product bundle ∧T^*M, i.e., on the space A(M) of differential forms on M:


S. Watanabe

\frac{\partial u}{\partial t} = \frac{1}{2}\Delta u ,

(91)

where A = - ( d * d + dd*) is the de Rham-Kodaira Laplacian acting on A(M). Cf. Ikeda-Watanabe, 1989 and Watanabe, 1990 for details and applications. The original probabilistic approach is due to J.-M. Bismut (Bismut, 1984).

4.5. Brownian motions on Lie groups

Let G be a Lie group. A stochastic process {g(t)}_{0≤t<∞} is called a right- (left-) invariant Brownian motion on G if it satisfies the following: (i) with probability one, g(0) = e (the identity) and t ↦ g(t) is continuous; (ii) for every t ≥ s, g(t)·g(s)⁻¹ (resp. g(s)⁻¹·g(t)) and {g(u); u ≤ s} are independent; (iii) for every t ≥ s, g(t)·g(s)⁻¹ (resp. g(s)⁻¹·g(t)) and g(t−s) are equally distributed. Let A₀, A₁, …, A_r be a system of right- (left-) invariant vector fields on G and consider the SDE on the r-dimensional Wiener space:
dg(t) = \sum_{k=1}^{r} A_k(g(t)) \circ dw^k(t) + A_0(g(t))\,dt . \qquad (92)

Then the solution with g(0) = e exists uniquely and globally. We denote it by g(t; w). It is then a right- (resp. left-) invariant Brownian motion on G and, conversely, every right- (resp. left-) invariant Brownian motion can be obtained in this way. The stochastic flow of diffeomorphisms t ↦ [g ↦ g(t, g; w)] defined by the solution of the SDE (92) is given as t ↦ [g ↦ g(t; w)·g] (resp. t ↦ [g ↦ g·g(t; w)]).

EXAMPLE 1. (Linear Lie groups) Let G be a linear Lie group, i.e., a Lie subgroup of the general linear group GL(d, R) or GL(d, C), the multiplicative group of all d × d nonsingular matrices. The Lie algebra 𝔤 of G is a Lie subalgebra of gl(d, R) or gl(d, C), the algebra of all d × d matrices. Identifying 𝔤 with the space of all left-invariant vector fields on G in the usual way, a system A₀, A₁, …, A_r of left-invariant vector fields is given by a₀, a₁, …, a_r ∈ 𝔤. Then the SDE (92) can be written simply, in matrix notation, as

dg(t) = \sum_{k=1}^{r} g(t)\,a_k \circ dw^k(t) + g(t)\,a_0\,dt = g(t) \circ d\hat w(t) ,

where ŵ(t) is a 𝔤-valued continuous semimartingale given by ŵ(t) = Σ_{k=1}^r w^k(t) a_k + t a₀. The solution g(t) with g(0) = E_d (the d × d identity matrix) is a left-invariant Brownian motion on G.
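The matrix SDE of Example 1 is easy to simulate. The sketch below takes G = SO(3) with a₁, a₂, a₃ the standard basis of so(3) and a₀ = 0, and uses a geometric Euler scheme (an assumption of ours — a standard Lie-group integrator, not a method from the text): each step multiplies by the exact group exponential of the Brownian increment, so every iterate stays exactly on SO(3).

```python
import numpy as np

rng = np.random.default_rng(2)

# standard basis of so(3), playing the role of a_1, a_2, a_3 in Example 1
A1 = np.array([[0., 0., 0.], [0., 0., -1.], [0., 1., 0.]])
A2 = np.array([[0., 0., 1.], [0., 0., 0.], [-1., 0., 0.]])
A3 = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])

def expm_so3(v):
    """Rodrigues' formula for exp(v1*A1 + v2*A2 + v3*A3), exact in SO(3)."""
    th = np.linalg.norm(v)
    K = v[0] * A1 + v[1] * A2 + v[2] * A3
    if th < 1e-12:
        return np.eye(3) + K
    return np.eye(3) + (np.sin(th) / th) * K + ((1 - np.cos(th)) / th**2) * (K @ K)

def brownian_motion_SO3(t=1.0, n_steps=1000):
    """g_{n+1} = g_n exp(sum_k a_k dw^k): a geometric Euler scheme for the
    Stratonovich SDE dg = sum_k g a_k o dw^k with a_0 = 0."""
    g = np.eye(3)
    dw = rng.normal(0.0, np.sqrt(t / n_steps), size=(n_steps, 3))
    for inc in dw:
        g = g @ expm_so3(inc)
    return g

g = brownian_motion_SO3()
```

Because each factor is an exact rotation, g(t) remains orthogonal with determinant one up to floating-point error; a naive Euler step g + g·a_k·dw would drift off the group.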


If we consider instead the SDE, in matrix notation, dg(t) = ∘ dŵ(t)·g(t), then the solution g(t) with g(0) = E_d is a right-invariant Brownian motion on G.

EXAMPLE 2. (Nilpotent Lie groups and Lévy's stochastic area integrals) Let r ≥ 2 be an integer and d = r(r+1)/2. Set

H_r (≅ R^d ≅ R^r × so(r)) := { x = (x_k, x_{(i,j)}), 1 ≤ k ≤ r, 1 ≤ i < j ≤ r } .

H_r is a Lie group under the group multiplication z = x·y, x, y ∈ H_r, defined by

z_k = x_k + y_k , \qquad z_{(i,j)} = x_{(i,j)} + y_{(i,j)} - \tfrac{1}{2}(x_i y_j - x_j y_i) ,

where x = (x_k, x_{(i,j)}), y = (y_k, y_{(i,j)}), z = (z_k, z_{(i,j)}). It is called the free nilpotent Lie group of step 2 with r generators; H₂ is called the Heisenberg group. We define a system A₀, A₁, …, A_r of vector fields on H_r by A₀ = 0 and

A_i(x) = \frac{\partial}{\partial x_i} + \frac{1}{2}\sum_{j<i} x_j \frac{\partial}{\partial x_{(j,i)}} - \frac{1}{2}\sum_{j>i} x_j \frac{\partial}{\partial x_{(i,j)}} , \quad i = 1, \dots, r .

Then these are all right-invariant vector fields on H_r and, writing g(t) = (X_k(t), X_{(i,j)}(t)), the SDE (92) is given by

dX_k(t) = dw^k(t) , \quad 1 \le k \le r ,

dX_{(i,j)}(t) = \frac{X_i(t)}{2} \circ dw^j(t) - \frac{X_j(t)}{2} \circ dw^i(t) , \quad 1 \le i < j \le r .

The solution X(t) with X(0) = 0 is given by X_k(t) = w^k(t), k = 1, …, r, and

X_{(i,j)}(t) = S^{(i,j)}(t) := \frac{1}{2}\int_0^t [w^i(s) \circ dw^j(s) - w^j(s) \circ dw^i(s)]
= \frac{1}{2}\int_0^t [w^i(s)\,dw^j(s) - w^j(s)\,dw^i(s)] , \quad 1 \le i < j \le r .

S^{(i,j)}(t) denotes, intuitively, the algebraic area of the two-dimensional region surrounded by the Brownian curve (w^i(s), w^j(s))_{0≤s≤t} and the chord connecting the initial and terminal points of the curve. This notion was introduced by P. Lévy (Lévy, 1948), and so it is called Lévy's stochastic area. The following formula, essentially due to Lévy (Lévy, 1951), is very important in many applications (cf., e.g., Bismut, 1988; Watanabe, 1997): for a = (a_{ij}) ∈ so(r) (:= the totality of r × r skew-symmetric real matrices),

E^P\Big[\exp\Big(\sum_{1\le i<j\le r} a_{ij}\, S^{(i,j)}(1)\Big) \,\Big|\, w(1) = 0\Big] = \Big\{\det\Big[\frac{a}{2}\Big(\sin\frac{a}{2}\Big)^{-1}\Big]\Big\}^{1/2} .

In particular, for r = 2, i.e., for w(t) = (w^1(t), w^2(t)),

E^P\big[\exp\big(\alpha S^{(1,2)}(1)\big) \,\big|\, w(1) = 0\big] = \frac{\alpha/2}{\sin(\alpha/2)} , \quad |\alpha| < 2\pi ,

or, equivalently,

E^P\big[\exp\big(i\alpha S^{(1,2)}(1)\big) \,\big|\, w(1) = 0\big] = \frac{\alpha/2}{\sinh(\alpha/2)} , \quad \alpha \in \mathbb{R} .
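Lévy's formula also has an unconditional companion, E^P[exp(iα S^{(1,2)}(1))] = 1/cosh(α/2), which is not stated above but is convenient for a Monte Carlo check because no bridge conditioning is needed. The sketch below samples the area by left-point Riemann sums (the Itô and Stratonovich sums agree for this integrand, as noted in the solution formula above) and compares the empirical characteristic function with 1/cosh(α/2).

```python
import numpy as np

rng = np.random.default_rng(3)

def levy_area_samples(n_paths=50_000, n_steps=400, t=1.0):
    """Sample S^(1,2)(t) = (1/2) int_0^t (w1 dw2 - w2 dw1) by left-point
    Riemann sums over a planar Brownian motion (w1, w2)."""
    dt = t / n_steps
    w1 = np.zeros(n_paths)
    w2 = np.zeros(n_paths)
    S = np.zeros(n_paths)
    for _ in range(n_steps):
        d1 = rng.normal(0.0, np.sqrt(dt), n_paths)
        d2 = rng.normal(0.0, np.sqrt(dt), n_paths)
        S += 0.5 * (w1 * d2 - w2 * d1)   # Ito left-point increment of the area
        w1 += d1
        w2 += d2
    return S

alpha = 1.5
S = levy_area_samples()
cf_mc = np.mean(np.cos(alpha * S))      # E[exp(i*alpha*S)] is real by symmetry
cf_exact = 1.0 / np.cosh(alpha / 2.0)   # unconditional Levy area formula
```

The discretization bias of the Riemann sum is O(1/n_steps), so the agreement sharpens as the partition is refined.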

Additional remarks on references


(1) For more complete treatments of the materials discussed above, the following books are recommended: Ikeda-Watanabe, 1981, 1989; Rogers-Williams, 1987; Karatzas-Shreve, 1988 and Revuz-Yor, 1991, 1999. Important books on SDE theory published before 1980 are, among others, McKean, 1969; Gihman-Skorohod, 1979 (Russian original, 1975) and Friedman, 1975, 1976. As a short introduction to stochastic calculus, we would add an expository article by M. Yor in the Bourbaki Seminar (Yor, 1982). (2) We could not give a full account of the so-called "general theory of processes" and of the theory of semimartingales and stochastic integration based on it, as developed in the most complete and elegant form by P. A. Meyer, C. Dellacherie, J. Jacod, K. Bichteler, and many others. For these, we refer the reader to Dellacherie-Meyer, 1975, 1980; Jacod-Shiryaev, 1987; Protter, 1990 and also to the expository article "A short presentation of stochastic calculus" by P. A. Meyer in the appendix of Emery, 1989. (3) For applications of conformal martingales to problems in complex analysis, cf., e.g., McKean, 1969; Durrett, 1984; Bass, 1995. (4) For approximation schemes for solutions of SDE's and numerical methods, cf. Kloeden-Platen, 1997. For approximation of solutions of SDE's by solutions of ODE's, cf., e.g., Wong-Zakai, 1965; Stroock-Varadhan, 1972; Kunita, 1990. In this connection, we should mention recent work by T. Lyons on differential equations driven by rough signals (Lyons, 1998). An important implication of his theory is that a strong solution of a Markovian SDE is a continuous functional of the driving Wiener process and its stochastic area integrals taken together; that is, the discontinuity of the solution on the Wiener space is essentially caused by the discontinuities of the stochastic areas. (5) The formula for the Wiener chaos expansion (Itô's multiple Wiener integral expansion) of solutions of Markovian SDE's has been obtained by Veretennikov-Krylov, 1976.
(6) For applications to nonlinear filtering theory, cf., e.g., Kallianpur, 1997. For applications to stochastic control problems, cf. Krylov, 1980. (7) For examples of SDE's involving Poisson point processes, cf., e.g., Tanaka, 1973, for models connected with Boltzmann's equation of a Maxwellian gas, and Takanobu-Watanabe, 1988, for the process on the boundary of a diffusion process in a domain with Wentzell's boundary conditions. (8) For additional references on the Malliavin calculus, we would list two expository articles, Kusuoka, 1990 and Watanabe, 1992. In statistical problems,


N. Yoshida (Yoshida, 1992, 1993) gave applications of the Malliavin calculus to asymptotic problems of statistics in the model of small diffusions. (9) For stochastic calculus on manifolds and its applications, the following books may provide useful information: McKean, 1969; Malliavin, 1978; Elworthy, 1982; Ikeda-Watanabe, 1989; Emery, 1989. (10) To discuss the stochastic flows of diffeomorphisms in full generality, we need to consider more general SDE's than those treated in this article: Consider the SDE (76) on a manifold M. If we set A(t)(x) = Σ_{k=1}^r A_k(x) w^k(t) + A₀(x) t, then t ↦ A(t) is a stochastic process taking values in the space V(M) of all vector fields on M. It may well be called a V(M)-valued Wiener process because it is continuous under the natural topology on V(M) and has stationary independent increments. There exist, however, more general V(M)-valued Wiener processes than those given in the form of A(t). The notion of SDE's can be generalized to any V(M)-valued Wiener process, and we can obtain more general stochastic flows of diffeomorphisms from the solutions. Cf. Kunita, 1990 on these topics.
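The Wong-Zakai approximation of remark (4) can be seen in the simplest linear SDE dX = X dW. The ODE driven by the piecewise-linear interpolation of w has exact subinterval flows exp(dw_k), so its terminal value is exp(w(1)) — the Stratonovich solution — while the Euler-Maruyama scheme for the Itô equation converges instead to exp(w(1) − 1/2). This minimal sketch (our own illustration) exhibits the gap:

```python
import numpy as np

rng = np.random.default_rng(4)

n, t = 4000, 1.0
dw = rng.normal(0.0, np.sqrt(t / n), size=n)   # Wiener increments
w_end = dw.sum()                               # w(1)

# ODE x' = x * w'_lin driven by the piecewise-linear interpolation of w:
# on each subinterval the exact flow is multiplication by exp(dw_k),
# so the ODE solution at t = 1 is exp(w(1)), the Stratonovich solution.
x_ode = np.prod(np.exp(dw))

# Euler-Maruyama for the Ito SDE dX = X dW:
# prod(1 + dw_k) -> exp(w(1) - 1/2) as n -> infinity.
x_ito = np.prod(1.0 + dw)
```

The ratio x_ode / x_ito tends to exp(1/2): smoothing the driving noise produces the Stratonovich, not the Itô, limit — the content of the Wong-Zakai theorem.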

References
Airault, H. and P. Malliavin (1988). Intégration géométrique sur l'espace de Wiener. Bull. Sci. Math. (2) 112, 3-52.
Barlow, M. T. (1982). One-dimensional stochastic differential equation with no strong solution. J. London Math. Soc. 26, 335-347.
Bass, R. F. (1995). Probabilistic Techniques in Analysis. Springer, New York.
Baxter, M. and A. Rennie (1996). Financial Calculus. Cambridge University Press, Cambridge.
Bismut, J.-M. (1984). The Atiyah-Singer theorem: a probabilistic approach. I. J. Funct. Anal. 57, 56-99.
Bismut, J.-M. (1988). Formules de localisation et formules de Paul Lévy. In Colloque Paul Lévy sur les Processus Stochastiques, Astérisque 157-158, Soc. Math. France, 37-58.
Black, F. and M. Scholes (1973). The pricing of options and corporate liabilities. J. Political Economy 81, 637-654.
Cycon, H. L., R. G. Froese, W. Kirsch and B. Simon (1987). Schrödinger Operators with Application to Quantum Mechanics and Global Geometry. Springer, Berlin.
Davis, B. (1975). Picard's theorem and Brownian motion. Trans. Amer. Math. Soc. 213, 353-362.
Davis, M. H. and P. Varaiya (1974). The multiplicity of an increasing family of σ-fields. Ann. Prob. 2, 958-963.
Dellacherie, C. (1974). Intégrales stochastiques par rapport aux processus de Wiener et de Poisson. In Séminaire de Prob. VIII, LNM 381, pp. 25-26. Springer, Berlin.
Dellacherie, C. and P. A. Meyer (1975). Probabilités et potentiel, Chap. I-IV. Hermann, Paris.
Dellacherie, C. and P. A. Meyer (1980). Probabilités et potentiel, Chap. V-VIII. Hermann, Paris.
Durrett, R. (1984). Brownian Motion and Martingales in Analysis. Wadsworth, Belmont.
Eells, J. and D. K. Elworthy (1976). Stochastic dynamical systems. In Control Theory and Topics in Functional Analysis III, pp. 179-185. Intern. Atomic Energy Agency, Vienna.
Elworthy, K. D. (1982). Stochastic Differential Equations on Manifolds. Cambridge University Press, Cambridge.
Emery, M. (1989). Stochastic Calculus in Manifolds. Springer, Berlin.
Feller, W. (1951). Diffusion processes in genetics. In 2nd Berkeley Symp. Math. Stat. Prob., pp. 227-246. Univ. Calif. Press, Berkeley.
Friedman, A. (1975). Stochastic Differential Equations and Applications, Volume 1. Academic Press, New York.


Friedman, A. (1976). Stochastic Differential Equations and Applications, Volume 2. Academic Press, New York.
Gihman, I. I. and A. V. Skorohod (1979). The Theory of Stochastic Processes III. Springer, Berlin.
Ikeda, N. and H. Matsumoto (1999). Brownian motion on the hyperbolic plane and Selberg trace formula. J. Funct. Anal. 163, 63-110.
Ikeda, N. and S. Watanabe (1981). Stochastic Differential Equations and Diffusion Processes. North-Holland/Kodansha, Amsterdam/Tokyo.
Ikeda, N. and S. Watanabe (1989). Stochastic Differential Equations and Diffusion Processes. Second Edition. North-Holland/Kodansha, Amsterdam/Tokyo.
Itô, K. (1987). Selected Papers (Eds., D. W. Stroock and S. R. S. Varadhan). Springer, New York.
Jacod, J. and A. N. Shiryaev (1987). Limit Theorems for Stochastic Processes. Springer, Berlin.
Kallenberg, O. (1996). On the existence of universal functional solutions to classical SDE's. Ann. Prob. 24, 196-205.
Kallianpur, G. (1997). Some recent developments in nonlinear filtering theory. In Itô's Stochastic Calculus and Probability Theory, pp. 157-170. Springer, Tokyo.
Karatzas, I. and S. E. Shreve (1988). Brownian Motion and Stochastic Calculus. Springer, New York.
Kloeden, P. E. and E. Platen (1997). Numerical Solution of Stochastic Differential Equations. Springer, Berlin.
Kobayashi, S. and K. Nomizu (1963). Foundations of Differential Geometry. Wiley, New York.
Krylov, N. V. (1980). Controlled Diffusion Processes. Springer, New York.
Kunita, H. (1990). Stochastic Flows and Stochastic Differential Equations. Cambridge University Press, Cambridge.
Kunita, H. and S. Watanabe (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
Kusuoka, S. (1990). The Malliavin calculus and its applications. Sugaku Expositions 3, Amer. Math. Soc., 127-144.
Kusuoka, S. and D. W. Stroock (1985). Applications of the Malliavin calculus, Part II. J. Fac. Sci. Univ. Tokyo, Sect. IA Math. 32, 1-76.
Lamberton, D. and B. Lapeyre (1996). Stochastic Calculus Applied to Finance. Chapman & Hall, London.
Lévy, P. (1948). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Lévy, P. (1951). Wiener's random functions and other Laplacian random functions. In 2nd Berkeley Symp. Math. Stat. Prob., pp. 171-187. Univ. Calif. Press, Berkeley.
Lions, P. L. and A. S. Sznitman (1984). Stochastic differential equations with reflecting boundary conditions. Comm. Pure Appl. Math. 37, 511-537.
Lyons, T. J. (1998). Differential equations driven by rough signals. Revista Matemática Iberoamericana 14(2), 215-310.
Malliavin, P. (1978). Géométrie Différentielle Stochastique. Les Presses de l'Univ. Montréal, Montréal.
Malliavin, P. (1997). Stochastic Analysis. Springer, Berlin.
McKean, H. P. (1969). Stochastic Integrals. Academic Press, New York.
Protter, P. (1990). Stochastic Integration and Differential Equations. A New Approach. Springer, Berlin.
Revuz, D. and M. Yor (1991). Continuous Martingales and Brownian Motion. Springer, Berlin.
Revuz, D. and M. Yor (1999). Continuous Martingales and Brownian Motion. Third Edition. Springer, Berlin.
Rogers, L. C. G. and D. Williams (1987). Diffusions, Markov Processes, and Martingales, Vol. 2: Itô Calculus. Wiley, Chichester.
Saisho, Y. (1987). Stochastic differential equations for multi-dimensional domain with reflecting boundary. Prob. Theory Related Fields 74, 455-477.
Sato, K. (1976). Diffusion processes and a class of Markov chains related to population genetics. Osaka J. Math. 13, 631-659.
Stroock, D. W. and S. R. S. Varadhan (1972). On the support of diffusion processes with applications to the strong maximum principle. In Sixth Berkeley Symp. Math. Stat. Prob. III, pp. 351-368. Univ. Calif. Press, Berkeley.


Stroock, D. W. and S. R. S. Varadhan (1979). Multidimensional Diffusion Processes. Springer, Berlin.
Sugita, H. (1988). Positive generalized Wiener functions and potential theory over abstract Wiener spaces. Osaka J. Math. 25, 665-698.
Takanobu, S. and S. Watanabe (1988). On the existence and uniqueness of diffusion processes with Wentzell's boundary conditions. J. Math. Kyoto Univ. 28, 71-80.
Tanaka, H. (1973). On Markov process corresponding to Boltzmann's equation of Maxwellian gas. In Second Japan-USSR Symp., LNM 330, pp. 473-489. Springer, Berlin.
Tanaka, H. (1979). Stochastic differential equations with reflecting boundary condition in convex regions. Hiroshima Math. J. 9, 163-177.
Veretennikov, A. Yu. and N. V. Krylov (1976). On explicit formulas for solutions of stochastic differential equations. Math. USSR Sbornik 29, 239-256.
Watanabe, S. (1984). Lectures on Stochastic Differential Equations and Malliavin Calculus. Tata Institute of Fundamental Research/Springer, Bombay/Berlin.
Watanabe, S. (1990). Short time asymptotic problems in Wiener functional integration theory. Applications to heat kernels and index theorems. In Stochastic Analysis and Related Topics II, Proc. Silivri 1988, LNM 1444, pp. 1-62. Springer, Berlin.
Watanabe, S. (1992). Stochastic analysis and its applications. Sugaku Expositions 5, Amer. Math. Soc., 51-69.
Watanabe, S. (1997). Lévy's stochastic area formula and Brownian motion on compact Lie groups. In Itô's Stochastic Calculus and Probability Theory, pp. 401-411. Springer, Tokyo.
Whitney, H. (1957). Geometric Integration Theory. Princeton University Press, Princeton.
Wong, E. and M. Zakai (1965). On the relation between ordinary and stochastic differential equations. Intern. J. Engng. Sci. 3, 213-229.
Yor, M. (1982). Introduction au calcul stochastique. In Séminaire de Bourbaki 1981/1982, 590-01-590-18.
Yor, M. (ed.) (1997). Exponential Functionals and Principal Values related to Brownian Motion. Revista Matemática Iberoamericana, Madrid.
Yoshida, N. (1992). Asymptotic expansions of maximum likelihood estimators for small diffusions via the theory of Malliavin-Watanabe. Prob. Theory Related Fields 92, 275-311.
Yoshida, N. (1993). Asymptotic expansions of Bayes estimators for small diffusions. Prob. Theory Related Fields 95, 429-450.

Subject Index

Absolutely continuous process 24, 25 version of the Gaver-Lewis Pareto (I) process 25 Yeh et al. (III) process 25 Absorbing set 823 Adaptive estimation 55, 67 Age at the time t 680, 681 dependent-branching processes 44 replacement policy 488 Age and residual lifetime 131, 132 AIC 264, 272 Aliasing 259-263 problem 250 American call option 804 Analog of a vector Radon-Nikodým theorem 810 Ancestry 216 Aperiodicity 828 Arbitrage at time t 900 opportunity 803 price 900 Arc-sine law 173, 197 Arrival theorem 337, 338 ARMA Models 148, 249, 258 ARMA(p,q) process 256-263 Arnold Pareto (III) process(es) 20, 21 Associated partial ordering (≺) 331, 332 measures 699 negatively (NA) 331, 333 Associated random variables arising in reliability, statistical mechanics, percolation theory 695 which are uncorrelated 706 Association for jointly stable laws 702 in time 333, 699

Asymptotic efficiency 55 mixed normal (LAMN) family 66 Attenuation function 500 Australian All Ordinaries Index 273 Autocovariance function(s) 250, 252, 253, 255, 257, 258, 261, 262 Autoregressive classical Pareto processes 12 model 832, 833, 835, 838, 841 Pareto (III) processes 16 processes 11 Average option 390 Avoidance measure 281-283, 285, 291 B.P.R.E. 47 Ballot theorems 173, 199 Bayes and empirical Bayes methods 55, 68 Bernoulli deletion 619, 624 process 606 Bernstein's theorem 754 Bernstein-type inequality 709 Berry-Esseen type bound 712 Bessel diffusion 919 Bimeasure 449 Binomial model 390 process, see Bernoulli process Birth and Death processes 311, 332, 685 Birth-death process, generalized 321 Birth-process paced records 301-303 Bivariate records 297, 298 Black-Scholes formula 394 model 393, 898, 919 option pricing formula 900 Block replacement policy 488 Bochner's theorem 754 Bonferroni inequality 82, 112


Choquet-Deny theorem 742, 752 Chung and Fuchs test 137 Class (D) process 780 Classical Pareto distribution 2, 8 Cluster process 620 Coalescent 215, 217, 218 Comparison inequalities 535, 539 Compensator 881,910 Complete process 393 Complex Brownian motion 890 semimartigale 800 Compound Poisson process(es) 88, 93, 103-112, 125, 607, 621,623 Concentration and deviation inequalities 538 Conditional exponential Markov processes 72 Jensen inequality 769 Conformal martingale 889 Contingent claim(s) 390, 792 Continuous component 844 martingale part 885 state branching processes 130 Continuous-time autoregression (CAR) process 249 autoregressive moving average (CARMA) process 249 Continuous-time threshold ARMA (CTARMA) process 250 ARMA process 265 autoregression (CTAR) process 267 Converge weakly 376 Convergence in total variation 828 of Markov chains 827 of measures 227, 229, 230 Convex function, characterization 770 Convolution operator 119 Coordinate function 794 Correlation conjecture 540-542, 575, 576 in space 331 in time 331 inequalities 535, 540, 551,575 Cost functions 490 Countable space chains 820 Coupling 179 approach 88-91, 94, 101 Estimate 89 91, 98, 102, 104, 107 Covariance function 445

Boolean model 628 Borel equivalence 736 Bounded in probability 847 mean oscillation space (BMO) 773 simple predictable process 880 Branching processes 35, 70 random walks 36 Brownian component 126 filtration 897 sheet(s) 563,565, 566, 581,588 Brownian motion 251,253,254, 256, 264, 267, 365, 853 process 784 Brownian motion(s) on Lie groups 928 Riemannian manifold 921 sphere 922 the Lobachevskii plane 923 Bundle of orthonormal frames 923 cfidlfig 376, 875 Calculation of moments 128 Cameron-Martin formula 858 subspace 913 Cameron-Martin-Girsanov formula 265 Campbell's theorem 610 Canonical decomposition 885 isomorphism 923 Capacity function 318, 321,347, 355 CARMA process 254-256, 259 CARMA(p,q) process 251,252 Cauchy law 122 Centered second order process 444 Central limit theorem 154, 711 with random number of terms 240 Change of equivalent measures 804 drift 907 Characteristic functional 685 Characterization of linear growth birth-death process 685 martingales in terms of stopping times 876 stable distributions 744 Characterizations based on order statistics 749 record values 749 Choquet's theorem 738

measure 610 Cox processes 381,608, 617 Cox-Ross Rubinstein model 390, 811 Cram~r-Lindberg approximation 374 inequality 374 Model 369 Cross covariance function 456 CTAR process 270, 271-274 CTARMA process 250 CTAR(p) process 265, 267-270 Cumulant measure 610, 618 Cumulants 677 Cumulative damage processes 477 d-dimensional Wiener process 667 Damage model(s) 749, 753, 754 de Finetti's theorem 733 736, 738, 744, 754-756, 758 de Rham-Kodaira Laplacian 928 Decomposable function 512 Demimartingales 717 Demisubmartingale 718 Demisupermartingale 718 Deny's theorem 752-755, 757 Dependence on initial values of solutions 913 Detailed balance condition 827 Differential form 926 Diffusion approximation 377 process(es) 256, 910 Diffusions with reflection 87t Directed indexed martingale 794 Dirichlet problem 870 processes 687 Discrete dam 741 Distributions on the Wiener space 914 DMRL 477 Doeblin decomposition(s) 834, 847 Doeblin's Condition 842 minorization 155 Doltans-Dade measure 787 Domain of attraction for records 293 Doob's upcrossing inequality 719 Doob-Meyer decomposition 769 theorem 779 Double stochastic integral 647 Doubling strategy 765 Doubly stochastic Poisson process 381, 608, 683


Drift coefficient 124 Drift condition(s) for convergence of moments 840 existence of moments 829 geometric ergodicity 830, 841 positive recurrence 827, 838 recurrence 824, 837 transience 824, 837 Drift part 885 DS-integral 448 Durbin statistic 65 Dubins-Freedman Splitting Condition 161 Dwass-Rtnyi Lemma 289, 290, 296, 298 Dynkin-Lamperti Theorem 133 Efficient estimation 58 tests 60 Electrical networks 182 Elliptically symmetric distribution 756 Embedded renewal times 839 Embedding 250, 259-263 Empirical survival function 720 Empty space function 632 Equation, Lyapunov 219, 222, 224 Erickson's test 138 Esscher transform 384 European call and put 390 European call option 804, 899 Events prior to T, 776 Ewens sampling formula 297 Exchangeable property 678 sequence 733, 736, 758-760 Exercise price K 390, 899 Existence and uniqueness results for SDE 905 Explosion time 905 Exponential process(es) 16, 799 (semi)martingales 797 Exponential and iterated integral of semimartingale 893 Extended Black-Scholes model 810 Exterior product of cotangent bundle 926 Extremal process 294, 295 Extreme points 738 Extreme-value distributions 292 Factorial moment measure 610, 618 Feller property 119 Feynman-Kac formula 130, 857 Filter 458 Filtration 371,874, 896 Financial market models 803


Genetic disease 233 drift 215 models 919 Geometric Brownian motion 393 ergodicity 829, 840 Harris ergodicity 159 hitting times 830, 841 maximization 3 minima, maxima 19 minimization 3, 20 sums 6 Gibbs process 628 sampler 150 sampling 819 Girsanov formula 866 transformation 806 Girsanov's theorem 896 Glivenko-Cantelli type theorem 722 Good measure 523 set 513 Gordon-Newell network 315, 318, 333 Group(s) 177, 180 Group-Local-Balance (GLB) 323, 325, 353 Hardy space 773 Harmonic analysis 185 Harris recurrence 157 Harris recurrent chain 836 set 836 Hastings-Metropolis algorithm 836 Hazard (rate) function 283, 293, 498 Heat kernel 926 Hedge 803 portfolio 391 Hedging 390 Hermite polynomials 667, 895 HNBUE 478 Hölder continuity 139 Hoeffding identity 694 Homogeneous hypergroups 189 Poisson process 367, 632 process with independent increments 644 spaces 189 μ-harmonic exponential function 740 IBM closing stock prices 271 Identifiability 643

Finite-dimensional distribution 614 First entrance (passage) time 776 return time 822, 833 Fixed discontinuity 777 FKG inequalities 697 Fluctuation identities 134 probabilities 14, 16, 20, 25 theory 133 Follmer-Schweizer decomposition 403 Forward recurrence time at the time t 681 Foster's Condition 827 Foster-Tweedie drift conditions 158 Fourier transform 255, 259 Fractal dimensions 141 Fractional Brownian motion(s) 547, 551,556, 560 563, 574, 577, 581,585, 587, 590 sheet(s) 554, 563, 566 Fundamental identity of sequential analysis 777 theorem of (stochastic) calculus 799
Gamma law 122 process(es) 504, 506, 661 random measure 614, 617, 618 Gaussian likelihood 253, 270 process(es) 663, 670, 673, 687 sequence 672 stationary Markov process 687 Gauss-Poisson processes 626, 677 Gaver-Lewis exponential process 15 Pareto (I) 25 Pareto process 14-16, 25 Gelfand pair(s) 186, 189 General boundedness principle 784 exponential servers 351 state space chains 831 Generalization of the Dubins and Freedman splitting condition 161 Generalized expectation 915 Itô formula 790 Jordan decomposition 782 Wiener functionals 914 Generating functional 615-618 Generators 856



IFR 473 IFRA 474 Ignatov's Theorem 278-280, 284-286, 292 Image law 772 I-martingale 812 Increasing process 876 Independent and stationary increments 118 Independent of the past history 902 Inference for CTAR processes 270-275 linear CARMA processes 263, 264 Infinitely divisible process 121 Inhomogeneous Poisson process 478,482, 607, 616, 635, 675 Initial distribution 903 law 903 rank 279, 290-292 value 903 Insensitivity 346, 350, 357 Insurance derivatives 406, 448 Integral functionals 127 Integrated Cauchy functional equation 734, 736 Intensity function 608 measure 609 see point process intensity Intrinsic age 504 Invariance principle 714 Invariant distributions 826 measures 837 vectors 826 Invertibility 254 Irreducibility 822, 833 measure 833, 835 p-irreducibility 833 Irreducible chain 823 Irregularly spaced data 249, 263 Isoperimetric inequality 536, 537, 546 Iterated i.i.d, maps 146 It6 calculus 859 differential formula 789 formula in stochastic differentials 888 formula in the complex form 890 integral 788 process 886 Jackson network 313,318, 333, 345 Janson's inequality 92-94, 111 Jordan decomposition 775, 782

Karamata-Stirling law 297 Karhunen process 453 Karhunen-Lo6ve (K-L) expansion 671 representation 795 Kernel-type estimator 724 k-dimensional Gaver-Lewis process 26 logistic distributions 7 Yeh-Arnold-Robertson process 26 K-function 632 k-fold quasi-Poisson process 676 k-variate Arnold process 27 Semi-Pareto process 27 KIT 455, 461,462, 464, 466 Knight's representation theorem 893 Kolmogorov Isomorphism Theorem 446 semigroup 221 Kolmogorov-Cram+r integral 785 kth order Arnold process 24 Gaver-Lewis Pareto process 23 Lawrence-Lewis Pareto (I) process 22 Yeh et al. process 23 LP'P-bounded process 784 L e~,q'2-bounded process 785 L (P)-bonnded martingale 776 Ladder epoch 195 Ladder height 195 measure(s) 750, 751 Langevin's equation 919 Laplace exponent 124 functional 616 Laplace-Beltrami operator 921 L-approximation number(s) 553, 554, 566, 576 Large deviations 39 Lau-Rao theorem 745 Law(s) of the iterated logarithm 576, 577, 581, 585, 591, 717 Lawrence-Lewis classical Pareto process 12, 13 Pareto (I) process 14 process 25 Least squares estimation 455 Left continuous process 777 Level crossing processes 19 Leverage effect 397 L6vy


property 118, 145 renewal process 302, 414, 424 Markov chain(s) 221,413,414, 416, 417, 424, 425, 427, 435 asymptotically stable 223, 225, 231 denumerable 216 ergodic 223, 231 Monte carlo 630, 819 Markov transition law 832 matrix 820 Markovian minimization 28 type 902, 903 Martingale(s) 283, 663, 875 characterization 665 convergence theorem 372, 775 difference 773 formulation 866 measure 809 stopping theorem 372 structures 666 Maruyama-Girsanov transformation 896, 907 Maturity 390, 898 Max-geometric stability 3, 6 Maximal inequalities 771 Maximum likelihood estimates 263 Mean (expected) backward recurrence time at the time t 681 measure 614 Mean-variance hedging 402 Measurable couple 523 process 779 Metric entropy 537, 545, 546, 554, 558, 587 Metropolis algorithm 820, 823, 826, 831 Metropolis-Hastings algorithm(s) 819, 832, 834, 838, 841,843 Migration process generalized 321-322 processes 317, 322, 325 Min-geometric stability 3, 6, 7 Mininaal hedge 803 repair replacement policy 489 stationary dilation 454 Minimum phase 254 Mixed Benoulli process 623 finite exchangeable process 678 network 334, 354, 356 Poisson process 379, 608, 678

flights 117 measure 121, 122 process(es) 117, 118, 255, 256, 366, 501 L6vy's martingale characterization theorem for Wiener process 877 L6vy-BM 813 L6vy-driven CARMA process 255 L~vy-It6 decomposition 127 L~vy-Khintchine formula 121, 126 Likelihood methods 56 ratio statistic 62, 63 Linear Lie groups 928 processes 662 Linear and exponential diffusion 918 Linear/nonlinear autoregressive models 147 Link 517 Linked component 518 Lipschitz condition 905 Local approach 85 88, 100 balance 323, 336, 353 conformal martingale 889 martingale 876, 783 minorization 157 semimartingale 783 queue length 313, 316 Local asymptotic mixed normal (LAMN) family 66 normality (LAN) 55, 57 Localized splitting 162 Locus disease 233 marker 233 Log-concave density 473 Longitudinal data 74 Loop 519 Lower records 278 Lundberg coefficient 373 Luzin square function 773 Lyapunov function 827 M/G/1 queue 821,823, 825, 827, 829, 830 Maintenance policy 488 Malliavin calculus 913 covariance 914 Mardia's distribution 4 Marked point process 288, 298, 299, 623, 628 Markov point process 628 processes 69



p-mixed renewal process 678 Mixed sample case 49 process 683 Model Fisher-Wright-Moran 217 Infinite Sites 225 Model Stepwise Mutation Generalized 216 Restricted 233 Modular spectral domain 461,462, 464, 466 time domain 461,464 Moment bounds 706 measure 614, 618 Monotone partial order (-<) 328, 329, 332 couplings 90, 91, 94 process 328 Moving discontinuity 777 Multidimentional Black Sholes problem 814 Multiparameter Wiener process 687 Multiple points 142 transitions 322 Multiserver queues 312 Multitype case 36 point process 625 Multivariate exponential distributions 8 extreme distributions 6 Multivariate Pareto (II) conditionals distribution 11 (III) conditionals distribution 11 distributions 4 processes 26 Mutation, Markov 217 natural filtration 877 NBU 475 NBUC 476 NBUE 476 n-cyclically exchangeable sequence 740 NEAR process 13 Nearest-neighbour distribution function 632 Negative association 697 binomial process 380, 661 Net-profit condition 369 Neyman C(c0-statistic 65 Nilpotent Lie group and L6vy's Stochastic area integral 929

Non-anticipating function 902
Nondecreasing functions of associated random variables 704
Non-degenerate in the sense of Malliavin 914
Non-ergodic models 65
Nonhomogeneous Poisson process, see inhomogeneous Poisson process
Nonlinear CAR models 264-270 systems 819 time series models 249
Nonparametric density estimation 724 estimation for survival function 720 failure rate estimation 726 mean residual life function estimation 727
Normal Hilbert B(H)-module 463
Novikov's condition 894
n-step transition law 832
Null recurrent chain 826, 837
Obliquely reflecting Brownian motion 917
Observation and state equations 251
Occupation time 822, 833
Open-set irreducible 846
Operational time 370
Operator covariance function 460 semivariation 465 stationary 461 stationary dilation 465
Optimal/optimum estimating functions 55, 66
Optional σ-field 875 sampling 777
Order closed 328 normal 328 partial sum 329, 330
Order statistic(s) 285, 286, 287, 288, 289, 297 property 683
Original Itô formula 790
Ornstein-Uhlenbeck process 254 Brownian motion 919
Orthogonally scattered 445 dilation 454, 786
Outstanding observations, see record sequences
Overtake-free paths 339-341, 343
Pairwise interaction process 629
Palm distribution 605
Pareto 5 multivariate distributions 5


paced records 299-301
probability generating functional 616
process 96-104, 110, 112, 125, 281, 282, 285, 291, 294, 295, 299-301, 303, 304, 665, 674, 675, 256, 284, 288, 619, 620, 622, 623, 628, 660, 665, 666, 883, 910
simulation 623
two-type 625
with mean measure μ(·) 611
Poisson Arrivals See Time Averages (PASTA) 337, 345
Poissonian filtration 898
Polar sets 141
Pólya's theorem 171, 174
Polymerase chain reaction 40
Population decaying 227, 237 expanding 228, 238 genetics 216 stable 225, 237 substructure 231
Portfolio 899
Positive recurrent chain 826, 837 upper orthant dependent (PUOD) 331, 333
Potential theory 118, 136, 177, 182
Predictable σ-algebra/σ-field 779, 875 increasing process 779 joint 624 process 779
Probabilistic representations of heat kernels by Wiener functional expectations 925
Probability generating functional 616, 685
Process of bounded variation 781, 877 with independent linear forms 673
Processes determined up to shift 648 with associated increments 698
Product density 608 form distribution 314, 316, 321, 352 integral 283, 284 tensor 222
Progressively measurable 875
Pure birth process 73 jump wear processes 479
Quadratic variation of the process 789 covariation 790

processes 1, 11
(I) distribution (classical Pareto distribution) 1
Pareto (II) conditionals 9, 31, 32 distribution 1, 4, 5 multivariate distributions 5, 9, 10
Pareto (III) conditionals 9 distribution 1, 3, 5-8, 10
Pareto (IV) multivariate distribution 5
Pareto conditional distributions 8, 11
Partial balance equation 353 records 278-280
Pathwise uniqueness 903
Payoff 899
Phase-type distribution 347, 348
Physically unrealizable 766
Piecewise deterministic Markov processes 386
Pinned Brownian motion 919
Planar martingale 812
Point function 880
Point process(es) 102, 495, 599, 603, 666, 675, 880 completely random 606 deletion 619 factorial moment measure 610 finite-dimensional distribution 602 intensity 605, 609 intensity measure 609 limit results 622 mean measure 609 moment measure 609 product density 608 random translation 620 simple 604 simulation 635 state space 601 stationary 604 statistical inference 631-635 superposition 618 thinning 619 two-type 624
Poisson cluster process 621, 626 dependent 625 general 611 homogeneous 606 infinitely divisible 626 in-/nonhomogeneous 607 Local Estimate 86-88, 93, 98, 101-104, 107, 109, 110



Quantile hedging 401
Quasimartingale 781
Quasi-score estimators 67 function 66
Quasi-sure analysis 927
Queue length process 348, 349
Queues 199
Queuing lengths 149
Rademacher functions 528
Random coefficient autoregressive processes 73 environments 498 generation of associated sequences 719 graph 82, 89, 93, 110 iterated quadratic maps 152 measure 613-615 permutations 296-298 record processes 298, 299 sums 304, 305 time change 866 walk 171, 173, 174, 414, 419, 429-432, 435-437, 439, 440, 845, 846 walks 413, 419, 423, 429
Rate(s) of convergence 829, 831, 840
Rational (or fair) market problem 808
Reachable states 846
Recombination 234
Record sequences 277-308 times 277, 289, 290, 296, 300
Recurrence 173, 174, 823
Recurrent chain 824 Lévy process 136, 137 set 836 state 824
Reducibility 822
Reflecting Brownian motion 916, 917
Regeneration 413, 414, 420, 421 set 131
Regular variation 303
Renewal paced records 303 process 303-305, 607, 675, 680 processes 383, 487 sequence 425 theorem 132, 744 theory 131, 180
Renewals 413-427, 429-436, 439
Replacement policies 483
Replicating strategy 899
Representation

of a continuous martingale by a Wiener process 893
theorem for conformal martingale by complex Brownian motion 893
Representing measure 445, 452
Residual life at the time t 680, 681 waiting time at the time t 681
Resolvent of the Lyapunov semigroup 225 operators 120
Reverse martingale 664
Reversible chains 827
Riemann 180
Riemannian manifolds 188
Riskless interest rate 898
Risk-neutral pricing rule 393
Rosenthal-type moment inequalities 709
Ruin probabilities 369 time 369
Safety loading 369
Sample process, see Bernoulli process
Scalar covariance function 460
Scaling limit of critical Galton-Watson branching processes 919
Score statistics 63
SDE 393 based on Gaussian and Poissonian noise 909 for L2,2-bounded processes 791 on manifold 920
SDE's based on Gaussian white noise (Wiener noise) 902
Second order differential operator of the Hörmander type 921 Markovian minimization 30
Second-order differential operator 911
Secretary problems 306
Selection property 733
Self-financing 392, 803 portfolio 899 strategy 803
Self-intersection 141
Semigroup(s) 119, 856 Lyapunov 223, 224 Markov 223 of operators 219
Semimartingale 783, 884
Semiparametric decomposition 885 models 55, 67
Semi-Pareto


Stable processes 256, 652, 659
Standard filtration 768
State space representation 264, 256, 819
Stationary dilation 453 processes 445, 686
Stein Equation 84, 98, 105, 108
Stein-Chen method 84-91, 98
Stepwise mutation model 235
Stochastic calculus 783 flow of diffeomorphisms 913 intervals 779 monotonicity 831, 843 moving frame 923 order, strong 328 volatility models 396
Stochastic differential of the Stratonovich type 888
Stochastic differential equations 863, 901 equations on manifolds 921
Stochastic integral 251, 446, 878, 882, 889 by a semimartingale 884 Stratonovich type 889 Itô type 889
Stochastic integrals 644, 858 for semimartingales 788
Stock drift 898 volatility 898
Stopped process 876
Stopping time 371, 776, 874
Strategy 392, 803, 899
Stratonovich integral 793 stochastic differential 888
Strauss process 630
Strictly stable laws 122
Strike price 899
Strong approximation 292-295, 304 invariance principle 715 law of large numbers 710 martingale 812 memoryless characterizations 749 solutions 903
Strong Feller chain 844 Property 120
Stronger exchangeable property 678
Strongly continuous components 843 harmonizable 451, 461

distributions 4, 7
(p) distribution 27
processes 27
Semi-stable Markov processes 130
Semivariation 447, 449
Seneta-Heyde constants 38
Service discipline, general 347-349, 351, 353, 355
Service system, general 346, 353
Shanbhag's lemma 748
Shift protocol 334
Shorrock's Theorem 282, 285
Shot-noise process 500, 627
Signal plus noise model 457
Simple Lebesgue spectrum 511
Simple random walk 820, 825
Simplex 738
Single server queue 311
Skorokhod equation 916 in the multidimensional case 917
Slow points 140
Small ball constant(s) 566, 567, 569, 573, 574, 576, 590 estimate(s) 533, 535, 547, 566, 576, 587, 590, 591 probability 534-536, 541-543, 545, 546, 554, 556, 558, 561, 565, 566, 570, 576, 581, 582, 584, 585, 587-589 problems 534, 537, 539
Small counts 91-96
Small sets 834
Smoothness for the density of laws of Wiener functionals 915
Sobolev weak derivative 914
Sojourn time distribution 341, 343-345
Sojourns 284-289
Solution to the SDE 902
Solutions for initial or boundary value problems 912
Spectral bimeasure 454 characteristic 457, 459 density 253, 257, 262 domain 446, 454 measure 445
Spent time at the time t 681
Spherically symmetric distribution 756, 759
Spitzer's identity 188, 193
Spitzer-Rogozin's criterion 138
Splitting 839
Spread out random walks 845
Square of Bessel processes 919

Subexponential distributions 375
Subhomogeneous function 482
Submartingale 371, 768
Subordinated distributions 303
Subordinator 124
Suen's inequality 93
Sunspot numbers 263, 272
Super martingale 371, 768 processes 35 replication 400
Superadditive 400, 479
Superposition of renewal processes 682
Supplemented queue length process 348, 349
Symmetric measure 735, 737 server 349, 351, 355 stable process 654
System of canonical horizontal vector fields 924
Tauberian theorem 547, 548, 590
T-chain 844
Terminally uniformly integrable 794
Test Wiener functionals 914
Tests for change point problem for associated random variables 727
Thinning 298, 299, 493, 635
Threshold ARMA 249
Time change 892, 908 domain 445 series 72, 249
Topological positive recurrence 846 recurrence 846 spaces 843 transience 847
Total variation norm 839
Trading portfolio 803
Traffic equation 314, 316
Transformation of drift 896
Transience 137, 173, 174, 823
Transient 136, 137 chain 824 Lévy process 136, 137 state 824
Transition matrix 822 probabilities 817
Tsirelson's example 908
Twisted joining 525
Two-type Poisson process 625, 626
Type process 348, 349
Uniform ergodicity 842 order statistic property 683
Uniformly integrable 777 transient set 836
Unique invariant probability 159 strong solution 904
Uniquely linked component 519 points 518
Uniqueness in the sense of law 903
Up-Down criterion 332
Usual conditions 874
Variation 449 function 610
Vector Markov process 814 measure 786 time domain 460
Very good measure 523
Virtual age 496
Vitali-Nikodým theorem 775
Vilfredo Pareto 1
Volatility 393 parameter 806


Wald identity 778 statistic 62 test 63
Walsh functions 528
Weak association 698 Feller chain 844 martingale 812
Weakly harmonizable 451, 461 spectral measure 786
Weakly harmonizably correlated 456
Weakly operator harmonizable 465
Wiener functionals 913 martingale 877, 910 measure 854 process 294, 304, 650, 655, 657, 663, 666 space 913
Wiener process (Brownian motion) 877
Wiener-BM 813
Wiener-Hopf factorization 134

Without arbitrage opportunity 900

Yor's formula 800
Zinger result 757

Yeh-Arnold-Robertson Pareto (III) processes 17, 18, 25

Handbook of Statistics
Contents of Previous Volumes

Volume 1. Analysis of Variance
Edited by P. R. Krishnaiah
1980 xviii + 1002 pp.

1. Estimation of Variance Components by C. R. Rao and J. Kleffe
2. Multivariate Analysis of Variance of Repeated Measurements by N. H. Timm
3. Growth Curve Analysis by S. Geisser
4. Bayesian Inference in MANOVA by S. J. Press
5. Graphical Methods for Internal Comparisons in ANOVA and MANOVA by R. Gnanadesikan
6. Monotonicity and Unbiasedness Properties of ANOVA and MANOVA Tests by S. Das Gupta
7. Robustness of ANOVA and MANOVA Test Procedures by P. K. Ito
8. Analysis of Variance and Problems under Time Series Models by D. R. Brillinger
9. Tests of Univariate and Multivariate Normality by K. V. Mardia
10. Transformations to Normality by G. Kaskey, B. Kolman, P. R. Krishnaiah and L. Steinberg
11. ANOVA and MANOVA: Models for Categorical Data by V. P. Bhapkar
12. Inference and the Structural Model for ANOVA and MANOVA by D. A. S. Fraser
13. Inference Based on Conditionally Specified ANOVA Models Incorporating Preliminary Testing by T. A. Bancroft and C.-P. Han
14. Quadratic Forms in Normal Variables by C. G. Khatri
15. Generalized Inverse of Matrices and Applications to Linear Models by S. K. Mitra
16. Likelihood Ratio Tests for Mean Vectors and Covariance Matrices by P. R. Krishnaiah and J. C. Lee


17. Assessing Dimensionality in Multivariate Regression by A. J. Izenman
18. Parameter Estimation in Nonlinear Regression Models by H. Bunke
19. Early History of Multiple Comparison Tests by H. L. Harter
20. Representations of Simultaneous Pairwise Comparisons by A. R. Sampson
21. Simultaneous Test Procedures for Mean Vectors and Covariance Matrices by P. R. Krishnaiah, G. S. Mudholkar and P. Subbiah
22. Nonparametric Simultaneous Inference for Some MANOVA Models by P. K. Sen
23. Comparison of Some Computer Programs for Univariate and Multivariate Analysis of Variance by R. D. Bock and D. Brandt
24. Computations of Some Multivariate Distributions by P. R. Krishnaiah
25. Inference on the Structure of Interaction in Two-Way Classification Model by P. R. Krishnaiah and M. Yochmowitz

Volume 2. Classification, Pattern Recognition and Reduction of Dimensionality
Edited by P. R. Krishnaiah and L. N. Kanal
1982 xxii + 903 pp.

1. Discriminant Analysis for Time Series by R. H. Shumway
2. Optimum Rules for Classification into Two Multivariate Normal Populations with the Same Covariance Matrix by S. Das Gupta
3. Large Sample Approximations and Asymptotic Expansions of Classification Statistics by M. Siotani
4. Bayesian Discrimination by S. Geisser
5. Classification of Growth Curves by J. C. Lee
6. Nonparametric Classification by J. D. Broffitt
7. Logistic Discrimination by J. A. Anderson
8. Nearest Neighbor Methods in Discrimination by L. Devroye and T. J. Wagner
9. The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis by G. J. McLachlan
10. Graphical Techniques for Multivariate Data and for Clustering by J. M. Chambers and B. Kleiner
11. Cluster Analysis Software by R. K. Blashfield, M. S. Aldenderfer and L. C. Morey
12. Single-link Clustering Algorithms by F. J. Rohlf
13. Theory of Multidimensional Scaling by J. de Leeuw and W. Heiser
14. Multidimensional Scaling and its Application by M. Wish and J. D. Carroll
15. Intrinsic Dimensionality Extraction by K. Fukunaga



16. Structural Methods in Image Analysis and Recognition by L. N. Kanal, B. A. Lambird and D. Lavine
17. Image Models by N. Ahuja and A. Rosenfeld
18. Image Texture Survey by R. M. Haralick
19. Applications of Stochastic Languages by K. S. Fu
20. A Unifying Viewpoint on Pattern Recognition by J. C. Simon, E. Backer and J. Sallentin
21. Logical Functions in the Problems of Empirical Prediction by G. S. Lbov
22. Inference and Data Tables and Missing Values by N. G. Zagoruiko and V. N. Yolkina
23. Recognition of Electrocardiographic Patterns by J. H. van Bemmel
24. Waveform Parsing Systems by G. C. Stockman
25. Continuous Speech Recognition: Statistical Methods by F. Jelinek, R. L. Mercer and L. R. Bahl
26. Applications of Pattern Recognition in Radar by A. A. Grometstein and W. H. Schoendorf
27. White Blood Cell Recognition by E. S. Gelsema and G. H. Landweerd
28. Pattern Recognition Techniques for Remote Sensing Applications by P. H. Swain
29. Optical Character Recognition - Theory and Practice by G. Nagy
30. Computer and Statistical Considerations for Oil Spill Identification by Y. T. Chinen and T. J. Killeen
31. Pattern Recognition in Chemistry by B. R. Kowalski and S. Wold
32. Covariance Matrix Representation and Object-Predicate Symmetry by T. Kaminuma, S. Tomita and S. Watanabe
33. Multivariate Morphometrics by R. A. Reyment
34. Multivariate Analysis with Latent Variables by P. M. Bentler and D. G. Weeks
35. Use of Distance Measures, Information Measures and Error Bounds in Feature Evaluation by M. Ben-Bassat
36. Topics in Measurement Selection by J. M. Van Campenhout
37. Selection of Variables Under Univariate Regression Models by P. R. Krishnaiah
38. On the Selection of Variables Under Regression Models Using Krishnaiah's Finite Intersection Tests by J. L. Schmidhammer
39. Dimensionality and Sample Size Considerations in Pattern Recognition Practice by A. K. Jain and B. Chandrasekaran
40. Selecting Variables in Discriminant Analysis for Improving upon Classical Procedures by W. Schaafsma
41. Selection of Variables in Discriminant Analysis by P. R. Krishnaiah



Volume 3. Time Series in the Frequency Domain
Edited by D. R. Brillinger and P. R. Krishnaiah
1983 xiv + 485 pp.

1. Wiener Filtering (with emphasis on frequency-domain approaches) by R. J. Bhansali and D. Karavellas
2. The Finite Fourier Transform of a Stationary Process by D. R. Brillinger
3. Seasonal and Calendar Adjustment by W. S. Cleveland
4. Optimal Inference in the Frequency Domain by R. B. Davies
5. Applications of Spectral Analysis in Econometrics by C. W. J. Granger and R. Engle
6. Signal Estimation by E. J. Hannan
7. Complex Demodulation: Some Theory and Applications by T. Hasan
8. Estimating the Gain of a Linear Filter from Noisy Data by M. J. Hinich
9. A Spectral Analysis Primer by L. H. Koopmans
10. Robust-Resistant Spectral Analysis by R. D. Martin
11. Autoregressive Spectral Estimation by E. Parzen
12. Threshold Autoregression and Some Frequency-Domain Characteristics by J. Pemberton and H. Tong
13. The Frequency-Domain Approach to the Analysis of Closed-Loop Systems by M. B. Priestley
14. The Bispectral Analysis of Nonlinear Stationary Time Series with Reference to Bilinear Time-Series Models by T. Subba Rao
15. Frequency-Domain Analysis of Multidimensional Time-Series Data by E. A. Robinson
16. Review of Various Approaches to Power Spectrum Estimation by P. M. Robinson
17. Cumulants and Cumulant Spectra by M. Rosenblatt
18. Replicated Time-Series Regression: An Approach to Signal Estimation and Detection by R. H. Shumway
19. Computer Programming of Spectrum Estimation by T. Thrall
20. Likelihood Ratio Tests on Covariance Matrices and Mean Vectors of Complex Multivariate Normal Populations and their Applications in Time Series by P. R. Krishnaiah, J. C. Lee and T. C. Chang



Volume 4. Nonparametric Methods
Edited by P. R. Krishnaiah and P. K. Sen
1984 xx + 968 pp.

1. Randomization Procedures by C. B. Bell and P. K. Sen
2. Univariate and Multivariate Multisample Location and Scale Tests by V. P. Bhapkar
3. Hypothesis of Symmetry by M. Hušková
4. Measures of Dependence by K. Joag-Dev
5. Tests of Randomness against Trend or Serial Correlations by G. K. Bhattacharyya
6. Combination of Independent Tests by J. L. Folks
7. Combinatorics by L. Takács
8. Rank Statistics and Limit Theorems by M. Ghosh
9. Asymptotic Comparison of Tests - A Review by K. Singh
10. Nonparametric Methods in Two-Way Layouts by D. Quade
11. Rank Tests in Linear Models by J. N. Adichie
12. On the Use of Rank Tests and Estimates in the Linear Model by J. C. Aubuchon and T. P. Hettmansperger
13. Nonparametric Preliminary Test Inference by A. K. Md. E. Saleh and P. K. Sen
14. Paired Comparisons: Some Basic Procedures and Examples by R. A. Bradley
15. Restricted Alternatives by S. K. Chatterjee
16. Adaptive Methods by M. Hušková
17. Order Statistics by J. Galambos
18. Induced Order Statistics: Theory and Applications by P. K. Bhattacharya
19. Empirical Distribution Function by E. Csáki
20. Invariance Principles for Empirical Processes by M. Csörgő
21. M-, L- and R-estimators by J. Jurečková
22. Nonparametric Sequential Estimation by P. K. Sen
23. Stochastic Approximation by V. Dupač
24. Density Estimation by P. Révész
25. Censored Data by A. P. Basu
26. Tests for Exponentiality by K. A. Doksum and B. S. Yandell
27. Nonparametric Concepts and Methods in Reliability by M. Hollander and F. Proschan
28. Sequential Nonparametric Tests by U. Müller-Funk
29. Nonparametric Procedures for some Miscellaneous Problems by P. K. Sen
30. Minimum Distance Procedures by R. Beran
31. Nonparametric Methods in Directional Data Analysis by S. R. Jammalamadaka
32. Application of Nonparametric Statistics to Cancer Data by H. S. Wieand



33. Nonparametric Frequentist Proposals for Monitoring Comparative Survival Studies by M. Gail
34. Meteorological Applications of Permutation Techniques based on Distance Functions by P. W. Mielke, Jr.
35. Categorical Data Problems Using Information Theoretic Approach by S. Kullback and J. C. Keegel
36. Tables for Order Statistics by P. R. Krishnaiah and P. K. Sen
37. Selected Tables for Nonparametric Statistics by P. K. Sen and P. R. Krishnaiah

Volume 5. Time Series in the Time Domain
Edited by E. J. Hannan, P. R. Krishnaiah and M. M. Rao
1985 xiv + 490 pp.

1. Nonstationary Autoregressive Time Series by W. A. Fuller
2. Non-Linear Time Series Models and Dynamical Systems by T. Ozaki
3. Autoregressive Moving Average Models, Intervention Problems and Outlier Detection in Time Series by G. C. Tiao
4. Robustness in Time Series and Estimating ARMA Models by R. D. Martin and V. J. Yohai
5. Time Series Analysis with Unequally Spaced Data by R. H. Jones
6. Various Model Selection Techniques in Time Series Analysis by R. Shibata
7. Estimation of Parameters in Dynamical Systems by L. Ljung
8. Recursive Identification, Estimation and Control by P. Young
9. General Structure and Parametrization of ARMA and State-Space Systems and its Relation to Statistical Problems by M. Deistler
10. Harmonizable, Cramér, and Karhunen Classes of Processes by M. M. Rao
11. On Non-Stationary Time Series by C. S. K. Bhagavan
12. Harmonizable Filtering and Sampling of Time Series by D. K. Chang
13. Sampling Designs for Time Series by S. Cambanis
14. Measuring Attenuation by M. A. Cameron and P. J. Thomson
15. Speech Recognition Using LPC Distance Measures by P. J. Thomson and P. de Souza
16. Varying Coefficient Regression by D. F. Nicholls and A. R. Pagan
17. Small Samples and Large Equation Systems by H. Theil and D. G. Fiebig



Volume 6. Sampling
Edited by P. R. Krishnaiah and C. R. Rao
1988 xvi + 594 pp.

1. A Brief History of Random Sampling Methods by D. R. Bellhouse
2. A First Course in Survey Sampling by T. Dalenius
3. Optimality of Sampling Strategies by A. Chaudhuri
4. Simple Random Sampling by P. K. Pathak
5. On Single Stage Unequal Probability Sampling by V. P. Godambe and M. E. Thompson
6. Systematic Sampling by D. R. Bellhouse
7. Systematic Sampling with Illustrative Examples by M. N. Murthy and T. J. Rao
8. Sampling in Time by D. A. Binder and M. A. Hidiroglou
9. Bayesian Inference in Finite Populations by W. A. Ericson
10. Inference Based on Data from Complex Sample Designs by G. Nathan
11. Inference for Finite Population Quantiles by J. Sedransk and P. J. Smith
12. Asymptotics in Finite Population Sampling by P. K. Sen
13. The Technique of Replicated or Interpenetrating Samples by J. C. Koop
14. On the Use of Models in Sampling from Finite Populations by I. Thomsen and D. Tesfu
15. The Prediction Approach to Sampling Theory by R. M. Royall
16. Sample Survey Analysis: Analysis of Variance and Contingency Tables by D. H. Freeman, Jr.
17. Variance Estimation in Sample Surveys by J. N. K. Rao
18. Ratio and Regression Estimators by P. S. R. S. Rao
19. Role and Use of Composite Sampling and Capture-Recapture Sampling in Ecological Studies by M. T. Boswell, K. P. Burnham and G. P. Patil
20. Data-based Sampling and Model-based Estimation for Environmental Resources by G. P. Patil, G. J. Babu, R. C. Hennemuth, W. L. Meyers, M. B. Rajarshi and C. Taillie
21. On Transect Sampling to Assess Wildlife Populations and Marine Resources by F. L. Ramsey, C. E. Gates, G. P. Patil and C. Taillie
22. A Review of Current Survey Sampling Methods in Marketing Research (Telephone, Mall Intercept and Panel Surveys) by R. Velu and G. M. Naidu
23. Observational Errors in Behavioural Traits of Man and their Implications for Genetics by P. V. Sukhatme
24. Designs in Survey Sampling Avoiding Contiguous Units by A. S. Hedayat, C. R. Rao and J. Stufken



Volume 7. Quality Control and Reliability
Edited by P. R. Krishnaiah and C. R. Rao
1988 xiv + 503 pp.

1. Transformation of Western Style of Management by W. Edwards Deming
2. Software Reliability by F. B. Bastani and C. V. Ramamoorthy
3. Stress-Strength Models for Reliability by R. A. Johnson
4. Approximate Computation of Power Generating System Reliability Indexes by M. Mazumdar
5. Software Reliability Models by T. A. Mazzuchi and N. D. Singpurwalla
6. Dependence Notions in Reliability Theory by N. R. Chaganty and K. Joag-dev
7. Application of Goodness-of-Fit Tests in Reliability by H. W. Block and A. H. Moore
8. Multivariate Nonparametric Classes in Reliability by H. W. Block and T. H. Savits
9. Selection and Ranking Procedures in Reliability Models by S. S. Gupta and S. Panchapakesan
10. The Impact of Reliability Theory on Some Branches of Mathematics and Statistics by P. J. Boland and F. Proschan
11. Reliability Ideas and Applications in Economics and Social Sciences by M. C. Bhattacharjee
12. Mean Residual Life: Theory and Applications by F. Guess and F. Proschan
13. Life Distribution Models and Incomplete Data by R. E. Barlow and F. Proschan
14. Piecewise Geometric Estimation of a Survival Function by G. M. Mimmack and F. Proschan
15. Applications of Pattern Recognition in Failure Diagnosis and Quality Control by L. F. Pau
16. Nonparametric Estimation of Density and Hazard Rate Functions when Samples are Censored by W. J. Padgett
17. Multivariate Process Control by F. B. Alt and N. D. Smith
18. QMP/USP - A Modern Approach to Statistical Quality Auditing by B. Hoadley
19. Review About Estimation of Change Points by P. R. Krishnaiah and B. Q. Miao
20. Nonparametric Methods for Changepoint Problems by M. Csörgő and L. Horváth
21. Optimal Allocation of Multistate Components by E. El-Neweihi, F. Proschan and J. Sethuraman
22. Weibull, Log-Weibull and Gamma Order Statistics by H. L. Harter
23. Multivariate Exponential Distributions and their Applications in Reliability by A. P. Basu



24. Recent Developments in the Inverse Gaussian Distribution by S. Iyengar and G. Patwardhan

Volume 8. Statistical Methods in Biological and Medical Sciences
Edited by C. R. Rao and R. Chakraborty
1991 xvi + 554 pp.

1. Methods for the Inheritance of Qualitative Traits by J. Rice, R. Neuman and S. O. Moldin
2. Ascertainment Biases and their Resolution in Biological Surveys by W. J. Ewens
3. Statistical Considerations in Applications of Path Analysis in Genetic Epidemiology by D. C. Rao
4. Statistical Methods for Linkage Analysis by G. M. Lathrop and J. M. Lalouel
5. Statistical Design and Analysis of Epidemiologic Studies: Some Directions of Current Research by N. Breslow
6. Robust Classification Procedures and Their Applications to Anthropometry by N. Balakrishnan and R. S. Ambagaspitiya
7. Analysis of Population Structure: A Comparative Analysis of Different Estimators of Wright's Fixation Indices by R. Chakraborty and H. Danker-Hopfe
8. Estimation of Relationships from Genetic Data by E. A. Thompson
9. Measurement of Genetic Variation for Evolutionary Studies by R. Chakraborty and C. R. Rao
10. Statistical Methods for Phylogenetic Tree Reconstruction by N. Saitou
11. Statistical Models for Sex-Ratio Evolution by S. Lessard
12. Stochastic Models of Carcinogenesis by S. H. Moolgavkar
13. An Application of Score Methodology: Confidence Intervals and Tests of Fit for One-Hit-Curves by J. J. Gart
14. Kidney-Survival Analysis of IgA Nephropathy Patients: A Case Study by O. J. W. F. Kardaun
15. Confidence Bands and the Relation with Decision Analysis: Theory by O. J. W. F. Kardaun
16. Sample Size Determination in Clinical Research by J. Bock and H. Toutenburg



Volume 9. Computational Statistics
Edited by C. R. Rao
1993 xix + 1045 pp.

1. Algorithms by B. Kalyanasundaram
2. Steady State Analysis of Stochastic Systems by K. Kant
3. Parallel Computer Architectures by R. Krishnamurti and B. Narahari
4. Database Systems by S. Lanka and S. Pal
5. Programming Languages and Systems by S. Purushothaman and J. Seaman
6. Algorithms and Complexity for Markov Processes by R. Varadarajan
7. Mathematical Programming: A Computational Perspective by W. W. Hager, R. Horst and P. M. Pardalos
8. Integer Programming by P. M. Pardalos and Y. Li
9. Numerical Aspects of Solving Linear Least Squares Problems by J. L. Barlow
10. The Total Least Squares Problem by S. Van Huffel and H. Zha
11. Construction of Reliable Maximum-Likelihood-Algorithms with Applications to Logistic and Cox Regression by D. Böhning
12. Nonparametric Function Estimation by T. Gasser, J. Engel and B. Seifert
13. Computation Using the QR Decomposition by C. R. Goodall
14. The EM Algorithm by N. Laird
15. Analysis of Ordered Categorical Data through Appropriate Scaling by C. R. Rao and P. M. Caligiuri
16. Statistical Applications of Artificial Intelligence by W. A. Gale, D. J. Hand and A. E. Kelly
17. Some Aspects of Natural Language Processes by A. K. Joshi
18. Gibbs Sampling by S. F. Arnold
19. Bootstrap Methodology by G. J. Babu and C. R. Rao
20. The Art of Computer Generation of Random Variables by M. T. Boswell, S. D. Gore, G. P. Patil and C. Taillie
21. Jackknife Variance Estimation and Bias Reduction by S. Das Peddada
22. Designing Effective Statistical Graphs by D. A. Burn
23. Graphical Methods for Linear Models by A. S. Hadi
24. Graphics for Time Series Analysis by H. J. Newton
25. Graphics as Visual Language by T. Selker and A. Appel
26. Statistical Graphics and Visualization by E. J. Wegman and D. B. Carr
27. Multivariate Statistical Visualization by F. W. Young, R. A. Faldowski and M. M. McFarlane
28. Graphical Methods for Process Control by T. L. Ziemer



Volume 10. Signal Processing and its Applications
Edited by N. K. Bose and C. R. Rao
1993 xvii + 992 pp.

1. Signal Processing for Linear Instrumental Systems with Noise: A General Theory with Illustrations for Optical Imaging and Light Scattering Problems by M. Bertero and E. R. Pike
2. Boundary Implication Rights in Parameter Space by N. K. Bose
3. Sampling of Bandlimited Signals: Fundamental Results and Some Extensions by J. L. Brown, Jr.
4. Localization of Sources in a Sector: Algorithms and Statistical Analysis by K. Buckley and X.-L. Xu
5. The Signal Subspace Direction-of-Arrival Algorithm by J. A. Cadzow
6. Digital Differentiators by S. C. Dutta Roy and B. Kumar
7. Orthogonal Decompositions of 2D Random Fields and their Applications for 2D Spectral Estimation by J. M. Francos
8. VLSI in Signal Processing by A. Ghouse
9. Constrained Beamforming and Adaptive Algorithms by L. C. Godara
10. Bispectral Speckle Interferometry to Reconstruct Extended Objects from Turbulence-Degraded Telescope Images by D. M. Goodman, T. W. Lawrence, E. M. Johansson and J. P. Fitch
11. Multi-Dimensional Signal Processing by K. Hirano and T. Nomura
12. On the Assessment of Visual Communication by F. O. Huck, C. L. Fales, R. Alter-Gartenberg and Z. Rahman
13. VLSI Implementations of Number Theoretic Concepts with Applications in Signal Processing by G. A. Jullien, N. M. Wigley and J. Reilly
14. Decision-level Neural Net Sensor Fusion by R. Y. Levine and T. S. Khuon
15. Statistical Algorithms for Noncausal Gauss Markov Fields by J. M. F. Moura and N. Balram
16. Subspace Methods for Directions-of-Arrival Estimation by A. Paulraj, B. Ottersten, R. Roy, A. Swindlehurst, G. Xu and T. Kailath
17. Closed Form Solution to the Estimates of Directions of Arrival Using Data from an Array of Sensors by C. R. Rao and B. Zhou
18. High-Resolution Direction Finding by S. V. Schell and W. A. Gardner
19. Multiscale Signal Processing Techniques: A Review by A. H. Tewfik, M. Kim and M. Deriche
20. Sampling Theorems and Wavelets by G. G. Walter
21. Image and Video Coding Research by J. W. Woods
22. Fast Algorithms for Structured Matrices in Signal Processing by A. E. Yagle



Volume 11. Econometrics
Edited by G. S. Maddala, C. R. Rao and H. D. Vinod
1993 xx + 783 pp.

1. Estimation from Endogenously Stratified Samples by S. R. Cosslett
2. Semiparametric and Nonparametric Estimation of Quantal Response Models by J. L. Horowitz
3. The Selection Problem in Econometrics and Statistics by C. F. Manski
4. General Nonparametric Regression Estimation and Testing in Econometrics by A. Ullah and H. D. Vinod
5. Simultaneous Microeconometric Models with Censored or Qualitative Dependent Variables by R. Blundell and R. J. Smith
6. Multivariate Tobit Models in Econometrics by L.-F. Lee
7. Estimation of Limited Dependent Variable Models under Rational Expectations by G. S. Maddala
8. Nonlinear Time Series and Macroeconometrics by W. A. Brock and S. M. Potter
9. Estimation, Inference and Forecasting of Time Series Subject to Changes in Regime by J. D. Hamilton
10. Structural Time Series Models by A. C. Harvey and N. Shephard
11. Bayesian Testing and Testing Bayesians by J.-P. Florens and M. Mouchart
12. Pseudo-Likelihood Methods by C. Gourieroux and A. Monfort
13. Rao's Score Test: Recent Asymptotic Results by R. Mukerjee
14. On the Strong Consistency of M-Estimates in Linear Models under a General Discrepancy Function by Z. D. Bai, Z. J. Liu and C. R. Rao
15. Some Aspects of Generalized Method of Moments Estimation by A. Hall
16. Efficient Estimation of Models with Conditional Moment Restrictions by W. K. Newey
17. Generalized Method of Moments: Econometric Applications by M. Ogaki
18. Testing for Heteroskedasticity by A. R. Pagan and Y. Pak
19. Simulation Estimation Methods for Limited Dependent Variable Models by V. A. Hajivassiliou
20. Simulation Estimation for Panel Data Models with Limited Dependent Variable by M. P. Keane
21. A Perspective on Application of Bootstrap Methods in Econometrics by J. Jeong and G. S. Maddala
22. Stochastic Simulations for Inference in Nonlinear Errors-in-Variables Models by R. S. Mariano and B. W. Brown
23. Bootstrap Methods: Applications in Econometrics by H. D. Vinod
24. Identifying Outliers and Influential Observations in Econometric Models by S. G. Donald and G. S. Maddala
25. Statistical Aspects of Calibration in Macroeconomics by A. W. Gregory and G. W. Smith


26. Panel Data Models with Rational Expectations by K. Lahiri
27. Continuous Time Financial Models: Statistical Applications of Stochastic Processes by K. R. Sawyer

Volume 12. Environmental Statistics
Edited by G. P. Patil and C. R. Rao
1994 xix + 927 pp.

1. Environmetrics: An Emerging Science by J. S. Hunter
2. A National Center for Statistical Ecology and Environmental Statistics: A Center Without Walls by G. P. Patil
3. Replicate Measurements for Data Quality and Environmental Modeling by W. Liggett
4. Design and Analysis of Composite Sampling Procedures: A Review by G. Lovison, S. D. Gore and G. P. Patil
5. Ranked Set Sampling by G. P. Patil, A. K. Sinha and C. Taillie
6. Environmental Adaptive Sampling by G. A. F. Seber and S. K. Thompson
7. Statistical Analysis of Censored Environmental Data by M. Akritas, T. Ruscitti and G. P. Patil
8. Biological Monitoring: Statistical Issues and Models by E. P. Smith
9. Environmental Sampling and Monitoring by S. V. Stehman and W. Scott Overton
10. Ecological Statistics by B. F. J. Manly
11. Forest Biometrics by H. E. Burkhart and T. G. Gregoire
12. Ecological Diversity and Forest Management by J. H. Gove, G. P. Patil, B. F. Swindel and C. Taillie
13. Ornithological Statistics by P. M. North
14. Statistical Methods in Developmental Toxicology by P. J. Catalano and L. M. Ryan
15. Environmental Biometry: Assessing Impacts of Environmental Stimuli Via Animal and Microbial Laboratory Studies by W. W. Piegorsch
16. Stochasticity in Deterministic Models by J. J. M. Bedaux and S. A. L. M. Kooijman
17. Compartmental Models of Ecological and Environmental Systems by J. H. Matis and T. E. Wehrly
18. Environmental Remote Sensing and Geographic Information Systems-Based Modeling by W. L. Myers
19. Regression Analysis of Spatially Correlated Data: The Kanawha County Health Study by C. A. Donnelly, J. H. Ware and N. M. Laird
20. Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications by P. Guttorp and P. D. Sampson


21. Meta-analysis in Environmental Statistics by V. Hasselblad
22. Statistical Methods in Atmospheric Science by A. R. Solow
23. Statistics with Agricultural Pests and Environmental Impacts by L. J. Young and J. H. Young
24. A Crystal Cube for Coastal and Estuarine Degradation: Selection of Endpoints and Development of Indices for Use in Decision Making by M. T. Boswell, J. S. O'Connor and G. P. Patil
25. How Does Scientific Information in General and Statistical Information in Particular Input to the Environmental Regulatory Process? by C. R. Cothern
26. Environmental Regulatory Statistics by C. B. Davis
27. An Overview of Statistical Issues Related to Environmental Cleanup by R. Gilbert
28. Environmental Risk Estimation and Policy Decisions by H. Lacayo Jr.

Volume 13. Design and Analysis of Experiments
Edited by S. Ghosh and C. R. Rao
1996 xviii + 1230 pp.

1. The Design and Analysis of Clinical Trials by P. Armitage
2. Clinical Trials in Drug Development: Some Statistical Issues by H. I. Patel
3. Optimal Crossover Designs by J. Stufken
4. Design and Analysis of Experiments: Nonparametric Methods with Applications to Clinical Trials by P. K. Sen
5. Adaptive Designs for Parametric Models by S. Zacks
6. Observational Studies and Nonrandomized Experiments by P. R. Rosenbaum
7. Robust Design: Experiments for Improving Quality by D. M. Steinberg
8. Analysis of Location and Dispersion Effects from Factorial Experiments with a Circular Response by C. M. Anderson
9. Computer Experiments by J. R. Koehler and A. B. Owen
10. A Critique of Some Aspects of Experimental Design by J. N. Srivastava
11. Response Surface Designs by N. R. Draper and D. K. J. Lin
12. Multiresponse Surface Methodology by A. I. Khuri
13. Sequential Assembly of Fractions in Factorial Experiments by S. Ghosh
14. Designs for Nonlinear and Generalized Linear Models by A. C. Atkinson and L. M. Haines
15. Spatial Experimental Design by R. J. Martin
16. Design of Spatial Experiments: Model Fitting and Prediction by V. V. Fedorov
17. Design of Experiments with Selection and Ranking Goals by S. S. Gupta and S. Panchapakesan


18. Multiple Comparisons by A. C. Tamhane
19. Nonparametric Methods in Design and Analysis of Experiments by E. Brunner and M. L. Puri
20. Nonparametric Analysis of Experiments by A. M. Dean and D. A. Wolfe
21. Block and Other Designs in Agriculture by D. J. Street
22. Block Designs: Their Combinatorial and Statistical Properties by T. Calinski and S. Kageyama
23. Developments in Incomplete Block Designs for Parallel Line Bioassays by S. Gupta and R. Mukerjee
24. Row-Column Designs by K. R. Shah and B. K. Sinha
25. Nested Designs by J. P. Morgan
26. Optimal Design: Exact Theory by C. S. Cheng
27. Optimal and Efficient Treatment-Control Designs by D. Majumdar
28. Model Robust Designs by Y-J. Chang and W. I. Notz
29. Review of Optimal Bayes Designs by A. DasGupta
30. Approximate Designs for Polynomial Regression: Invariance, Admissibility, and Optimality by N. Gaffke and B. Heiligers

Volume 14. Statistical Methods in Finance
Edited by G. S. Maddala and C. R. Rao
1996 xvi + 733 pp.

1. Econometric Evaluation of Asset Pricing Models by W. E. Ferson and R. Jagannathan
2. Instrumental Variables Estimation of Conditional Beta Pricing Models by C. R. Harvey and C. M. Kirby
3. Semiparametric Methods for Asset Pricing Models by B. N. Lehmann
4. Modeling the Term Structure by A. R. Pagan, A. D. Hall, and V. Martin
5. Stochastic Volatility by E. Ghysels, A. C. Harvey and E. Renault
6. Stock Price Volatility by S. F. LeRoy
7. GARCH Models of Volatility by F. C. Palm
8. Forecast Evaluation and Combination by F. X. Diebold and J. A. Lopez
9. Predictable Components in Stock Returns by G. Kaul
10. Interest Rate Spreads as Predictors of Business Cycles by K. Lahiri and J. G. Wang
11. Nonlinear Time Series, Complexity Theory, and Finance by W. A. Brock and P. J. F. deLima
12. Count Data Models for Financial Data by A. C. Cameron and P. K. Trivedi
13. Financial Applications of Stable Distributions by J. H. McCulloch
14. Probability Distributions for Financial Models by J. B. McDonald
15. Bootstrap Based Tests in Financial Models by G. S. Maddala and H. Li


16. Principal Component and Factor Analyses by C. R. Rao
17. Errors in Variables Problems in Finance by G. S. Maddala and M. Nimalendran
18. Financial Applications of Artificial Neural Networks by M. Qi
19. Applications of Limited Dependent Variable Models in Finance by G. S. Maddala
20. Testing Option Pricing Models by D. S. Bates
21. Peso Problems: Their Theoretical and Empirical Implications by M. D. D. Evans
22. Modeling Market Microstructure Time Series by J. Hasbrouck
23. Statistical Methods in Tests of Portfolio Efficiency: A Synthesis by J. Shanken

Volume 15. Robust Inference
Edited by G. S. Maddala and C. R. Rao
1997 xviii + 698 pp.

1. Robust Inference in Multivariate Linear Regression Using Difference of Two Convex Functions as the Discrepancy Measure by Z. D. Bai, C. R. Rao and Y. H. Wu
2. Minimum Distance Estimation: The Approach Using Density-Based Distances by A. Basu, I. R. Harris and S. Basu
3. Robust Inference: The Approach Based on Influence Functions by M. Markatou and E. Ronchetti
4. Practical Applications of Bounded-Influence Tests by S. Heritier and M-P. Victoria-Feser
5. Introduction to Positive-Breakdown Methods by P. J. Rousseeuw
6. Outlier Identification and Robust Methods by U. Gather and C. Becker
7. Rank-Based Analysis of Linear Models by T. P. Hettmansperger, J. W. McKean and S. J. Sheather
8. Rank Tests for Linear Models by R. Koenker
9. Some Extensions in the Robust Estimation of Parameters of Exponential and Double Exponential Distributions in the Presence of Multiple Outliers by A. Childs and N. Balakrishnan
10. Outliers, Unit Roots and Robust Estimation of Nonstationary Time Series by G. S. Maddala and Y. Yin
11. Autocorrelation-Robust Inference by P. M. Robinson and C. Velasco
12. A Practitioner's Guide to Robust Covariance Matrix Estimation by W. J. den Haan and A. Levin
13. Approaches to the Robust Estimation of Mixed Models by A. H. Welsh and A. M. Richardson


14. Nonparametric Maximum Likelihood Methods by S. R. Cosslett
15. A Guide to Censored Quantile Regressions by B. Fitzenberger
16. What Can Be Learned About Population Parameters When the Data Are Contaminated by J. L. Horowitz and C. F. Manski
17. Asymptotic Representations and Interrelations of Robust Estimators and Their Applications by J. Jurečková and P. K. Sen
18. Small Sample Asymptotics: Applications in Robustness by C. A. Field and M. A. Tingley
19. On the Fundamentals of Data Robustness by G. Maguluri and K. Singh
20. Statistical Analysis With Incomplete Data: A Selective Review by M. G. Akritas and M. P. LaValley
21. On Contamination Level and Sensitivity of Robust Tests by J. A. Víšek
22. Finite Sample Robustness of Tests: An Overview by T. Kariya and P. Kim
23. Future Directions by G. S. Maddala and C. R. Rao

Volume 16. Order Statistics - Theory and Methods
Edited by N. Balakrishnan and C. R. Rao
1997 xix + 688 pp.

1. Order Statistics: An Introduction by N. Balakrishnan and C. R. Rao
2. Order Statistics: A Historical Perspective by H. Leon Harter and N. Balakrishnan
3. Computer Simulation of Order Statistics by Pandu R. Tadikamalla and N. Balakrishnan
4. Lorenz Ordering of Order Statistics and Record Values by Barry C. Arnold and Jose A. Villasenor
5. Stochastic Ordering of Order Statistics by Philip J. Boland, Moshe Shaked and J. George Shanthikumar
6. Bounds for Expectations of L-Estimates by Tomasz Rychlik
7. Recurrence Relations and Identities for Moments of Order Statistics by N. Balakrishnan and K. S. Sultan
8. Recent Approaches to Characterizations Based on Order Statistics and Record Values by C. R. Rao and D. N. Shanbhag
9. Characterizations of Distributions via Identically Distributed Functions of Order Statistics by Ursula Gather, Udo Kamps and Nicole Schweitzer
10. Characterizations of Distributions by Recurrence Relations and Identities for Moments of Order Statistics by Udo Kamps
11. Univariate Extreme Value Theory and Applications by Janos Galambos
12. Order Statistics: Asymptotics in Applications by Pranab Kumar Sen
13. Zero-One Laws for Large Order Statistics by R. J. Tomkins and Hong Wang
14. Some Exact Properties of Cook's D1 by D. R. Jensen and D. E. Ramirez


15. Generalized Recurrence Relations for Moments of Order Statistics from Non-Identical Pareto and Truncated Pareto Random Variables with Applications to Robustness by Aaron Childs and N. Balakrishnan
16. A Semiparametric Bootstrap for Simulating Extreme Order Statistics by Robert L. Strawderman and Daniel Zelterman
17. Approximations to Distributions of Sample Quantiles by Chunsheng Ma and John Robinson
18. Concomitants of Order Statistics by H. A. David and H. N. Nagaraja
19. A Record of Records by Valery B. Nevzorov and N. Balakrishnan
20. Weighted Sequential Empirical Type Processes with Applications to Change-Point Problems by Barbara Szyszkowicz
21. Sequential Quantile and Bahadur-Kiefer Processes by Miklós Csörgő and Barbara Szyszkowicz

Volume 17. Order Statistics: Applications
Edited by N. Balakrishnan and C. R. Rao
1998 xviii + 712 pp.

1. Order Statistics in Exponential Distribution by Asit P. Basu and Bahadur Singh
2. Higher Order Moments of Order Statistics from Exponential and Right-truncated Exponential Distributions and Applications to Life-testing Problems by N. Balakrishnan and Shanti S. Gupta
3. Log-gamma Order Statistics and Linear Estimation of Parameters by N. Balakrishnan and P. S. Chan
4. Recurrence Relations for Single and Product Moments of Order Statistics from a Generalized Logistic Distribution with Applications to Inference and Generalizations to Double Truncation by N. Balakrishnan and Rita Aggarwala
5. Order Statistics from the Type III Generalized Logistic Distribution and Applications by N. Balakrishnan and S. K. Lee
6. Estimation of Scale Parameter Based on a Fixed Set of Order Statistics by Sanat K. Sarkar and Wenjin Wang
7. Optimal Linear Inference Using Selected Order Statistics in Location-Scale Models by M. Masoom Ali and Dale Umbach
8. L-Estimation by J. R. M. Hosking
9. On Some L-estimation in Linear Regression Models by Soroush Alimoradi and A. K. Md. Ehsanes Saleh
10. The Role of Order Statistics in Estimating Threshold Parameters by A. Clifford Cohen
11. Parameter Estimation under Multiply Type-II Censoring by Fanhui Kong


12. On Some Aspects of Ranked Set Sampling in Parametric Estimation by Nora Ni Chuiv and Bimal K. Sinha
13. Some Uses of Order Statistics in Bayesian Analysis by Seymour Geisser
14. Inverse Sampling Procedures to Test for Homogeneity in a Multinomial Distribution by S. Panchapakesan, Aaron Childs, B. H. Humphrey and N. Balakrishnan
15. Prediction of Order Statistics by Kenneth S. Kaminsky and Paul I. Nelson
16. The Probability Plot: Tests of Fit Based on the Correlation Coefficient by R. A. Lockhart and M. A. Stephens
17. Distribution Assessment by Samuel Shapiro
18. Application of Order Statistics to Sampling Plans for Inspection by Variables by Helmut Schneider and Frances Barbera
19. Linear Combinations of Ordered Symmetric Observations with Applications to Visual Acuity by Marlos Viana
20. Order-Statistic Filtering and Smoothing of Time-Series: Part I by Gonzalo R. Arce, Yeong-Taeg Kim and Kenneth E. Barner
21. Order-Statistic Filtering and Smoothing of Time-Series: Part II by Kenneth E. Barner and Gonzalo R. Arce
22. Order Statistics in Image Processing by Scott T. Acton and Alan C. Bovik
23. Order Statistics Application to CFAR Radar Target Detection by R. Viswanathan

Volume 18. Bioenvironmental and Public Health Statistics
Edited by P. K. Sen and C. R. Rao
2000 xxiv + 1105 pp.

1. Bioenvironment and Public Health: Statistical Perspectives by Pranab K. Sen
2. Some Examples of Random Process Environmental Data Analysis by David R. Brillinger
3. Modeling Infectious Diseases - AIDS by L. Billard
4. On Some Multiplicity Problems and Multiple Comparison Procedures in Biostatistics by Yosef Hochberg and Peter H. Westfall
5. Analysis of Longitudinal Data by Julio M. Singer and Dalton F. Andrade
6. Regression Models for Survival Data by Richard A. Johnson and John P. Klein
7. Generalised Linear Models for Independent and Dependent Responses by Bahjat F. Qaqish and John S. Preisser
8. Hierarchical and Empirical Bayes Methods for Environmental Risk Assessment by Gauri Datta, Malay Ghosh and Lance A. Waller
9. Non-parametrics in Bioenvironmental and Public Health Statistics by Pranab Kumar Sen

966

Contents of previous volumes

10. Estimation and Comparison of Growth and Dose-Response Curves in the Presence of Purposeful Censoring by Paul W. Stewart
11. Spatial Statistical Methods for Environmental Epidemiology by Andrew B. Lawson and Noel Cressie
12. Evaluating Diagnostic Tests in Public Health by Margaret Pepe, Wendy Leisenring and Carolyn Rutter
13. Statistical Issues in Inhalation Toxicology by E. Weller, L. Ryan and D. Dockery
14. Quantitative Potency Estimation to Measure Risk with Bioenvironmental Hazards by A. John Bailer and Walter W. Piegorsch
15. The Analysis of Case-Control Data: Epidemiologic Studies of Familial Aggregation by Nan M. Laird, Garrett M. Fitzmaurice and Ann G. Schwartz
16. Cochran-Mantel-Haenszel Techniques: Applications Involving Epidemiologic Survey Data by Daniel B. Hall, Robert F. Woolson, William R. Clarke and Martha F. Jones
17. Measurement Error Models for Environmental and Occupational Health Applications by Robert H. Lyles and Lawrence L. Kupper
18. Statistical Perspectives in Clinical Epidemiology by Shrikant I. Bangdiwala and Sergio R. Muñoz
19. ANOVA and ANOCOVA for Two-Period Crossover Trial Data: New vs. Standard by Subir Ghosh and Lisa D. Fairchild
20. Statistical Methods for Crossover Designs in Bioenvironmental and Public Health Studies by Gail E. Tudor, Gary G. Koch and Diane Catellier
21. Statistical Models for Human Reproduction by C. M. Suchindran and Helen P. Koo
22. Statistical Methods for Reproductive Risk Assessment by Sati Mazumdar, Yikang Xu, Donald R. Mattison, Nancy B. Sussman and Vincent C. Arena
23. Selection Biases of Samples and their Resolutions by Ranajit Chakraborty and C. Radhakrishna Rao
24. Genomic Sequences and Quasi-Multivariate CATANOVA by Hildete Prisco Pinheiro, Françoise Seillier-Moiseiwitsch, Pranab Kumar Sen and Joseph Eron Jr
25. Statistical Methods for Multivariate Failure Time Data and Competing Risks by Ralph A. DeMasi
26. Bounds on Joint Survival Probabilities with Positively Dependent Competing Risks by Sanat K. Sarkar and Kalyan Ghosh
27. Modeling Multivariate Failure Time Data by Limin X. Clegg, Jianwen Cai and Pranab K. Sen
28. The Cost-Effectiveness Ratio in the Analysis of Health Care Programs by Joseph C. Gardiner, Cathy J. Bradley and Marianne Huebner
29. Quality-of-Life: Statistical Validation and Analysis - An Example from a Clinical Trial by Balakrishna Hosmane, Clement Maurath and Richard Manski
30. Carcinogenic Potency: Statistical Perspectives by Anup Dewanji


31. Statistical Applications in Cardiovascular Disease by Elizabeth R. DeLong and David M. DeLong
32. Medical Informatics and Health Care Systems: Biostatistical and Epidemiologic Perspectives by J. Zvárová
33. Methods of Establishing In Vitro-In Vivo Relationships for Modified Release Drug Products by David T. Mauger and Vernon M. Chinchilli
34. Statistics in Psychiatric Research by Sati Mazumdar, Patricia R. Houck and Charles F. Reynolds III
35. Bridging the Biostatistics-Epidemiology Gap by Lloyd J. Edwards
36. Biodiversity - Measurement and Analysis by S. P. Mukherjee
