
Random Processes

© James P. Reilly

February 11, 2003

We begin the study of random processes by quickly reviewing random variables.

1 Random Variables
Examples of random variables (RVs) are:
- the noonday temperature on January 22 in downtown Hamilton,
- the roll of a die,
- Canada's annual gross domestic product for a particular year,
- the height of a person,
- the value of a noise voltage appearing across the terminals of a resistor at a specific time instant,

and so on. By random variable, we mean that there is uncertainty involved in its
value. We cannot tell what the actual value of the variable will be before a
measurement is made. The whole idea of probability is to make some sense
of this uncertainty, so that performance of systems (such as communications
systems) which involve uncertainty may be analyzed in a meaningful way.
In probability theory, an experiment may be viewed as the acquisition of
a data point, e.g., corresponding to any one of the random variables above.
This may be represented in mathematical terms as a box with a button
on it, as in Figure 1. Every time the button is pushed, a hypothetical
experiment is performed. The box randomly outputs some sample value
x(n) corresponding to the experimental measurement. Every experiment
(button push) corresponds to a value of the index n. For repeated button
pushes, we have the sequence of measurements (i.e., observations or samples)
x(n), n = 1, . . . , N , where N is the number of samples from the experiment.

Figure 1: A depiction of the generation of the random process X .

Figure 2: An example of a Gaussian probability density function corresponding to a typical distribution of heights in a class.

The process (i.e., the box) which generates the entire set of samples
x(n) is called X . The quantity X is a random variable (RV). The set of
all possible outcomes of X is referred to as the ensemble and is given the
symbol S.
We realize that for a given experiment, some values of X are less likely
than others. For example, the sample value x(n) = +20 °C for the example
of the temperature on January 22 in Hamilton is extremely unlikely. In
the case where we are measuring the height of a person, the value 5′10″
is a more likely value than 6′10″.
In order to quantify which values of an RV are more likely, we use what
is referred to as a probability density function, whose symbol is f(x). An ex-
ample of a specific probability density function is the well-known Gaussian
distribution which is shown plotted in Figure 2. This distribution is typical
of one which would represent the height of a person.
The probability density function f (x) has several properties, some of
which are summarized below:

\int_X f(x)\, dx = 1 \qquad (1)

P(a \le x \le b) = \int_a^b f(x)\, dx \qquad (2)

f(x) \ge 0 \quad \forall x. \qquad (3)
Equation (2) means that the probability of a sample x of the random variable
X falling in the interval [a, b] is given by the integral in (2). If a and b
are chosen to be in a region over which f (x) is relatively large, then the
probability of an x being in that interval is relatively large, and vice versa.
Equation (1) means that the probability of a sample x of X taking on some
value is one.
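
As a rough numerical illustration (a sketch only; the Gaussian "height" pdf, the interval [a, b], and the use of numpy are assumptions made here for illustration, not part of the notes), properties (1)-(3) can be checked by simple numerical integration:

    import numpy as np

    # assumed Gaussian pdf: mean 170 cm, standard deviation 10 cm (illustrative only)
    m, sigma = 170.0, 10.0
    def f(x):
        return np.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

    x = np.linspace(m - 8*sigma, m + 8*sigma, 200_001)   # wide grid standing in for (-inf, inf)
    dx = x[1] - x[0]

    print(np.sum(f(x)) * dx)          # property (1): total area, very close to 1
    a, b = 160.0, 180.0               # an arbitrary interval [a, b]
    mask = (x >= a) & (x <= b)
    print(np.sum(f(x[mask])) * dx)    # property (2): P(a <= x <= b), about 0.683 here
    print(np.all(f(x) >= 0))          # property (3): the pdf is nonnegative
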
We now interpret the meaning of the probability density function. Let
us consider the example of the height of a person from the class. We push
the button N times (where N is large) to generate a sequence of samples
x(n), n = 1, . . . , N . We then discretize the range of all possible values of
x(n) into bins. For example, in this case, we can choose the width of each
bin to be 1″, so that there is a bin associated with each value 5′0″, 5′1″,
5′2″, etc. We now take each sample x(n) and assign it to its appropriate
bin. This is done by incrementing a counter associated with that bin. After
processing all the samples in this way, the bin counts can be plotted as shown
in Figure 3. This kind of plot is called a histogram, which we denote by
h(x). The relative frequency (vertical height of the histogram) for each bin
gives the number of occurrences for which the value of x(n) corresponds to
the value of the bin. The sum of all the relative frequencies of the bins is
equal to N. We denote by x_k, k = 1, . . . , K, the x-value (i.e., the person's
height) corresponding to the kth bin, where K is the total number of bins.
In the example of Figure 3, K = 10.
The probability density function is defined in terms of the histogram, as
1/N times the limit of h(x) as N → ∞ and K → ∞, where in the latter
case the width of each bin also goes to zero; i.e.,

f(x) = \lim_{N \to \infty,\, K \to \infty} \frac{1}{N} h(x). \qquad (4)

Thus, as N and K each go to infinity and the width of the bin goes to zero,
we have all the information possible about the random variable, and the
histogram of Figure 3 becomes a continuous smooth function and approaches
a probability density function such as that shown in Figure 2.
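
As a small simulation sketch of this limiting process (the Gaussian "height" model, the bin widths, and numpy are assumptions for illustration; note that a proper density estimate also divides by the bin width, consistent with the bin width going to zero in (4)), the normalized histogram approaches the density as N grows and the bins shrink:

    import numpy as np

    rng = np.random.default_rng(0)
    m, sigma = 170.0, 10.0                          # assumed "height" distribution (cm)

    for N, bin_width in [(100, 5.0), (10_000, 2.0), (1_000_000, 0.5)]:
        x = rng.normal(m, sigma, size=N)            # N button pushes x(n)
        edges = np.arange(m - 5*sigma, m + 5*sigma + bin_width, bin_width)
        h, _ = np.histogram(x, bins=edges)          # bin counts h(x_k); sum(h) is essentially N
        f_est = h / (N * bin_width)                 # (1/N) h, scaled by bin width -> density estimate
        k = np.searchsorted(edges, m) - 1           # bin containing the mean
        print(N, bin_width, f_est[k])               # approaches f(m) = 1/(sigma*sqrt(2*pi)) ~ 0.0399
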

Figure 3: An example of a histogram.

1.1 The Mean and Variance of a Random Variable


The mean and variance tell us a great deal about an RV. Even though we
may not be able to predict the value of a random variable in a particular
measurement or experiment, in many cases, and especially in the case of
noise, we can analyze its impact on a communications system by knowing
only quantities such as mean and variance. In a practical system, it is fairly
easy to determine the mean and variance of a noise process. Thus, with
knowledge only of mean and variance, even though the actual value of the
noise is not known, meaningful results on how noise impacts performance
can still be obtained.

1.1.1 Mean
The definition of the mean of an RV is quite straightforward. Given a
sequence of samples x(n) of the random variable, the mean mX is given as
m_X = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} x(n) \qquad (5)

However, there is another way of calculating the mean. Bearing in mind


that all samples in a particular bin have the same value x_k, and that there
are h(x_k) samples in that bin, each with value x_k, then (5) can be written
as
m_X = \lim_{N \to \infty,\, K \to \infty} \frac{1}{N} \sum_{k=1}^{K} x_k\, h(x_k) \qquad (6)

Figure 4a: Probability density functions f_1(x) and f_2(x) having different
mean values x_1 and x_2, but with the same variance.

Note that in (6) we are summing over histogram bins, not samples as we are
in (5). Eq. (6) can be written as
m_X = \lim_{N \to \infty,\, K \to \infty} \sum_{k=1}^{K} x_k \frac{h(x_k)}{N}
    = \int x f(x)\, dx, \qquad (7)

where the limit has been evaluated using (4).


The operation in (7) is also called the expectation of a random variable
x. The expectation operator is denoted as E(x). Thus, the mean and
the expectation of the random variable X are the same thing. The effect
of changing the mean of a random variable is to translate its probability
density function along the x-axis, as shown in Figure 4a. The mean of the
probability density function gives us an idea where the mass of the pdf is
concentrated. Note that the formula (7) is an easier way to calculate the
mean than (5). With (5), we must draw an infinite number of samples of
the random variable, and then form the average. This is impossible to do in
practice. However, (7) is much easier to evaluate, if the probability density
function is available.
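
As a small numerical sketch (the Gaussian example and numpy are assumed for illustration), the sample average of (5) and the integral in (7) give essentially the same number:

    import numpy as np

    rng = np.random.default_rng(1)
    m, sigma = 170.0, 10.0                           # assumed Gaussian RV (cm)
    N = 1_000_000

    x = rng.normal(m, sigma, size=N)
    mean_by_samples = np.mean(x)                     # eq. (5), with a large but finite N

    grid = np.linspace(m - 8*sigma, m + 8*sigma, 200_001)
    dx = grid[1] - grid[0]
    f = np.exp(-(grid - m)**2 / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))
    mean_by_pdf = np.sum(grid * f) * dx              # eq. (7): integral of x f(x) dx

    print(mean_by_samples, mean_by_pdf)              # both approximately 170
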

1.1.2 Variance
We now look at what the variance of a random variable is. As shown in
Figure 4b, the variance of a distribution gives us an idea of the spread of the
random variable about its mean. A random variable with a higher variance
has a higher probability density in the region far from its mean than an RV
with lower variance.
One intuitive way of quantifying the spread of an RV is to find the average
absolute deviation of the variable from its mean, according to the following

Figure 4b: Probability density functions f_1(x) and f_2(x), having the
same mean, but different variances σ_1² and σ_2² respectively, where σ_1² < σ_2².
Note that the probability of drawing a sample in the interval [a, b] is much
higher when X is distributed according to f_2(x).

formula:
spread = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} |x(n) - m_X|. \qquad (8)

However, this quantity is difficult to deal with mathematically. For instance,


it is not differentiable at x = m_X. So, instead we define the measure called
variance, denoted σ_X², in the following way:

\sigma_X^2 = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} (x(n) - m_X)^2 \qquad (9)

The variance, as defined by (9), represents the average squared value of the
distance of the sample from its mean.
We can use the same trick as in (6) to evaluate variance. If we use the
histogram, the variance is given (by analogy to (7)), as
\sigma_X^2 = \lim_{N \to \infty,\, K \to \infty} \sum_{k=1}^{K} (x_k - m_X)^2 \frac{h(x_k)}{N}
    = \int (x - m_X)^2 f(x)\, dx \qquad (10)

This quantity, the variance, is also the expectation of the quantity
(x − m_X)², denoted as E((x − m_X)²).

Figure 5: An illustration showing how the noise samples v(n) are generated.

As an aside, the expectation of any function g(x) of the random variable


x can be evaluated by
E(g(x)) = \int g(x) f(x)\, dx \qquad (11)

Equations (7) and (10) are also called moments of x. Eq. (7) is the first
moment, since it involves x raised to the first power, while (10) is the second
moment, because it involves x to the second power.
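
A minimal sketch of (10) and (11) for the same assumed Gaussian pdf follows (the helper expectation() is hypothetical, introduced only for this illustration):

    import numpy as np

    m, sigma = 170.0, 10.0                           # assumed Gaussian pdf
    grid = np.linspace(m - 8*sigma, m + 8*sigma, 200_001)
    dx = grid[1] - grid[0]
    f = np.exp(-(grid - m)**2 / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))

    def expectation(g):
        # numerical version of E(g(x)) = integral of g(x) f(x) dx, eq. (11)
        return np.sum(g(grid) * f) * dx

    first_moment = expectation(lambda x: x)                         # eq. (7): the mean
    variance = expectation(lambda x: (x - first_moment)**2)         # eq. (10)

    print(first_moment, variance)                    # approximately 170 and 100 (= sigma^2)
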
We close this section with a further example of the mean and variance
of a random variable. Quantum mechanics tells us that any resistor above
absolute zero has a fluctuating noise voltage across its terminals. This fluc-
tuating noise is the result of the random motion of individual electrons mov-
ing about in the resistor. This noise voltage is extremely small (typically
on the order of microvolts), so it is not normally visible on an oscilloscope.
Nevertheless, it is comparable in magnitude to the desired signal voltage re-
ceived by an antenna from a distant station. As such, it can significantly
contaminate the quality of the reception of a TV or radio signal, and for
that matter, any form of communications system. This noise voltage that
appears across a resistor is called thermal noise, and is the most pertinent
random variable we are concerned about in this course. Because there is no
mechanism in the resistor to accumulate a constant charge difference be-
tween the resistor terminals, the average value of the thermal noise voltage
is zero.
In this specific example, the random variable we consider is related to the
noise voltage appearing across a resistor. Specifically, consider the circuit
shown in Figure 5. The box shown is a sample-and-hold circuit, which

samples the voltage v(t) at a given time instant and holds it. These voltage
values are then samples of a random variable, which we denote as V. Since
the average value or the mean of a time-varying voltage is its DC value, the
mean of the voltage samples v(n) of the random variable V is the DC voltage
shown in Figure 5.
Quantum mechanics also tells us that the power of the thermal noise is
proportional to the resistor temperature. Thus, if we vary the temperature
of the resistor, what effect does this have on the probability density function,
the mean, and the variance of the random variable V? Clearly, even though
the power of the noise process has changed, there is still no mechanism for a
charge difference to appear between the terminals, so the mean is still zero.
However, changing the power definitely has an impact on variance. To see
this, note that the power P in the random process is the expectation of the
square of the voltage, divided by the resistance R:
P = \frac{1}{R} \int v^2 f(v)\, dv. \qquad (12)

As we will see later, the resistance can be normalized to R = 1. Taking the


mean mV to be zero, and comparing (12) to (10), we see that the power of
the noise process is the same as its variance. Thus, increasing the power
of the noise voltage is equivalent to increasing the variance, which increases
the spread of the distribution, as shown in Figure 4b. A distribution with a
larger variance has a higher probability that an observation will be in some
interval [a, b] (where a, b represent relatively large values, as shown in Figure
4b), than a distribution with a lower variance. Thus, a process with larger
variance (power) is more likely to have a larger value, which fits with our
understanding of what power is.
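
The power/variance equivalence implied by (12) can be checked with a quick sketch (zero-mean Gaussian noise samples and the normalized R = 1 are assumptions; the microvolt scale is only indicative):

    import numpy as np

    rng = np.random.default_rng(2)
    R = 1.0                                          # normalized resistance
    for sigma_v in (1e-6, 2e-6, 4e-6):               # noise standard deviations, volts
        v = rng.normal(0.0, sigma_v, size=1_000_000) # zero-mean thermal-noise-like samples
        power = np.mean(v**2) / R                    # sample estimate of E(v^2)/R, eq. (12)
        print(sigma_v**2, power)                     # the power tracks the variance sigma_v^2
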

2 Random Processes
The idea of a random process is similar to that of a random variable. The
basic distinction is that, for every button push, instead of just getting a
single number (sample) out of our box, we get a whole waveform which
extends over an infinite time interval. Thus, samples of the random process
are denoted as x(t). The idea is illustrated by Figure 6.
The idea of the random process can be further illustrated as follows. Sup-
pose we have a drawer full of resistors. Each waveform xn (t), n = 1, . . . , N
in Figure 6 corresponds to that of a particular resistor. The ensemble of the
random process is the set of all voltage waveforms over all resistors. The
random process whose samples are xn (t) is denoted by X (t).

Figure 6: Example of a random process. Every button push gives us a
different waveform, extending over infinite time.

An issue with a random process is this: since the samples of the random
process are functions of time and not single numbers, how do we define the
probability density function of the process, and the mean, variance, etc.?
(Recall that pdfs, means and variances apply only to random variables).
This issue is resolved in the following way. Suppose we are able to freeze time
at some value t = t_1 as in Figure 6. Then the samples x_n(t_1) over the index
n (button pushes) are single numbers and thus define a random variable.
There must therefore be a probability density function, and therefore also
a mean and variance, associated with this random variable. In this sense,
we say the random process is distributed with a distribution f (x) when the
values of the samples xn (t1 ) at some time t = t1 have a pdf f (x). The
ensemble of the process is the set of all possible outcomes over all button
pushes. Thus, the mean and variance are ensemble averages: the averaging
inherent in these quantities is taken over the ensemble of the process.
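
The "freeze time at t = t_1" idea can be sketched as follows (a hypothetical ensemble of white Gaussian waveforms; the sizes and the variance of 4 are arbitrary choices for illustration):

    import numpy as np

    rng = np.random.default_rng(3)
    num_waveforms, num_samples = 5_000, 200          # ensemble size, samples per waveform

    # each row is one button push: an entire waveform x_n(t)
    ensemble = rng.normal(0.0, 2.0, size=(num_waveforms, num_samples))

    t1 = 100                                         # freeze time at t = t1
    x_at_t1 = ensemble[:, t1]                        # one number per waveform -> a random variable

    print(np.mean(x_at_t1))                          # ensemble mean, eq. (13), close to 0
    print(np.var(x_at_t1))                           # ensemble variance, eq. (14), close to 4
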
Stationarity: A stationary random process is one whose probability
density function does not depend on time. An example of a sample
from a nonstationary process is shown in Figure 7. Note that the mean and
variance of this waveform are both dependent on time. It turns out that
if the probability density function describing the process is independent
of time, then the mean and variance are also independent of time. (This
is because the mean and variance are completely determined by the pdf).
However, the converse is not necessarily true. Because of this, we define a
weaker form of stationarity called wide-sense stationarity, where only the
mean and variance need be independent of time, rather than the whole


Figure 7: Example of a NONstationary random process. Note that both the
mean and the variance of the process change with time.

probability density function being independent of time.


For a stationary process, the mean is evaluated by

m_X = E(x(t_1))
    = \int x f(x)\, dx \qquad (13)

where t1 is arbitrary. Note that this formula gives us exactly the same values
as that given by (5). The variance is given by
\sigma_X^2 = E((x(t_1) - m_X)^2)
    = \int (x - m_X)^2 f(x)\, dx \qquad (14)

Ergodicity: In practice, it is sometimes difficult to evaluate the en-
semble average of a random process in order to determine its mean or variance.
For example, if we are trying to estimate the mean or variance of the noise
voltage across a 1 kΩ resistor, then we must take samples of the noise voltage
across a very large number of resistors at a specific time instant, and then
evaluate the appropriate averages. This could be a very tedious process.
However, the concept of ergodicity helps us in this task. An ergodic random
process is one where the ensemble averages in (13) or (14) can be replaced

with a time average, over just one sample of the process. So, to determine
the mean and variance of a noise process, all we need do is measure the
noise across a single resistor over the time interval [0, T ], and then evaluate
the average of the noise samples themselves, and the average of the quantity
(x − m_X)² to evaluate the mean and variance respectively; i.e.,
m_X = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} x(t) \qquad (15)

and
\sigma_X^2 = \lim_{T \to \infty} \frac{1}{T} \sum_{t=0}^{T} (x(t) - m_X)^2 \qquad (16)
Note that the true values are obtained by taking the limit T → ∞ as shown
above. However, since it is impossible in practice to evaluate averages over
an infinite interval (because it would take an infinite amount of data), we
can only obtain an estimate of the mean and variance, by eliminating the
limits above and taking T to be a finite value.
Thus, there are two ways of evaluating mean and variance of a random
process. They are given by (13) and (15) respectively for mean, and (14) and
(16) respectively for variance. Eqs. (13) and (14) are useful for analytical
purposes when the probability density function is known. However, often in
practice, all we have available are a series of measurements of the random
process over some time interval [0, T ]. Then all we can do is to determine
an estimate of the mean and variance using (15) and (16) without limits,
for T finite.
Note that in the vast majority of cases, the processes we deal with are in-
deed stationary and ergodic. We assume these properties hold unless stated
otherwise.
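
A rough check of ergodicity for a simulated stationary process follows (white Gaussian noise is assumed here; a real measurement would replace the simulated record):

    import numpy as np

    rng = np.random.default_rng(4)
    num_waveforms, T = 1_000, 10_000
    ensemble = rng.normal(0.0, 3.0, size=(num_waveforms, T))   # stationary and ergodic by construction

    # ensemble averages at one frozen time t1, as in (13) and (14)
    t1 = 123
    ens_mean = np.mean(ensemble[:, t1])
    ens_var = np.var(ensemble[:, t1])

    # time averages over a single waveform, as in (15) and (16) with finite T
    x = ensemble[0, :]
    time_mean = np.mean(x)
    time_var = np.mean((x - time_mean)**2)

    print(ens_mean, time_mean)                       # both close to 0
    print(ens_var, time_var)                         # both close to 9
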

2.1 Correlation and Autocorrelation


These are very important concepts in the study of random processes. We
begin with the idea of correlation. Let us measure the heights and weights of
people in our class. Note that height and weight are both random variables.
We expect that if a person is tall, then they will weigh more, and vice versa.
To illustrate this more clearly, consider the plot shown in Figure 8. Here,
each dot represents the height and weight of a particular individual. It is
seen that the region of concentration of the dots is elliptical in shape and is
aligned along a diagonal line. This means if we are given a person's height,
on average we can say something about their weight. In this case, we say

Figure 8: Scatter plot showing heights and weights of people. Each dot
represents the height and weight of one individual.

that height and weight are correlated. That is, knowing one implies some
knowledge about the other.
Let us now quantify this effect. Let us take measurements h(n) and
w(n), n = 1, . . . , N of height and weight respectively, where N is the to-
tal number of measurements (people). The correlation Ch,w between the
random variables h and w is defined as

C_{h,w} = E\big((w(n) - m_W)(h(n) - m_H)\big)
    = \int\!\!\int f(w, h)\,(w - m_W)(h - m_H)\, dh\, dw \qquad (17)

where f (w, h) is the joint probability density function of h and w, and mH


and mW are the means of h and w respectively. Notice that in the above,
to evaluate the expectation, we use the same idea as in forming (7).
This equation can be justified as follows. If a particular person is tall,
then (h(n) − m_H) in (17) is positive. Then it is likely that their weight is
larger, indicating that the quantity (w(n) − m_W) is also positive. Thus, the
product (w − m_W)(h − m_H) is positive. If on the other hand the person is
short, then (h(n) − m_H) is negative and (w(n) − m_W) is likely to be negative.
Thus, (w − m_W)(h − m_H) is again positive most of the time. Only occasionally
will this product be negative. Thus, when this product is averaged over all
possible outcomes, the result is a positive finite number, whose value gives
us a sense of how closely related the two random variables are.
Sometimes one variable has a negative effect on the other. That is, when
one variable increases, the other decreases. In this case, the correlation of
the two variables is a negative finite number.
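
A sketch of the sample version of (17) follows (the linear height/weight model and its noise level are invented purely for illustration):

    import numpy as np

    rng = np.random.default_rng(5)
    N = 10_000
    h = rng.normal(170.0, 10.0, size=N)                      # heights, cm
    w = 70.0 + 0.9*(h - 170.0) + rng.normal(0.0, 8.0, N)     # weights, kg, loosely tied to height

    m_H, m_W = np.mean(h), np.mean(w)
    C_hw = np.mean((w - m_W) * (h - m_H))                    # sample estimate of eq. (17)

    print(C_hw)                            # a positive finite number: tall people tend to weigh more
    print(C_hw / (np.std(h) * np.std(w)))  # normalized version, between -1 and +1
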
There is also the closely-related quantity called covariance. Here, the
definition is the same as in (17), except the means are not subtracted. In

Figure 9: The generation of random processes with different autocorrelation
functions.

this course, most of the random variables we consider have zero mean, so
correlation and covariance are the same thing in this case.
We now relate the concept of correlation to signals. Consider the system
of Figure 9, where a random process, generated e.g., from resistor noise, is fed
into a lowpass filter. The input noise process x(t) by nature has a "rough"
characteristic; i.e., the value of the process at time t has no relationship to
the value of the process at time t + τ, where τ ≠ 0. These processes are
called white processes, by analogy to white light, which contains all visible
spectral components.
Since white noise can vary arbitrarily from one time instant to the next,
it must have very high frequency components present in its spectrum. The
lowpass filter removes these high-frequency components from x(t). The out-
put y(t) is therefore a process without high frequency components, and as
such is one which can only vary smoothly with time. Samples of x(t) and
y(t) are shown in Figure 9.
We now examine the autocorrelation of the processes x(t) and y(t).
Autocorrelation is similar in concept to correlation, except, instead of two
distinct random variables w and h in (17), we use a random process and
a shifted version of the process. Thus, the autocorrelation R_X(τ) of a random
process X is defined in accordance with (17) as

R_X(\tau) = E\big((x(t) - m_X)(x(t + \tau) - m_X)\big) \qquad (18)

where τ is the shift or lag applied. Since we are dealing with ergodic
processes, expectations can be replaced with time averages, so the autocor-
relation of a process may also be defined as
R_X(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T (x(t) - m_X)(x(t + \tau) - m_X)\, dt \qquad (19)

Figure 10: Autocorrelation functions for the processes x(t) (top) and y(t)
(bottom), respectively, from Figure 9.

In practice when data is available only over a limited time span, the limit
above is removed and T is replaced with a finite value.
We first examine the autocorrelation of the process x(t) in Figure 9.
Since the process is white, the variables x(t) and x(t + τ) have no relation to
one another, i.e., they are uncorrelated, or their correlation is zero, for τ ≠ 0.
On the other hand, when τ = 0, eq. (18) is equivalent to the variance of
the process. Thus, R_X(τ) for the white noise case is a delta function, shown
plotted in Figure 10.
On the other hand, because the process y(t) is smooth, if we know y(t),
then we know something about y(t + τ) for some nonzero τ; i.e., they are
correlated. This correlation normally decreases as the lag τ increases. The
autocorrelation R_Y(τ) of y(t) would appear as shown in Figure 10.
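
A sketch of estimating (19) for a white process and a smoothed one follows (a simple moving-average filter stands in for the lowpass filter of Figure 9; that choice is an assumption, not the filter in the notes):

    import numpy as np

    rng = np.random.default_rng(6)
    T = 200_000
    x = rng.normal(0.0, 1.0, size=T)                         # "rough" white process x(t)
    y = np.convolve(x, np.ones(20)/20, mode="same")          # crude lowpass filtering -> smooth y(t)

    def autocorr(z, max_lag):
        # time-average estimate of R(tau), eq. (19), for lags 0..max_lag
        z = z - np.mean(z)
        return np.array([np.mean(z[:len(z)-tau] * z[tau:]) for tau in range(max_lag + 1)])

    Rx = autocorr(x, 40)
    Ry = autocorr(y, 40)

    print(Rx[:5])        # large at tau = 0, near zero elsewhere (delta-like, as in Figure 10, top)
    print(Ry[:5])        # decays gradually with tau (as in Figure 10, bottom)
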

2.2 Properties of the Autocorrelation Function


1.
R_X(0) = \sigma_X^2. \qquad (20)
This property follows by substituting τ = 0 in (18) and comparing to
the definition of variance. Note that the variance is the power of the
process when the resistance is 1.

2. R_X(τ) is even:
R_X(\tau) = R_X(-\tau). \qquad (21)
Because of stationarity, the expectation in (18) does not change if we
replace t with t − τ. This leads to the result. Note that this property implies
that the spectrum of the autocorrelation function is purely real.

3.
|R_X(\tau)| \le R_X(0). \qquad (22)
This property can be proved as follows. Consider the non-negative quantity
E((x(t) − x(t + τ))²) (for simplicity, take m_X = 0; otherwise work with
x(t) − m_X throughout). Then

E\big((x(t) - x(t + \tau))^2\big) \ge 0

E\big(x(t)^2\big) - 2 E\big(x(t)\, x(t + \tau)\big) + E\big(x(t + \tau)^2\big) \ge 0

Replacing these terms with their definitions, and taking advantage of
the stationarity property, we have

R_X(0) - 2 R_X(\tau) + R_X(0) \ge 0, \qquad (23)

so that R_X(τ) ≤ R_X(0). Repeating the argument with the non-negative
quantity E((x(t) + x(t + τ))²) gives −R_X(τ) ≤ R_X(0), and combining the
two results yields

R_X(0) \ge |R_X(\tau)| \qquad (24)

which was to be shown. (A quick numerical check of these properties is
sketched after this list.)

4. R_X(τ) is periodic if the process X is periodic.
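
As a quick numerical check of Properties 1 to 3 (analogous to the filtered-noise sketch in the previous subsection; small estimation error is to be expected, and Property 2 holds exactly for the time-average estimate by construction):

    import numpy as np

    rng = np.random.default_rng(7)
    T = 200_000
    y = np.convolve(rng.normal(0.0, 1.0, size=T), np.ones(20)/20, mode="same")
    y = y - np.mean(y)                                       # work with a zero-mean record

    def R(z, tau):
        # time-average estimate of R(tau) for a zero-mean record z, as in (19) with finite T
        tau = abs(tau)                                       # the estimate is even in tau (Property 2)
        return np.mean(z[:len(z)-tau] * z[tau:]) if tau else np.mean(z*z)

    print(abs(R(y, 0) - np.var(y)) < 1e-9)                   # Property 1: R(0) equals the variance
    print(all(abs(R(y, tau)) <= R(y, 0) for tau in range(1, 60)))   # Property 3

Both checks print True, within the estimation accuracy of a finite record.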
