© James P. Reilly
1 Random Variables
Examples of random variables (RVs) are:
- the noonday temperature on January 22 in downtown Hamilton,
- the roll of a die,
- Canada's annual gross domestic product for a particular year,
- the height of a person,
- the value of a noise voltage appearing across the terminals of a resistor at a specific time instant,
etc. By random variable, we mean that there is uncertainty involved in its
value. We cannot tell what the actual value of the variable will be before a
measurement is made. The whole idea of probability is to make some sense
of this uncertainty, so that performance of systems (such as communications
systems) which involve uncertainty may be analyzed in a meaningful way.
In probability theory, an experiment may be viewed as the acquisition of
a data point, e.g., corresponding to any one of the random variables above.
This may be represented in mathematical terms as a box with a button
on it, as in Figure 1. Every time the button is pushed, a hypothetical
experiment is performed. The box randomly outputs some sample value
x(n) corresponding to the experimental measurement. Every experiment
(button push) corresponds to a value of the index n. For repeated button
pushes, we have the sequence of measurements (i.e., observations or samples)
x(n), n = 1, . . . , N , where N is the number of samples from the experiment.
Figure 1: A depiction of the generation of the random process X .
The process (i.e., the box) which generates the entire set of samples
x(n) is called X . The quantity X is a random variable (RV). The set of
all possible outcomes of X is referred to as the ensemble and is given the
symbol S.
We realize that for a given experiment, some values of X are less likely
than others. For example, the sample value x(n) = +20 °C is extremely
unlikely for the example of the temperature on January 22 in Hamilton. In
the case where we are measuring the height of a person, the value 5'10''
is a more likely value than 6'10''.
In order to quantify which values of an RV are more likely, we use what
is referred to as a probability density function, whose symbol is f(x). An
example of a specific probability density function is the well-known Gaussian
distribution which is shown plotted in Figure 2. This distribution is typical
of one which would represent the height of a person.
The probability density function f (x) has several properties, some of
which are summarized below:
    ∫_X f(x) dx = 1    (1)

    P(a ≤ x ≤ b) = ∫_a^b f(x) dx    (2)

    f(x) ≥ 0  ∀ x.    (3)
Equation (2) means that the probability of a sample x of the random variable
X falling in the interval [a, b] is given by the integral in (2). If a and b
are chosen to be in a region over which f (x) is relatively large, then the
probability of an x being in that interval is relatively large, and vice versa.
Equation (1) means that the probability of a sample x of X taking on some
value is one.
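As a rough numerical illustration of properties (1)–(3), we can evaluate a Gaussian density on a fine grid and check that it integrates to one and that interval probabilities follow (2). The Gaussian parameters and grid below are illustrative assumptions, not values from the notes.

```python
import math
import numpy as np

def gaussian_pdf(x, mean=0.0, std=1.0):
    """A Gaussian density such as that of Figure 2 (illustrative parameters)."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

x = np.linspace(-10.0, 10.0, 200_001)   # fine grid capturing essentially all mass
dx = x[1] - x[0]
f = gaussian_pdf(x)

# Property (1): the density integrates to one.
total = np.sum(f) * dx

# Property (2): P(a <= x <= b) is the area under f(x) between a and b.
a, b = -1.0, 1.0
p_ab = np.sum(f[(x >= a) & (x <= b)]) * dx   # about 0.683 for one std deviation

# Property (3): the density is nonnegative everywhere.
nonneg = bool(np.all(f >= 0))
print(total, p_ab, nonneg)
```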
We now interpret the meaning of the probability density function. Let
us consider the example of the height of a person from the class. We push
the button N times (where N is large) to generate a sequence of samples
x(n), n = 1, . . . , N. We then discretize the range of all possible values of
x(n) into bins. For example, in this case, we can choose the width of each
bin to be 1 inch, so that there is a bin associated with each value 5'0'',
5'1'', 5'2'', etc. We now take each sample x(n) and assign it to its appropriate
bin. This is done by incrementing a counter associated with that bin. After
processing all the samples in this way, the bin counts can be plotted as shown
in Figure 3. This kind of plot is called a histogram, which we denote by
h(x). The relative frequency (vertical height of the histogram) for each bin
gives the number of occurrences when the value of the x(n) corresponds to
the value of the bin. The sum of all the relative frequencies of the bins is
equal to N. We denote x_k, k = 1, . . . , K as the x-value (i.e., the person's
height) corresponding to the kth bin, where K is the total number of bins.
In the example of Figure 3, K = 10.
The probability density function is defined in terms of the histogram, as
1/N times the limit of h(x) as N → ∞ and K → ∞, where in the latter
case, the width of the bin also goes to zero; i.e.,

    f(x) = lim_{N→∞, K→∞} (1/N) h(x).    (4)
Thus, as N and K each go to infinity and the width of the bin goes to zero,
we have all the information possible about the random variable, and the
histogram of Figure 3 becomes a continuous smooth function and approaches
a probability density function such as that shown in Figure 2.
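The limiting behaviour above can be sketched numerically: we draw N samples, build the histogram h(x), and normalize by N and the bin width so the counts approximate a density, as in eq. (4). The Gaussian "height" distribution and its parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated "button pushes": N samples of a Gaussian-distributed height
# (mean and spread here are illustrative, not taken from the notes).
N = 100_000
samples = rng.normal(loc=70.0, scale=3.0, size=N)   # heights in inches

# Build the histogram h(x): count how many samples land in each bin.
bin_width = 0.5
edges = np.arange(55.0, 85.0 + bin_width, bin_width)
h, _ = np.histogram(samples, bins=edges)

# Normalizing the counts by N and the bin width gives a density estimate --
# the finite-N, finite-K version of eq. (4).
f_est = h / (N * bin_width)
centers = 0.5 * (edges[:-1] + edges[1:])

# The estimate integrates to (approximately) one, consistent with eq. (1).
print(np.sum(f_est) * bin_width)
```

As N grows and the bins shrink, the bar chart of `f_est` against `centers` smooths out toward the underlying density, which is the limit expressed by eq. (4).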
Figure 3: An example of a histogram.
1.1.1 Mean
The definition of the mean of an RV is quite straightforward. Given a
sequence of samples x(n) of the random variable, the mean mX is given as
    m_X = lim_{N→∞} (1/N) Σ_{n=1}^{N} x(n)    (5)
Figure 4: Probability density functions f1 (x) and f2 (x) having different
mean values x1 and x2 , but with the same variance.
By grouping the samples into their histogram bins, the mean can also be
written as

    m_X = lim_{N→∞, K→∞} Σ_{k=1}^{K} x_k h(x_k)/N.    (6)

Note that in (6) we are summing over histogram bins, not samples as we are
in (5). Using (4), Eq. (6) can be written as

    m_X = ∫ x f(x) dx.    (7)
1.1.2 Variance
We now consider the variance of a random variable. As shown in
Figure 4b, the variance of a distribution gives us an idea of the spread of the
random variable about its mean. A random variable with a higher variance
has a higher probability density in the region far from its mean than an RV
with lower variance.
One intuitive way of quantifying the spread of an RV is to find the average
absolute value of the variable from its mean, according to the following
Figure 4: Probability density functions f_1(x) and f_2(x), having the
same mean, but different variances σ_1^2 and σ_2^2 respectively, where σ_1^2 < σ_2^2.
Note that the probability of drawing a sample in the interval [a, b] is much
higher when X is distributed according to f_2(x).
formula:

    spread = lim_{N→∞} (1/N) Σ_{n=1}^{N} |x(n) − m_X|.    (8)

A mathematically more convenient measure of spread is the variance σ_X^2,
which averages the squared distance instead:

    σ_X^2 = lim_{N→∞} (1/N) Σ_{n=1}^{N} (x(n) − m_X)^2    (9)
The variance, as defined by (9), represents the average squared value of the
distance of the sample from its mean.
We can use the same trick as in (6) to evaluate variance. If we use the
histogram, the variance is given (by analogy to (7)), as
    σ_X^2 = lim_{N→∞, K→∞} Σ_{k=1}^{K} (x_k − m_X)^2 h(x_k)/N
          = ∫ (x − m_X)^2 f(x) dx    (10)
Figure 5: An illustration showing how the noise samples v(n) are generated.
Equations (7) and (10) are also called moments of x. Eq. (7) is the first
moment, since it involves x raised to the first power, while (10) is the second
central moment (the second moment taken about the mean), because it
involves x to the second power.
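The equivalence of the sample-based and histogram-based forms of these moments can be checked numerically. The distribution below is an illustrative assumption; the two routes should agree for any choice.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
x = rng.normal(loc=2.0, scale=1.5, size=N)   # illustrative distribution

# Sample-based estimates: eqs. (5) and (9) with finite N.
m_sample = np.mean(x)
var_sample = np.mean((x - m_sample) ** 2)

# Histogram-based estimates: eqs. (6) and (10) with finite N and K.
h, edges = np.histogram(x, bins=200)
xk = 0.5 * (edges[:-1] + edges[1:])              # bin-centre values x_k
m_hist = np.sum(xk * h / N)                      # first moment
var_hist = np.sum((xk - m_hist) ** 2 * h / N)    # second central moment

print(m_sample, m_hist)      # both near the true mean 2.0
print(var_sample, var_hist)  # both near the true variance 1.5**2 = 2.25
```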
We close this section with a further example of the mean and variance
of a random variable. Quantum mechanics tells us that any resistor above
absolute zero has a fluctuating noise voltage across its terminals. This fluc-
tuating noise is the result of the random motion of individual electrons mov-
ing about in the resistor. This noise voltage is extremely small (typically
on the order of microvolts), so it is not normally visible on an oscilloscope.
Nevertheless, it is comparable in magnitude to the desired signal voltage
received by an antenna from a distant station. As such, it can significantly
contaminate the quality of the reception of a TV or radio signal, and for
that matter, any form of communications system. This noise voltage that
appears across a resistor is called thermal noise, and is the most pertinent
random variable we are concerned about in this course. Because there is no
mechanism in the resistor to accumulate a constant charge difference
between the resistor terminals, the average value of the thermal noise voltage
is zero.
In this specific example, the random variable we consider is related to the
noise voltage appearing across a resistor. Specifically, consider the circuit
shown in Figure 5. The box shown is a sample-and-hold circuit, which
samples the voltage v(t) at a given time instant and holds it. These voltage
values are then samples of a random variable, which we denote as V. Since
the average value or the mean of a time-varying voltage is its DC value, the
mean of the voltage samples v(n) of the random variable V is the DC voltage
shown in Figure 5.
Quantum mechanics also tells us that the power of the thermal noise is
proportional to the resistor temperature. Thus, if we vary the temperature
of the resistor, what effect does this have on the probability density function,
the mean, and the variance of the random variable V? Clearly, even though
the power of the noise process has changed, there is still no mechanism for a
charge difference to appear between the terminals, so the mean is still zero.
However, changing the power definitely has an impact on the variance. To see
this, note that the power P in the random process is the expectation of the
square of the voltage, divided by the resistance R:
    P = (1/R) ∫ v^2 f(v) dv.    (12)
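A small simulation illustrates this discussion: thermal noise is modelled here as zero-mean Gaussian samples whose variance is proportional to temperature. The proportionality constant `k` is purely illustrative — it is not the physical Johnson–Nyquist value, and the Gaussian model is an assumption, not something stated in the notes.

```python
import numpy as np

rng = np.random.default_rng(2)

def noise_samples(temperature, n=500_000, k=1e-12):
    """Zero-mean Gaussian noise with variance proportional to temperature.

    The constant k is illustrative only, not a physical value."""
    return rng.normal(0.0, np.sqrt(k * temperature), size=n)

for T in (100.0, 300.0, 600.0):
    v = noise_samples(T)
    # The mean stays (near) zero at every temperature, since there is no
    # mechanism for a constant charge difference across the terminals...
    # ...while the variance -- and hence the power -- scales with T.
    print(f"T = {T:5.0f} K   mean = {v.mean():+.2e}   var = {v.var():.2e}")
```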
2 Random Processes
The idea of a random process is similar to that of a random variable. The
basic distinction is that, for every button push, instead of just getting a
single number (sample) out of our box, we get a whole waveform which
extends over an infinite time interval. Thus, samples of the random process
are denoted as x(t). The idea is illustrated by Figure 6.
The idea of the random process can be further illustrated as follows.
Suppose we have a drawer full of resistors. Each waveform x_n(t), n = 1, . . . , N
in Figure 6 corresponds to that of a particular resistor. The ensemble of the
random process is the set of all voltage waveforms over all resistors. The
random process whose samples are x_n(t) is denoted by X(t).
Figure 6: Example of a random process. Every button push gives us a
different waveform, extending over infinite time.
Figure 7: A sample waveform of the random process, plotted against time.
For a stationary process, the mean may be evaluated as an ensemble
average taken at a single time instant t_1:

    m_X = E(x(t_1)) = ∫ x(t_1) f(x) dx    (13)
where t1 is arbitrary. Note that this formula gives us exactly the same values
as that given by (5). The variance is given by
    σ_X^2 = E((x(t_1) − m_X)^2) = ∫ (x(t_1) − m_X)^2 f(x) dx    (14)
Because the process is ergodic, these ensemble averages may be replaced
with a time average, over just one sample of the process. So, to determine
the mean and variance of a noise process, all we need do is measure the
noise across a single resistor over the time interval [0, T ], and then evaluate
the average of the noise samples themselves, and the average of the quantity
(x − m_X)^2, to evaluate the mean and variance respectively; i.e.,
    m_X = lim_{T→∞} (1/T) Σ_{t=0}^{T} x(t)    (15)
and
    σ_X^2 = lim_{T→∞} (1/T) Σ_{t=0}^{T} (x(t) − m_X)^2    (16)
Note that the true values are obtained by taking the limit T → ∞ as shown
above. However, since it is impossible in practice to evaluate averages over
an infinite interval (because it would take an infinite amount of data), we
can only obtain an estimate of the mean and variance, by eliminating the
limits above and taking T to be a finite value.
Thus, there are two ways of evaluating mean and variance of a random
process. They are given by (13) and (15) respectively for mean, and (14) and
(16) respectively for variance. Eqs. (13) and (14) are useful for analytical
purposes when the probability density function is known. However, often in
practice, all we have available are a series of measurements of the random
process over some time interval [0, T ]. Then all we can do is to determine
an estimate of the mean and variance using (15) and (16) without limits,
for T finite.
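The practical case — a single finite record — can be sketched as follows: we apply (15) and (16) with the limit removed and T finite. The zero-mean Gaussian record below stands in for a measured noise waveform (an assumption; any single realization would do).

```python
import numpy as np

rng = np.random.default_rng(3)

# One finite record x(t), t = 0, ..., T-1, from a single "resistor" -- here
# a zero-mean Gaussian process sampled at discrete instants (illustrative).
T = 100_000
x = rng.normal(0.0, 2.0, size=T)

# Eqs. (15) and (16) with the limit removed and T finite:
m_est = np.sum(x) / T
var_est = np.sum((x - m_est) ** 2) / T

print(m_est)    # estimate of the mean (true value 0)
print(var_est)  # estimate of the variance (true value 4)
```

Because T is finite, these are only estimates; they fluctuate around the true values and improve as the record length grows.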
Note that in the vast majority of cases, the processes we deal with are in-
deed stationary and ergodic. We assume these properties hold unless stated
otherwise.
Figure 8: Scatter plot showing heights and weights of people. Each dot
represents the height and weight of one individual.
From the scatter plot of Figure 8, it is evident that height and weight are
correlated. That is, knowing one implies some knowledge about the other.
Let us now quantify this effect. Let us take measurements h(n) and
w(n), n = 1, . . . , N of height and weight respectively, where N is the to-
tal number of measurements (people). The correlation Ch,w between the
random variables h and w is defined as
    C_{h,w} = E[(w(n) − m_W)(h(n) − m_H)]
            = ∫∫ f(w, h)(w − m_W)(h − m_H) dh dw    (17)
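The finite-sample version of (17) replaces the expectation with an average over the N measurements. The synthetic height/weight data below is an illustrative assumption (taller people made heavier on average), not real measurements.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic height/weight measurements (illustrative, not real data).
N = 10_000
h = rng.normal(68.0, 3.0, size=N)                    # heights, inches
w = 4.0 * h - 150.0 + rng.normal(0.0, 10.0, size=N)  # weights, pounds

# Finite-sample version of eq. (17):
m_H, m_W = h.mean(), w.mean()
C_hw = np.mean((w - m_W) * (h - m_H))

# Normalizing by the standard deviations gives the correlation
# coefficient, which lies between -1 and +1.
rho = C_hw / (h.std() * w.std())
print(C_hw, rho)   # C_hw > 0: knowing height tells us something about weight
```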
Figure 9: The generation of random processes with different autocorrelation
functions.
In this course, most of the random variables we consider have zero mean, so
correlation and covariance are the same thing in this case.
We now relate the concept of correlation to signals. Consider the system
of Figure 9, where a random process, generated e.g., from resistor noise, is fed
into a lowpass filter. The input noise process x(t) by nature has a rough
characteristic; i.e., the value of the process at time t has no relationship to
the value of the process at time t + τ, where τ ≠ 0. These processes are
called white processes, by analogy to white light, which contains all visible
spectral components.
Since white noise can vary arbitrarily from one time instant to the next,
it must have very high frequency components present in its spectrum. The
lowpass filter removes these high-frequency components from x(t). The out-
put y(t) is therefore a process without high frequency components, and as
such is one which can only vary smoothly with time. Samples of x(t) and
y(t) are shown in Figure 9.
We now examine the autocorrelation of the processes x(t) and y(t).
Autocorrelation is similar in concept to correlation, except, instead of two
distinct random variables w and h in (17), we use a random process and
a shifted version of the process. Thus, the autocorrelation R_X(τ) of a random
process X is defined in accordance with (17) as

    R_X(τ) = E[(x(t) − m_X)(x(t + τ) − m_X)]    (18)

where τ is the shift, or lag, applied. Since we are dealing with ergodic
processes, expectations can be replaced with time averages, so the
autocorrelation of a process may also be defined as
    R_X(τ) = lim_{T→∞} (1/T) ∫_0^T (x(t) − m_X)(x(t + τ) − m_X) dt    (19)
Figure 10: Autocorrelation functions for the processes x(t) (top) and y(t)
(bottom), respectively, from Figure 9.
In practice when data is available only over a limited time span, the limit
above is removed and T is replaced with a finite value.
We first examine the autocorrelation of the process x(t) in Figure 9.
Since the process is white, the variables x(t) and x(t + τ) have no relation to
one another, i.e., they are uncorrelated, or their correlation is zero, for τ ≠ 0.
On the other hand, when τ = 0, eq. (18) is equivalent to the variance of
the process. Thus, R_X(τ) for the white noise case is a delta function, shown
plotted in Figure 10.
On the other hand, because the process y(t) is smooth, if we know y(t),
then we know something about y(t + τ) for some nonzero τ; i.e., they are
correlated. This correlation normally decreases as the lag τ increases. The
autocorrelation R_Y(τ) of y(t) would appear as shown in Figure 10.
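The contrast between the two autocorrelations can be reproduced numerically by estimating (19) over a finite record. The moving-average filter below stands in for the lowpass filter of Figure 9 (an assumption; the notes do not specify the filter).

```python
import numpy as np

rng = np.random.default_rng(5)

def autocorr(x, max_lag):
    """Time-average estimate of R_X(tau), i.e. eq. (19) with finite T."""
    x = x - x.mean()
    return np.array([np.mean(x[:len(x) - lag] * x[lag:])
                     for lag in range(max_lag + 1)])

T = 200_000
x = rng.normal(0.0, 1.0, size=T)   # white process

# A simple length-20 moving average plays the role of the lowpass filter.
L = 20
y = np.convolve(x, np.ones(L) / L, mode="same")

Rx = autocorr(x, 40)
Ry = autocorr(y, 40)

print(Rx[0], Rx[10])   # R_X: large at lag 0, near zero elsewhere (delta-like)
print(Ry[0], Ry[10])   # R_Y: decays gradually as the lag increases
```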
2. R_X(τ) is even:

    R_X(τ) = R_X(−τ).    (21)

Because of stationarity, the expectation in (18) does not change if we
replace t with t − τ. This leads to the result. Note this property implies
that the spectrum of the autocorrelation function is purely real.
3.

    |R_X(τ)| ≤ R_X(0).    (22)

This property can be proved as follows. Consider the nonnegative quantity
E(x(t) ± x(t + τ))^2. For a zero-mean process, expanding the square gives

    0 ≤ E(x(t) ± x(t + τ))^2 = 2R_X(0) ± 2R_X(τ)    (23)

or

    R_X(0) ≥ |R_X(τ)|    (24)

which was to be shown.
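Both properties can be checked empirically on a finite record, using the time-average estimate of the autocorrelation. The zero-mean moving-average process below is an illustrative choice of correlated process, not one from the notes.

```python
import numpy as np

rng = np.random.default_rng(6)

# An illustrative zero-mean correlated process: white noise smoothed by a
# short moving average.
x = np.convolve(rng.normal(size=100_000), np.ones(8) / 8, mode="same")
x = x - x.mean()
n = len(x)

def R(tau):
    """Time-average autocorrelation estimate at integer lag tau (may be negative)."""
    if tau >= 0:
        return np.mean(x[:n - tau] * x[tau:])
    return np.mean(x[-tau:] * x[:n + tau])

# Property (21): the autocorrelation is even in the lag.
even_ok = bool(np.isclose(R(5), R(-5)))

# Property (22): no lag exceeds the zero-lag value.
bound_ok = all(abs(R(tau)) <= R(0) for tau in range(1, 50))
print(even_ok, bound_ok)
```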