
Chapter 2:

Random Variables and Probability Distributions


Lesson 6: Mean and Variance of Discrete Random Variables
TIME FRAME: 1 hour
LEARNING OUTCOME(S): At the end of the lesson, the learner should be able to:

describe or illustrate the mean (and variance) of a discrete random variable
compute the mean (and variance) of a discrete random variable
provide an interpretation of the mean (and variance) of a discrete random variable
PRE-REQUISITE LESSONS: Random Variables, Probability Distribution of Discrete
Random Variables
MATERIALS NEEDED: three coins, paper, pencil
KEY CONCEPTS: Mean (long-run average) of a Random Variable, Variance of a Random
Variable

LESSON OUTLINE:
1. Review of Probability Distributions
2. The Mean of a Random Variable
3. The Variance of a Random Variable
4. Examples of finding and interpreting the mean and variance

DEVELOPMENT OF THE LESSON


(A) Introduction: Review of Probability Distributions

Ask students to toss three coins ten times and record the number of heads on each toss.
Tell each student to compute the average number of heads for the first five tosses, and the
average for all ten tosses. Then, in groups of three to five, have them compute the average
of the averages that they got. If possible, ask the students to compute the average for the
entire class.

Next, ask the students what the possible numbers of heads were. They should say 0, 1, 2, 3.
Next, ask them what the range of the averages of the first five tosses was: ask for the highest
and lowest values in the class, and record these numbers on the board. Do this also for the
averages of the ten tosses and the averages of each group, and finally record the average of
the class. Ask them what they notice about the average of the number of heads. Also, ask them
what they notice about the range of values as the number of tosses is increased. They should
have noticed fluctuations in the averages, but the averages approach 1.5, and the range of
values of the averages gets narrower with more data (i.e. more students giving information).
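
If coins are not available, the activity can be imitated on a computer. The following is a minimal sketch, assuming fair coins and using Python's standard `random` module; the function name `average_heads` and the seed value are illustrative choices, not part of the lesson.

```python
import random

def average_heads(num_tosses, rng):
    """Average number of heads over num_tosses tosses of three fair coins."""
    total = 0
    for _ in range(num_tosses):
        # Each coin lands heads with probability 0.5; count heads among three coins
        total += sum(rng.random() < 0.5 for _ in range(3))
    return total / num_tosses

rng = random.Random(2024)  # fixed seed (arbitrary) so the run can be repeated
for n in (5, 10, 50, 1000, 100000):
    print(n, round(average_heads(n, rng), 3))
```

With more tosses, the printed averages should fluctuate less and settle near 1.5, which mirrors what the class observes on the board.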

(B) Main Lesson: The Mean and Variance of a Discrete Random Variable

Recall Lesson 3 of this chapter (on Probability Distributions of Discrete Random
Variables). List the distribution of the number of heads in tosses of three fair coins (or
three independent tosses of one fair coin). If possible, just ask the students to complete the
first two columns of the table. Then, add another column for the product of the entries of the
first and second columns, (X)P(X). After the students fill out the entries, ask them to get the
total of this third column. Leave room for additional columns, which will be filled out later.

X = number of heads    P(X)    (X)P(X)
0                      1/8     0
1                      3/8     3/8
2                      3/8     6/8
3                      1/8     3/8
Total                          12/8 = 1.5

Ask them what multiplying X and P(X) resembled. They should have seen that it resembled
getting a weighted average of data, with the probabilities serving as weights. Ask them
how the sum compares to the value that they got from the activity. They should notice
that the average that they got gets closer and closer to 1.5 as the number of values being
averaged increases. This is related to the concept of the mean of a random variable.

Now formally define the mean of a discrete random variable.

Definition: Given a discrete random variable X, the mean, denoted by $\mu$, is the sum of the
products formed by multiplying the possible values of X by their corresponding
probabilities. It is also called the expected value of X, and given the symbol $E(X)$. More
formally:

$$\mu = E(X) = \sum_{x} x \, P(X = x)$$
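
The definition can be checked on the three-coin table with a few lines of code. This is a minimal sketch, assuming the distribution is stored as a Python dictionary mapping each value of X to its probability; the names `pmf` and `mean` are illustrative.

```python
from fractions import Fraction

# Distribution of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean(pmf):
    """Weighted average of the values, with the probabilities as weights."""
    return sum(x * p for x, p in pmf.items())

assert sum(pmf.values()) == 1   # the probabilities must add up to one
print(mean(pmf))                # 3/2
print(float(mean(pmf)))         # 1.5
```

Exact fractions are used so that the result matches the hand computation 12/8 = 1.5 without rounding.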

Help students recall that empirical probabilities tend toward theoretical probabilities, and in
consequence, the mean is also a long-run average. This can be observed from the results of
the activity. As the number of trials of a statistical experiment increases, the empirical
average also gets closer and closer to the value of the theoretical average. Inform students
that this is why we can interpret the mean as a long-run average.

Recall that if we have three coins tossed (and if all coins are fair), then we would have eight
equally likely outcomes, HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. If we were to repeat
tossing these coins 8000 times, we would expect 1000 tosses for each of the outcomes, and
thus the expected frequency of 3 heads would be 1000 tosses, of 2 heads would be 3000
tosses, of 1 head would be 3000 tosses, and of no heads would be 1000 tosses. If we average
these, we would have

$$\frac{(1000)(3) + (3000)(2) + (3000)(1) + (1000)(0)}{8000} = 1.5$$
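
The expected frequencies in this argument can also be tallied by listing the eight outcomes. The following is a small sketch, assuming fair coins so that each outcome is expected in one eighth of the 8000 tosses; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]  # the 8 equally likely outcomes
tosses = 8000
per_outcome = tosses // len(outcomes)                      # 1000 expected tosses each

# Add up the expected tosses according to the number of heads in each outcome
expected_freq = Counter()
for outcome in outcomes:
    expected_freq[outcome.count("H")] += per_outcome

average = sum(heads * freq for heads, freq in expected_freq.items()) / tosses
print(dict(sorted(expected_freq.items())))  # {0: 1000, 1: 3000, 2: 3000, 3: 1000}
print(average)                              # 1.5
```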

Inform students that although the mean is called an expected value, this should not be
interpreted as the actual expected result when you do an experiment. In our example for
tossing three coins, the mean is 1.5. You should point out that when you toss three coins, you
cannot get 1.5 heads out of it. This indicates that the mean is not necessarily a possible value
of the random variable. So students cannot simply say that the mean is what they expect to be
the number of heads when they toss three coins. It is to be interpreted as a long-run average.
Mention to students that the mean is the value that we expect the long-run average to
approach; it is not the value of the random variable X that we expect to observe.

Tell students to recall that the average of a given set of data is a measure of central tendency.
Inform students that the expected value, being an average, measures the center of the
distribution of the possible values of X.

Mention to students that the mean of a (discrete) random variable X can be given a
physical interpretation. Suppose we imagine that the x-axis is a see-saw extending infinitely
in each direction, and at each possible value of X, we place a weight equal to the corresponding
probability. Then the mean is at the point which will make the see-saw balance. In other
words, it is at the centre of gravity of the system.

It may be helpful to give other examples to help students gain further insight.
Example with a biased die:
Recall the biased six-sided die with a probability distribution for X, the number of spots on
the upward face when the die is rolled, given as follows:

i         1                 2                 3                 4                 5                 6
P(X=i)    $(1-\theta)/6$    $(1-\theta)/6$    $(1-\theta)/6$    $(1+\theta)/6$    $(1+\theta)/6$    $(1+\theta)/6$

The expected value of the distribution may be calculated as follows:

$$\mu = E(X) = \sum_{i} i \, P(X = i)$$

$$= 1\left(\frac{1-\theta}{6}\right) + 2\left(\frac{1-\theta}{6}\right) + 3\left(\frac{1-\theta}{6}\right) + 4\left(\frac{1+\theta}{6}\right) + 5\left(\frac{1+\theta}{6}\right) + 6\left(\frac{1+\theta}{6}\right) = \frac{7 + 3\theta}{2}$$

If $\theta = 0$, then this reduces to a fair die, for which we would have a long-run average of 3.5
for the number of spots on the upward face.
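
The closed form for the biased die can be verified numerically. This is a sketch under an assumption: the bias parameter, written here as `theta`, lowers the probability of faces 1 to 3 to (1 - theta)/6 and raises that of faces 4 to 6 to (1 + theta)/6, which is the form consistent with the probabilities summing to one and with a mean of (7 + 3 theta)/2. The function names are illustrative.

```python
from fractions import Fraction

def biased_die_pmf(theta):
    """Assumed form: faces 1-3 get (1 - theta)/6, faces 4-6 get (1 + theta)/6."""
    low = (1 - theta) / 6
    high = (1 + theta) / 6
    return {1: low, 2: low, 3: low, 4: high, 5: high, 6: high}

def mean(pmf):
    """Sum of value times probability."""
    return sum(x * p for x, p in pmf.items())

for theta in (Fraction(0), Fraction(1, 10), Fraction(1, 2)):
    pmf = biased_die_pmf(theta)
    assert sum(pmf.values()) == 1            # still a valid distribution
    assert mean(pmf) == (7 + 3 * theta) / 2  # agrees with the closed form
    print(theta, mean(pmf))
```

For `theta = 0` the loop prints a mean of 7/2, the fair-die value of 3.5.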

Provide another example that is used in the real world, such as the following:
Practical Example (in Insurance)
An insurance company sells life insurance of PHP 500,000 for a premium of PHP 10,400 per
year. Actuarial tables show that the probability of normal death in the year following the
purchase of this policy is 0.1%. What is the expected gain for this life insurance policy?

Inform students that there are two simple events here: either the customer will live this year
or will die (a normal death). The probability of a normal death, as given by the problem, is
0.001, which will yield a negative gain to the insurance company in the amount of
10,400 - 500,000 = -489,600. The probability that the customer will live is 1 - 0.001 = 0.999.
Thus, the insurance company's gain X from this insurance policy in the year after
the purchase has the following probability distribution:

Gains       Outcome         Probability
10,400      Live            0.999
-489,600    Normal Death    0.001

$$\mu = E(X) = (10{,}400)(0.999) + (-489{,}600)(0.001) = 9{,}900$$

Students should notice that if the insurance company were to sell a very large number N of
the PHP 500,000 insurance policies to many people, since the (long run) average profit per
insurance policy is PHP 9,900, the company would be expected to make a total profit of N
times PHP 9,900.
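
The expected gain and the total expected profit can be reproduced as follows. This is a minimal sketch using the figures from the problem; the number of policies sold, `n_policies`, is a hypothetical value chosen only to illustrate the "N times PHP 9,900" statement.

```python
from fractions import Fraction

premium = 10_400                 # PHP collected per policy per year
payout = 500_000                 # PHP paid out if the customer dies
p_death = Fraction(1, 1000)      # probability of a normal death (0.1%)

# Gain to the company for each outcome, paired with its probability
gains = {premium: 1 - p_death, premium - payout: p_death}

expected_gain = sum(gain * prob for gain, prob in gains.items())
print(expected_gain)             # 9900

n_policies = 10_000              # hypothetical number of policies sold
print(n_policies * expected_gain)  # 99000000
```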

Next, ask students whether a measure of central tendency is the only relevant summary
measure. Students should remember that for a set of data, we also need other summary
measures, such as measures of variability. Students have already met the concepts of
variance and standard deviation when summarizing data. Help them remember that the
variance and standard deviation of a set of data are measures of spread. Tell students that
random variables also have a variance (and a standard deviation). The variance is derived by
getting the expected value of $(X - \mu)^2$, where $\mu$ is the mean.

To illustrate, go back to the table pertaining to flipping three coins and getting the number X
of heads in these three coins. Now add two columns with the headings $(X-\mu)^2$ and
$(X-\mu)^2 P(X)$, and fill in the corresponding values.

X = number of heads    P(X)    (X)P(X)       $(X-\mu)^2$           $(X-\mu)^2 P(X)$
0                      1/8     0             $(0-1.5)^2 = 2.25$    0.28125
1                      3/8     3/8           $(1-1.5)^2 = 0.25$    0.09375
2                      3/8     6/8           $(2-1.5)^2 = 0.25$    0.09375
3                      1/8     3/8           $(3-1.5)^2 = 2.25$    0.28125
Total                          12/8 = 1.5                          0.75

The total in the last column is called the variance of the random variable, and its square root,
0.866, is the standard deviation.

Now define the variance of a random variable as the weighted average of squared
deviations of the values of X from the mean, where the weights are the respective
probabilities. The variance, usually denoted by the symbol $\sigma^2$, is also denoted as
$\mathrm{Var}(X)$ and formally defined as

$$\sigma^2 = \mathrm{Var}(X) = E\left[(X - \mu)^2\right] = \sum_{x} (x - \mu)^2 \, P(X = x)$$
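
The variance and standard deviation of the three-coin example can be computed directly from the definition. This is a minimal sketch, assuming the distribution is stored as a dictionary of exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Distribution of X = number of heads in three tosses of a fair coin
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mu = sum(x * p for x, p in pmf.items())                     # mean
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())   # weighted squared deviations
sd = sqrt(variance)                                         # standard deviation

print(mu, variance, round(sd, 3))   # 3/2 3/4 0.866
```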

The variance gives a measure of how far the values of X are from the mean. Ask students
what the variance is when X is a constant with probability one. Students should be able to say
that it is zero in this case (just as it was in the case of a set of data that does not vary). Inform
students that in nontrivial cases (i.e. when there is more than one possible distinct value of
X), the variance will be a positive value. The bigger the value of the variance, the farther the
values of X tend to be from the mean.

Define the standard deviation as the square root of the variance of X. That is,

$$\sigma = \sqrt{\mathrm{Var}(X)}$$

Example (of Gains in Life Insurance)

Show the following calculations for the deviations formed by subtracting the mean (PHP 9,900)
from the gains, as well as the squared deviations and the weighted squared deviations.

Gains       Probability    Deviation    Squared Deviation    Weighted Squared Deviation
10,400      0.999          500          250,000              249,750
-489,600    0.001          -499,500     249,500,250,000      249,500,250

The variance is the sum of the entries in the last column, i.e.,

$$\sigma^2 = 249{,}750 + 249{,}500{,}250 = 249{,}750{,}000$$

while the standard deviation is the square root of the variance:

$$\sigma = \sqrt{249{,}750{,}000} \approx 15{,}803.48$$

Remind students that the standard deviation is the more understandable of the two measures
of spread, since the standard deviation is in the same units as X. For example, if X is a
random variable representing the number of heads in three tosses of a fair coin, then the unit
of the standard deviation is heads, while the variance is in square heads (heads$^2$).

Unlike the mean, there is no simple interpretation for the variance or standard deviation. The
variance though is analogous to the moment of inertia in physics, but that is not necessarily
widely understood by students. Stress to students that, in relative terms,

a small standard deviation (and variance) means that the distribution of the random
variable is quite concentrated around the mean
a large standard deviation (and variance) means that the distribution is rather spread
out, with some chance of observing values at some distance from the mean

Inform students that, in practice, the variance is not computed with the definition, but rather
using the following result:

$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2$$

Thus, the variance is the difference between the expected value of $X^2$ and the square of the
mean.

Explanatory Note: This can be derived from the definition, some algebraic expansion of a
binomial expression, and some properties of expected values (such as the mean of a constant
is the constant):

$$\sigma^2 = \mathrm{Var}(X) = E\left[(X - \mu)^2\right]$$
$$= E\left(X^2 - 2\mu X + \mu^2\right) = E(X^2) - 2\mu E(X) + E(\mu^2)$$
$$= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2$$

It is suggested though that this derivation not be discussed in class. It may be helpful though
to use this computational formula, and to use computers whenever possible.
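
As one way of using a computer here, the sketch below computes the variance both from the definition and from the computational formula and confirms that the two agree. The function names are illustrative, and the fair-die distribution is included only as a second test case.

```python
from fractions import Fraction

def mean(pmf):
    """Sum of value times probability."""
    return sum(x * p for x, p in pmf.items())

def variance_by_definition(pmf):
    """Expected squared deviation from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

def variance_by_shortcut(pmf):
    """Expected value of X squared, minus the square of the mean."""
    mean_of_squares = sum(x ** 2 * p for x, p in pmf.items())
    return mean_of_squares - mean(pmf) ** 2

coins = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

for name, pmf in (("three coins", coins), ("fair die", fair_die)):
    assert variance_by_definition(pmf) == variance_by_shortcut(pmf)
    print(name, variance_by_shortcut(pmf))
```

The two functions give identical exact fractions, 3/4 for the three coins and 35/12 for the fair die, so either formula may be used.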

(C) Enrichment: Mean of a Continuous Random Variable

(May be skipped, but preferably discussed even cursorily)

Ask students what they think would be the expected value of X if X were continuous with
probability density function f(x). They should provide the equivalent quantity for a
continuous random variable involving an integral. That is, the mean $\mu$ of a continuous
random variable X with probability density function f(x) is

$$\mu = E(X) = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

Tell students that integrals may be viewed as sums if the curve is discretized. Also tell
them that when looking at a probability density function, they can locate the mean by
determining the center of gravity of the curve, i.e. where the pivot should be placed to
make the probability density function balance on the x-axis, imagining that the probability
density function is a thin plate of uniform material, with height f(x) at x.
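
The idea that an integral can be viewed as a sum over a discretized curve can be illustrated numerically. The sketch below is an assumption-laden example: the density f(x) = 2x on [0, 1] is hypothetical and chosen only because its mean, 2/3, is easy to check by hand; the function names are illustrative.

```python
def density(x):
    """Hypothetical density f(x) = 2x on [0, 1], used only for illustration."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0

def discretized_mean(f, a, b, n):
    """Approximate the integral of x * f(x) over [a, b] by a sum over n thin strips."""
    width = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * width    # midpoint of the k-th strip
        total += x * f(x) * width    # value times (approximate) probability of the strip
    return total

for n in (10, 100, 1000):
    print(n, discretized_mean(density, 0.0, 1.0, n))
```

Each term `x * f(x) * width` plays the role of "value times probability" in the discrete formula, and the printed sums approach 2/3 as the strips get thinner.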

An important consequence of this is that the mean of any random variable (continuous or
discrete) that has a symmetric distribution is always on the axis of symmetry of the
distribution; for a continuous random variable, this means the axis of symmetry of the
probability density function.

KEY POINTS

The mean (or expected value) of a discrete random variable, say X, is a weighted
average of the possible values of the random variable, where the weights are the
respective probabilities.

$$\mu = E(X) = \sum_{x} x \, P(X = x)$$

The variance is the expected value of the squared deviations from the mean.

$$\sigma^2 = \mathrm{Var}(X) = E\left[(X - \mu)^2\right] = \sum_{x} (x - \mu)^2 \, P(X = x)$$

The standard deviation is the square root of the variance.

$$\sigma = \sqrt{\mathrm{Var}(X)}$$

REFERENCES

De Veaux, R. D., Velleman, P. F., and Bock, D. E. (2006). Intro Stats. Pearson Ed. Inc.

Workbooks in Statistics 1: 11th Edition, Institute of Statistics, UP Los Banos, College, Laguna
4031

http://www.amsi.org.au/ESA_Senior_Years/SeniorTopic4/4_md/SeniorTopic4c.html#content_3

http://www.amsi.org.au/ESA_Senior_Years/SeniorTopic4/4_md/SeniorTopic4c.html#content_5

http://www.amsi.org.au/ESA_Senior_Years/SeniorTopic4/4_md/SeniorTopic4c.html#content_6

https://www.opened.com/video/mean-and-expected-value-of-discrete-random-variables/116285

https://www.opened.com/video/variance-and-standard-deviation-of-discrete-random-variables/116286

https://www.opened.com/video/mean-e-x-and-variance-var-x-for-continuous-random-variables/116287

ASSESSMENT

1. The probability distributions of four random variables are shown below.

[Figure: four bar charts of the probability distributions P(X=x), P(Y=y), P(W=w), and P(T=t).
In each chart the horizontal axis shows the values 1 to 10 and the vertical axis shows the
probability (up to 0.25 for X, 0.6 for Y, 0.6 for W, and 0.45 for T).]

For each of these probability distributions:

a. Confirm that the graph represents a probability distribution.
b. Give a guess of the mean.
c. Compute the actual value of the mean.
d. Provide a guess on which of the distributions has the smallest variance, and which has the
largest variance.
e. Calculate the variance and standard deviation.
Soln:
a. In each of the distributions, the probabilities are all clearly non-negative and the sum
of the probabilities equals one.
b. Visually, we can give good guesses of the means: E(X) is approximately 6; E(Y) should be
5.5; E(W) should be 6; E(T) should be 5.
c. Using a spreadsheet, we can compute the means as E(X) = 5.7, E(Y) = 5.5, E(W) = 6, E(T) = 5.
d. The graphs show the most variability for Y and the least for W.
e. We could calculate the variance with the computing formula
$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - \mu^2$$
so that we only need to obtain the expected value of the squared random variable, and
then subtract from this the square of the mean, and thus verify that Y has the biggest
variance and standard deviation (while W has the least).
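
In place of a spreadsheet, the checks in parts (a), (c), and (e) could be carried out with a short program. The following is a sketch only: the distribution `example` is hypothetical and is not one of the four graphs, whose bar heights must be read from the figure; the function name `summarize` is also illustrative.

```python
from math import isclose, sqrt

def summarize(pmf):
    """Check that pmf is a valid distribution, then return (mean, variance, sd)."""
    if any(p < 0 for p in pmf.values()):
        raise ValueError("probabilities must be non-negative")
    if not isclose(sum(pmf.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mu = sum(x * p for x, p in pmf.items())
    # Computing formula: E(X^2) minus the square of the mean
    var = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2
    var = max(var, 0.0)   # guard against a tiny negative value from rounding
    return mu, var, sqrt(var)

# Hypothetical distribution on some of the values 1 to 10 (NOT one of the four graphs)
example = {4: 0.2, 5: 0.3, 6: 0.3, 7: 0.2}
mu, var, sd = summarize(example)
print(round(mu, 4), round(var, 4), round(sd, 4))
```

Students would replace `example` with the values and probabilities read from each graph, then compare the printed results with their guesses.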
