PHIL.015
March 28, 2016
RANDOM VARIABLES AND THEIR PROBABILITY
DISTRIBUTIONS
1. Passage from Probabilistic to Statistical Reasoning
In this handout I introduce the basic concepts of a random variable and its associated probability distribution function. Roughly speaking, a random variable is a numerical-valued quantity whose value depends on the outcome of a probabilistic experiment. Its associated probability distribution specifies the probability that the variable will assume a given numerical value.
In applications of probability theory to probabilistic (random) experiments, the first step consists of specifying the experiment's correct probability model ⟨Ω, A, P⟩, with a suitable sample space Ω, an event algebra A, and a probability measure P thereon. The next step involves the calculation of the probability values of events that are of interest. In complex (composite) experiments several probability models may become necessary. When an experiment is performed, statistically oriented experimenters are often less interested in its specific outcome (or event) than in some aspect of the outcome that may be shared with other particular outcomes. For example, a statistician may be curious about the number of babies born in a certain hospital each day (which is of course not a fixed quantity, because it depends on many random factors that vary from one day to another) and not at all interested in the particular babies themselves. In statistical usage, such a quantity is called a random variable, doubtless because its value tends to vary from one outcome to the next (hence the term variable) and the outcome itself depends on chance (thus, the term random). The mathematical concept that captures these quantities is that of a real-valued function defined on the sample space Ω.
Specifically, given a probability model ⟨Ω, A, P⟩, a random variable is any function of the form X : Ω → ℝ that assigns to each outcome ω in Ω a unique numerical value X(ω) in the set ℝ of real numbers and satisfies the measurability condition

{ ω ∈ Ω | X(ω) ≤ a } ∈ A

for all real numbers a in ℝ. This last condition is thrown in for theoretical reasons, because, strictly speaking, not all real-valued functions on Ω qualify as random variables.
[Figure: a random variable X maps the sample points ω₁, ω₂, …, ωₙ of Ω to the values X(ω₁), X(ω₂), …, X(ωₙ) on the real line.]
Example 1:
Consider an experiment in which three fair coins are tossed once. We already know that the probability model ⟨Ω₈, A, P⟩ for this experiment is specified by the 8-element sample space

Ω₈ = 2 × 2 × 2 = { TTT, TTH, THT, HTT, HHT, HTH, THH, HHH }

(where 2 abbreviates the two-element set {T, H}), the event algebra A consisting of all subsets of Ω₈, and the Laplacean probability measure P, defined by

P(A) = #{A} / #{Ω₈}

for any event A in A. Let X : Ω₈ → ℝ be the random variable that counts the number of heads in each outcome. Then clearly
X(TTT) = 0
X(TTH) = X(THT) = X(HTT) = 1
X(THH) = X(HTH) = X(HHT) = 2
X(HHH) = 3
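These values can be checked mechanically. Below is a minimal Python sketch (the names omega8 and X are our own, not the handout's) that enumerates the eight outcomes and tabulates X:

    from itertools import product

    # Enumerate the 8 outcomes of tossing three fair coins once.
    omega8 = ["".join(w) for w in product("TH", repeat=3)]

    # X counts the number of heads in an outcome.
    def X(outcome):
        return outcome.count("H")

    for w in omega8:
        print(w, "->", X(w))    # TTT -> 0, TTH -> 1, ..., HHH -> 3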
Example 2:
Suppose a probabilistic experiment consists of rolling two fair dice once. Its associated probability model ⟨Ω₃₆, A, P⟩ is given by the 36-element sample space

Ω₃₆ = { 11, 12, …, 16, 21, 22, …, 26, …, 61, 62, …, 66 },

the usual event algebra A of all subsets of Ω₃₆, and the Laplacean probability measure, defined by the fraction

P(A) = #{A} / #{Ω₃₆}
for any event A in A. Note that each event A gives rise to a unique two-valued random variable I_A : Ω₃₆ → ℝ, given by I_A(ω) = 1 if ω is in A, and I_A(ω) = 0 otherwise. Here I_A is called the indicator random variable of A. Of course, I_∅ = 0 at each sample point in Ω₃₆. Likewise, it can be seen that I_Ω₃₆ = 1 at all sample points. Indicator random variables encode information about outcomes into numbers. We have seen that sentences in deductive logic and events in event algebras are qualitative entities. So far only probabilities were quantitative (numerical). The introduction of random variables turns probabilistic reasoning into a variant of quantitative reasoning.
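As a quick illustrative sketch (our own, with a hypothetical event A), an indicator random variable is just a 0/1-valued function on the sample space:

    # Indicator random variable of an event A: 1 on A, 0 elsewhere.
    def indicator(A):
        return lambda omega: 1 if omega in A else 0

    A = {"HHT", "HTH", "THH"}          # e.g. the event "exactly two heads"
    I_A = indicator(A)
    print(I_A("HHT"), I_A("TTT"))      # 1 0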
Returning to the topic of random variables, suppose we are interested only in the sum of outcomes in rolling two dice. For this we need to define a random variable Y : Ω₃₆ → ℝ by setting Y(ij) =df i + j for all outcomes or (sample) pairs ij in Ω₃₆ = 6 × 6. Clearly,
Y(11) = 2
Y(12) = Y(21) = 3
Y(13) = Y(22) = Y(31) = 4
Y(23) = Y(32) = Y(41) = Y(14) = 5
Y(33) = Y(42) = Y(24) = Y(15) = Y(51) = 6
Y(43) = Y(34) = Y(25) = Y(52) = Y(61) = Y(16) = 7
Y(44) = Y(53) = Y(35) = Y(26) = Y(62) = 8
Y(45) = Y(54) = Y(63) = Y(36) = 9
Y(64) = Y(46) = Y(55) = 10
Y(56) = Y(65) = 11
Y(66) = 12
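The counts behind these equalities can be recovered by tallying Y(ij) = i + j over all 36 rolls; here is a short Python sketch (names ours):

    from itertools import product
    from collections import Counter

    # Tally Y(ij) = i + j over the 36 equally likely rolls.
    counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
    print([counts[s] for s in range(2, 13)])   # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]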
Example 3:
Suppose a probabilistic experiment consists, once again, of rolling two fair dice once. We know that its associated probability model ⟨Ω₃₆, A, P⟩ is given by the 36-element sample space

Ω₃₆ = { 11, 12, …, 16, 21, 22, …, 26, …, 61, 62, …, 66 }

with the event algebra A of all subsets of Ω₃₆ and the Laplacean probability measure

P(A) = #{A} / #{Ω₃₆}

for any event A in A. This time, define the random variable Z : Ω₃₆ → ℝ by setting Z(ij) =df max(i, j), the larger of the two numbers rolled. Then
Z(11) = 1
Z(12) = Z(21) = Z(22) = 2
Z(13) = Z(31) = Z(23) = Z(32) = Z(33) = 3
Z(14) = Z(24) = Z(34) = Z(44) = Z(41) = Z(42) = Z(43) = 4
Z(15) = Z(25) = Z(35) = Z(45) = Z(55) = Z(54) = Z(53) = Z(52) = Z(51) = 5
Z(16) = Z(26) = Z(36) = Z(46) = Z(56) = Z(66) = Z(65) = Z(64) = Z(63) = Z(62) = Z(61) = 6
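A quick tally (a sketch, names ours) confirms that the values of Z occur with counts 1, 3, 5, 7, 9, 11 out of 36:

    from itertools import product
    from collections import Counter

    # Z(ij) = max(i, j); tally its values over the 36 rolls.
    counts = Counter(max(i, j) for i, j in product(range(1, 7), repeat=2))
    print([counts[k] for k in range(1, 7)])   # [1, 3, 5, 7, 9, 11]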
Each discrete random variable X comes with its probability distribution function pX, defined on the set { x₁, x₂, …, xₙ } of values of X by

pX(xᵢ) =df P(X = xᵢ) = P({ ω ∈ Ω | X(ω) = xᵢ }).

For the random variable X of Example 1 we obtain
pX(0) = P(X = 0) = P({TTT}) = 1/8 = 0.125
pX(1) = P(X = 1) = P({TTH, THT, HTT}) = 3/8 = 0.375
pX(2) = P(X = 2) = P({THH, HTH, HHT}) = 3/8 = 0.375
pX(3) = P(X = 3) = P({HHH}) = 1/8 = 0.125
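The same probabilities fall out of a direct computation over the model of Example 1 (a Python sketch under our own naming):

    from itertools import product
    from collections import Counter

    # pmf of X = number of heads over the 8-point Laplacean sample space.
    omega8 = ["".join(w) for w in product("TH", repeat=3)]
    counts = Counter(w.count("H") for w in omega8)
    for x in range(4):
        print(x, counts[x] / 8)   # 0.125, 0.375, 0.375, 0.125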
THEOREM 1. Let X be a random variable with cumulative distribution function FX(x) =df P(X ≤ x). Then for all real numbers x, x′ with x < x′:
(i) P(X > x) = 1 − P(X ≤ x).
(ii) P(x < X ≤ x′) = P(X ≤ x′) − P(X ≤ x) = FX(x′) − FX(x).
(iii) P(X = x) = P(X ≤ x) − P(X < x).
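The identities can be verified numerically on the X of Example 1; the following sketch (the names pX and F are ours) checks (i) at x = 1 and (iii) at x = 2:

    # pmf of X from Example 1.
    pX = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

    def F(x):
        # Cumulative distribution F_X(x) = P(X <= x).
        return sum(p for v, p in pX.items() if v <= x)

    # (i) at x = 1:  P(X > 1) = 1 - F_X(1)
    lhs = sum(p for v, p in pX.items() if v > 1)
    print(abs(lhs - (1 - F(1))) < 1e-12)        # True

    # (iii) at x = 2:  P(X = 2) = F_X(2) - P(X < 2)
    p_lt = sum(p for v, p in pX.items() if v < 2)
    print(abs(pX[2] - (F(2) - p_lt)) < 1e-12)   # True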
Suppose a coin is tossed many times and the number of heads is systematically recorded. It is possible to predict ahead of time the average number of heads. The mathematical tool for calculating the mean or average of a random variable X is the so-called expectation functional E. More generally, in studying the probability distribution of a random variable, it is often useful to be able to summarize a given aspect of the distribution by means of a single number that then serves to measure that aspect. Statisticians call such a number a parameter of the distribution. In what follows we introduce two such parameters of a probability distribution, namely the crucial expectation and variance.
Now we make the pertinent formal definition. Given a discrete random variable X : Ω → ℝ of a probability model ⟨Ω, A, P⟩ with values X(Ω) = { x₁, x₂, …, xₙ }, the expected value of X (or simply the expectation of X) is defined by the weighted average (or center of gravity)

μX = E(X) =df x₁·pX(x₁) + x₂·pX(x₂) + ⋯ + xₙ·pX(xₙ).
We see that the expectation of a random variable is a type of average value
of that random variable. It is a value about which the possible values of the
random variable are scattered. Note that the expectation of X is determined
by pX .
To illustrate the idea behind expectation, let us go back to the random variable X of Example 1. Its expected value is given by

E(X) = 0·(1/8) + 1·(3/8) + 2·(3/8) + 3·(1/8) = 1.5.
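As a cross-check, the same weighted average can be computed mechanically from the pmf (a minimal sketch; the dictionary pX is our own encoding of the distribution):

    # E(X) as the pmf-weighted average of the values of X.
    pX = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
    print(sum(x * p for x, p in pX.items()))   # 1.5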
Although we think of 1.5 as an average value of X, it is clearly not a possible value of X, since it does not lie in X(Ω₈). It is, however, a value in a central location relative to the possible values 0, 1, 2, and 3 of X. The number E(X) is often called a
measure of location. It is clear that it falls between the smallest value (namely
0) assumed by X and the largest value (namely 3) assumed by X. Thus, a
knowledge of E(X) gives a rough idea of the possible size of the possible values
of X. Finally, we might consider a beautiful analogy between the concept of
expectation and the concept of center of gravity in mechanics. If we imagine
masses of size pX(xᵢ) being placed at points xᵢ on the line (with i = 1, 2, …, n),
then E(X) is exactly what physicists call the center of gravity of this mass
distribution.
Suppose again that we are given a probability model ⟨Ω, A, P⟩ of a random experiment and a random variable X : Ω → ℝ thereon. Let g : ℝ → ℝ be any (measurable) real-valued function. Then the composite g(X) : Ω → ℝ is again a random variable, defined by (g(X))(ω) =df g(X(ω)) for all ω in Ω, with expectation

E(g(X)) =df g(x₁)·pX(x₁) + g(x₂)·pX(x₂) + ⋯ + g(xₙ)·pX(xₙ).
An important special case is the variance of X, defined by

Var(X) =df E((X − μX)²),

with the standard deviation σX given by its square root √(E((X − μX)²)). Since σX² = Var(X) = E(X²) − (E(X))², the variance of X from Example 1 is given by Var(X) = 3 − 9/4 = 3/4 = 0.75, and hence the standard deviation is σX = √(3/4) = 0.866. We see that the dispersion is quite large; about one head in each trial.
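The same numbers come from computing E(X²) first (a sketch, names ours):

    from math import sqrt

    pX = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
    EX = sum(x * p for x, p in pX.items())       # E(X)   = 1.5
    EX2 = sum(x * x * p for x, p in pX.items())  # E(X^2) = 3.0
    var = EX2 - EX ** 2
    print(var, sqrt(var))                        # 0.75 0.8660254037844386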
The probability distribution function defined in Example 1 is a special case of the so-called binomial distribution function, defined by

B_Sn(k) =df P(Sn = k) = (n choose k) · p^k · (1 − p)^(n−k),

where Sn denotes the number of successes in n independent trials, each succeeding with probability p.
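For n = 3 and p = 1/2 the formula reproduces the distribution of Example 1; here is a sketch using the standard library:

    from math import comb

    def binom_pmf(n, p, k):
        # P(S_n = k) = (n choose k) p^k (1 - p)^(n - k)
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    print([binom_pmf(3, 0.5, k) for k in range(4)])   # [0.125, 0.375, 0.375, 0.125]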
The probability distribution function pY of the random variable Y (the sum of two dice) is given by the following table:

x       : 2     3     4     5     6     7     8     9     10    11    12
pY(x)   : 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
It is easy to check that the sum of the probabilities P(Y = x) is 1. Note that Y is not defined at sample points that fall outside the sample space Ω₃₆. The graph of the probability distribution function pY is shown in the figure below:
[Figure: bar graph of pY(x) = P(Y = x) for x = 2, 3, …, 12, rising from 1/36 at x = 2 to a peak of 6/36 at x = 7 and falling back to 1/36 at x = 12; vertical axis marked 0.1, 0.2, 0.3, 0.4.]
For a single die X we have E(X) = 7/2 and E(X²) = 91/6, so each die has variance 91/6 − 49/4 = 35/12. Since the two dice are independent, the variance of their sum Y is

Var(Y) = 35/12 + 35/12 = 35/6 = 5.8333.
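This value can be double-checked by brute force over the 36 rolls (a sketch, names ours):

    from itertools import product

    # Brute-force check of Var(Y) over the 36 equally likely rolls.
    sums = [i + j for i, j in product(range(1, 7), repeat=2)]
    EY = sum(sums) / 36                          # 7.0
    EY2 = sum(s * s for s in sums) / 36
    print(EY2 - EY ** 2)                         # 5.8333...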
Recall that the model of Example 3 carries the event algebra A of all subsets of Ω₃₆ and the Laplacean probability measure P(A) = #{A} / #{Ω₃₆}. The probability distribution function pZ of Z is computed as follows:
pZ(1) = P(Z = 1) = P({11}) = 1/36
pZ(2) = P(Z = 2) = P({12, 21, 22}) = 3/36
pZ(3) = P(Z = 3) = P({13, 31, 23, 32, 33}) = 5/36
pZ(4) = P(Z = 4) = P({14, 24, 34, 44, 43, 42, 41}) = 7/36
pZ(5) = P(Z = 5) = P({15, 25, 35, 45, 55, 54, 53, 52, 51}) = 9/36
pZ(6) = P(Z = 6) = P({16, 26, 36, 46, 56, 66, 65, 64, 63, 62, 61}) = 11/36
Summing these probabilities yields the cumulative distribution function FZ of Z, the step function

FZ(x) = 0            if x < 1
FZ(x) = 1/36         if 1 ≤ x < 2
FZ(x) = 4/36         if 2 ≤ x < 3
FZ(x) = 9/36         if 3 ≤ x < 4
FZ(x) = 16/36        if 4 ≤ x < 5
FZ(x) = 25/36        if 5 ≤ x < 6
FZ(x) = 36/36 = 1    if x ≥ 6
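Since Z ≤ k exactly when both dice show at most k, FZ(k) = (k/6)² at the integers, which matches the steps 1/36, 4/36, 9/36, 16/36, 25/36, 36/36; a sketch (the function name F_Z is ours):

    from math import floor

    def F_Z(x):
        # F_Z(x) = P(Z <= x) = (floor(x)/6)^2, clamped to [0, 1].
        k = min(max(floor(x), 0), 6)
        return (k / 6) ** 2

    print([round(F_Z(k) * 36) for k in range(1, 7)])   # [1, 4, 9, 16, 25, 36]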
A brief combinatorial digression explains the binomial coefficients used above. The number of ordered arrangements (permutations) of n distinct objects is

n! =df n·(n − 1)·(n − 2) ⋯ 2·1,

read "n factorial". For example, there are exactly 6 = 3! ordered arrangements of three events A, B, C, namely ABC, ACB, BAC, BCA, CBA, CAB. The point is that one can put any of n objects in the first place, but then only n − 1 are left for the second place, and so forth, until only one is left for the last place. By convention, we set 0! = 1, and of course 1! = 1.
A slightly more interesting case arises when some of the objects are identical or alike, as for example the letters I, S, and P in MISSISSIPPI. In this case, the number of permutations of n objects, where k₁ are identical (or alike) of type or kind one, k₂ are identical (or alike) of type or kind two, and so on, and km are identical of type m with k₁ + k₂ + ⋯ + km = n, is given by the fraction

(n choose k₁, k₂, …, km) =df n! / (k₁!·k₂! ⋯ km!).
Question: How many different permutations can be made from the letters of the word MISSISSIPPI? Answer: Since n = 11 and there are 4 types in total (the letters S, I, P, and M), we can set k₁ = 4 for type S, k₂ = 4 for type I, k₃ = 2 for type P, and k₄ = 1 for type M. Therefore, in this example the number of permutations is 11! / (4!·4!·2!·1!) = 34,650.
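The count is easy to confirm (a sketch using the standard library):

    from math import factorial

    # 11 letters of MISSISSIPPI: S x 4, I x 4, P x 2, M x 1.
    n_perms = factorial(11) // (factorial(4) * factorial(4) * factorial(2) * factorial(1))
    print(n_perms)   # 34650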
The number of ordered arrangements of n distinct objects using k (k ≤ n) objects at a time is given by the formula

n! / (n − k)!,

and is called the number of permutations of n distinct objects taken k at a time. For example, two letters from the set of three {A, B, C} can be arranged in exactly 3!/1! = 6 ways, namely AB, BA, AC, CA, BC, CB.
Finally, the most important algorithm is the combination rule: the number of ways of selecting k objects (k ≤ n) from a list of n distinct objects without regard to order is

(n choose k) =df n! / (k!·(n − k)!).

For example, two letters from the set of three {A, B, C} can be selected without regard to order in exactly 3!/(2!·1!) = 3 ways, namely AB, BC, AC.
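Both counting rules can be checked against the standard library (a sketch; Python 3.8+ for math.perm and math.comb):

    from itertools import permutations, combinations
    from math import perm, comb

    letters = "ABC"
    print(list(permutations(letters, 2)), perm(3, 2))   # 6 ordered pairs, 6
    print(list(combinations(letters, 2)), comb(3, 2))   # 3 unordered pairs, 3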