PROBABILITY and STATISTICS

Eusebius Doedel
TABLE OF CONTENTS

SAMPLE SPACES
    Events
    The Algebra of Events
    Axioms of Probability
    Further Properties
    Counting Outcomes
    Permutations
    Combinations

CONDITIONAL PROBABILITY
    Independent Events

SAMPLE STATISTICS
    The Sample Mean
    The Sample Variance
    Estimating the Variance of a Normal Distribution
    Samples from Finite Populations
    The Sample Correlation Coefficient
    Maximum Likelihood Estimators
    Hypothesis Testing
SAMPLE SPACES

DEFINITION:
The sample space is the set of all possible outcomes of an experiment.

EXAMPLE: When we flip a coin then the sample space is
S = {H, T},
where
H denotes that the coin lands Heads up, and
T denotes that the coin lands Tails up.

EXAMPLE:
When we roll a fair die then the sample space is
S = {1, 2, 3, 4, 5, 6}.
The probability the die lands with k up is 1/6, (k = 1, 2, ..., 6).
The probability of rolling an even number is 1/6 + 1/6 + 1/6 = 1/2.
EXAMPLE:
When we toss a coin 3 times and record the results in the sequence that they occur, then the sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.
Elements of S are vectors, sequences, or ordered outcomes.
We may expect each of the 8 outcomes to be equally likely.
Thus the probability of the sequence HTT is 1/8, while the probability of a sequence containing precisely two Heads is 3/8.
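These counts are easy to verify by brute-force enumeration; here is a minimal sketch in Python (the language choice is ours, not the notes'):

```python
from itertools import product

# Enumerate all 8 equally likely outcomes of three coin tosses.
S = [''.join(seq) for seq in product('HT', repeat=3)]

# The sequence HTT is one of the 8 outcomes.
p_HTT = S.count('HTT') / len(S)

# The event "exactly two Heads" contains HHT, HTH, THH.
p_two_heads = sum(1 for s in S if s.count('H') == 2) / len(S)
```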
Events
In Probability Theory subsets of the sample space are called events.
EXAMPLE: The set of basic outcomes of rolling a die once is
S = {1, 2, 3, 4, 5, 6},
so the subset E = {2, 4, 6} is an example of an event.
If a die is rolled once and it lands with a 2 or a 4 or a 6 up then we say that the event E has occurred.
We have already seen that the probability that E occurs is
P(E) = 1/6 + 1/6 + 1/6 = 1/2.
The Algebra of Events
For events E and F in a sample space S we have:
E^c , the complement of E,
E ∪ F , the union of E and F,
E ∩ F , the intersection of E and F.
We write E ⊂ F if E is a subset of F.
REMARK: In Probability Theory we use
E^c instead of Ē,
EF instead of E ∩ F.
Axioms of Probability
A probability function P assigns a real number (the probability of E) to every event E in a sample space S.
P(·) must satisfy the following basic properties:
0 ≤ P(E) ≤ 1,
P(S) = 1,
and, for disjoint events E1, E2, ... (the third axiom):
P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + ... .
Further Properties
PROPERTY 1:
P(E ∪ E^c) = P(E) + P(E^c) = 1.    (Why?)
Thus
P(E^c) = 1 − P(E).
EXAMPLE:
What is the probability of at least one H in four tosses of a coin?
SOLUTION: The sample space S will have 16 outcomes.    (Which?)
P(at least one H) = 1 − P(no H) = 1 − 1/16 = 15/16.
PROPERTY 2:
P(E ∪ F) = P(E) + P(F) − P(EF).
PROOF (using the third axiom):
P(E ∪ F) = P(EF) + P(EF^c) + P(E^c F)    (Why?)
= [P(EF) + P(EF^c)] + [P(EF) + P(E^c F)] − P(EF)
= P(E) + P(F) − P(EF).
NOTE:
Draw a Venn diagram with E and F to see this!
The formula is similar to the one for the number of elements:
n(E ∪ F) = n(E) + n(F) − n(EF).
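The inclusion–exclusion formula can be checked exactly on a small example; the events below are our own illustrative choice, not from the notes:

```python
from fractions import Fraction as Fr

# One roll of a fair die; E = "even", F = "at least 4".
S = {1, 2, 3, 4, 5, 6}
E = {2, 4, 6}
F = {4, 5, 6}

def P(A):
    # probability of an event in an equally likely sample space
    return Fr(len(A), len(S))

lhs = P(E | F)                   # P(E ∪ F)
rhs = P(E) + P(F) - P(E & F)     # P(E) + P(F) − P(EF)
```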
Counting Outcomes
We have seen examples where the outcomes in a finite sample space
S are equally likely , i.e., they have the same probability .
Such sample spaces occur quite often.
Computing probabilities then requires counting all outcomes and
counting certain types of outcomes .
The counting has to be done carefully!
We will discuss a number of representative examples in detail.
Concepts that arise include permutations and combinations.
Permutations
EXAMPLE: Suppose that four-letter words of lower case alphabetic characters are generated randomly with equally likely outcomes.
(Assume that letters may appear repeatedly.)
(a) How many four-letter words are there in the sample space S?
SOLUTION: 26^4 = 456,976.
(b) How many four-letter words are there in S that start with the letter "s"?
SOLUTION: 26^3.
(c) What is the probability of generating a word that starts with an "s"?
SOLUTION: 26^3 / 26^4 = 1/26 ≈ 0.038.
Could this have been computed more easily?
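A sketch of the word count in Python (an illustration of the counting rule, not part of the notes):

```python
total   = 26 ** 4    # (a) all four-letter words over a 26-letter alphabet
start_s = 26 ** 3    # (b) words whose first letter is fixed to 's'
prob    = start_s / total   # (c) = 1/26
```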
More generally, the number of ordered sequences (permutations) of k distinct elements chosen from n distinct elements is
n (n−1) (n−2) ... (n−k+1) = n! / (n−k)! .    (Why?)
EXAMPLE: The number of three-letter words with distinct letters chosen from a set of five letters is
5 · 4 · 3 = 60.    (Why?)
(c) Suppose the 60 solutions in the sample space are equally likely.
What is the probability of generating a three-letter word that contains the letters a and b?
SOLUTION: 18/60 = 0.3.    (Why?)
EXERCISE:
Suppose the sample space S consists of all five-letter words having distinct alphabetic characters.
Combinations
Let S be a set containing n (distinct) elements.
Then
a combination of k elements from S
is
any selection of k elements from S,
where order is not important.
(Thus the selection is a set.)
EXAMPLE:
There are three combinations of 2 elements chosen from the set
S = {a, b, c},
namely, the subsets
{a, b}, {a, c}, {b, c}.
In general, given a set S of n elements, the number of combinations of k elements from S is
C(n, k) = n! / (k! (n−k)!).
REMARK: The notation C(n, k), usually written with n stacked above k in parentheses, is referred to as "n choose k".
NOTE:
C(n, n) = n! / (n! (n−n)!) = n! / (n! 0!) = 1, since 0! = 1.
PROOF:
First recall that there are
n (n−1) (n−2) ... (n−k+1) = n! / (n−k)!
ordered sequences of k distinct elements from S. Since each subset of k elements corresponds to k! such ordered sequences, the number of subsets is
n! / ((n−k)! k!) = C(n, k).
EXAMPLE:
In the previous example, with 2 elements chosen from the set
{a, b, c},
we have n = 3 and k = 2, so that there are
3! / (3−2)! = 6 words,
namely
ab, ba, ac, ca, bc, cb,
while there are
3! / (2! (3−2)!) = 6/2 = 3 subsets,
namely
{a, b}, {a, c}, {b, c}.
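The distinction between ordered words and unordered subsets maps directly onto `permutations` versus `combinations` in Python's standard library (our illustration, not the notes'):

```python
from itertools import combinations, permutations

S = ['a', 'b', 'c']
words   = [''.join(w) for w in permutations(S, 2)]   # ordered:   3!/(3-2)! = 6
subsets = list(combinations(S, 2))                   # unordered: C(3,2) = 3
```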
EXAMPLE:
Choosing k = 3 elements from {a, b, c, d} (so n = 4), there are
4! / (4−3)! = 24 words, namely:
abc, acb, bac, bca, cab, cba,
abd, adb, bad, bda, dab, dba,
acd, adc, cad, cda, dac, dca,
bcd, bdc, cbd, cdb, dbc, dcb,
while there are
4! / (3! (4−3)!) = 24/6 = 4 subsets,
namely
{a, b, c}, {a, b, d}, {a, c, d}, {b, c, d}.
EXAMPLE:
(a) How many ways are there to choose a committee of 4 persons from a group of 10 persons, if order is not important?
SOLUTION:
C(10, 4) = 10! / (4! (10−4)!) = 210.
(b) If each of these 210 outcomes is equally likely then what is the probability that a particular person is on the committee?
SOLUTION:
C(9, 3) / C(10, 4) = 84/210 = 4/10.    (Why?)
Is this result surprising?
Equivalently, the probability that a particular person is not on the committee is
C(9, 4) / C(10, 4) = 126/210 = 6/10.    (Why?)
(c) How many ways are there to choose a committee of 4 persons from a group of 10 persons, if one of the 4 is to be the chairperson?
SOLUTION:
10 · C(9, 3) = 10 · 9! / (3! (9−3)!) = 10 · 84 = 840.
EXAMPLE: (continued)
(Two balls are selected at random from a bag with four white balls and three black balls.)
What is the probability that both balls are white?
SOLUTION:
C(4, 2) / C(7, 2) = 6/21 = 2/7.
What is the probability that both balls are black?
SOLUTION:
C(3, 2) / C(7, 2) = 3/21 = 1/7.
What is the probability that one is white and one is black?
SOLUTION:
C(4, 1) C(3, 1) / C(7, 2) = (4 · 3)/21 = 12/21 = 4/7.
(Could this have been computed differently?)
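The three probabilities can be confirmed by enumerating the 21 equally likely pairs; a sketch in Python:

```python
from itertools import combinations

balls = ['w1', 'w2', 'w3', 'w4', 'b1', 'b2', 'b3']
pairs = list(combinations(balls, 2))      # the C(7,2) = 21 equally likely outcomes

def prob(pred):
    # fraction of pairs satisfying a predicate
    return sum(1 for p in pairs if pred(p)) / len(pairs)

p_ww = prob(lambda p: p[0][0] == 'w' and p[1][0] == 'w')   # both white
p_bb = prob(lambda p: p[0][0] == 'b' and p[1][0] == 'b')   # both black
p_wb = prob(lambda p: p[0][0] != p[1][0])                  # one of each
```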
EXAMPLE: (continued)
In detail, the sample space S is
{ {w1,w2}, {w1,w3}, {w1,w4}, {w2,w3}, {w2,w4}, {w3,w4},
  {w1,b1}, {w1,b2}, {w1,b3}, {w2,b1}, {w2,b2}, {w2,b3},
  {w3,b1}, {w3,b2}, {w3,b3}, {w4,b1}, {w4,b2}, {w4,b3},
  {b1,b2}, {b1,b3}, {b2,b3} },
and each of these 21 outcomes has probability 1/21.
EXERCISE :
Three balls are selected at random from a bag containing
2 red , 3 green , 4 blue balls .
What would be an appropriate sample space S ?
What is the number of outcomes in S ?
What is the probability that all three balls are red ?
What is the probability that all three balls are green ?
What is the probability that all three balls are blue ?
What is the probability of one red, one green, and one blue ball ?
The number of such same-color selections is 3 · 3 = 9.    (Why?)
Thus the probability each pair is of the same color is 9/105 = 3/35.
EXERCISE:
Two balls are selected at random from a bag with three white balls and two black balls.
Show all elements of a suitable sample space.
What is the probability that both balls are white?
EXAMPLE:
How many nonnegative integer solutions are there to
x1 + x2 + x3 = 17 ?
SOLUTION:
Consider seventeen 1s separated by bars to indicate the possible values of x1, x2, and x3, e.g.,
111|111111111|11111 .
The total number of positions in the display is 17 + 2 = 19.
A solution corresponds to choosing the 2 positions of the bars among the 19 positions.
The total number of nonnegative solutions is now seen to be
C(19, 2) = 19! / ((19−2)! 2!) = (19 · 18)/2 = 171.
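The stars-and-bars count can be double-checked by direct enumeration; a sketch in Python:

```python
from math import comb

# For each admissible (x1, x2), x3 = 17 - x1 - x2 is determined,
# so it suffices to count pairs with x1 + x2 <= 17.
count = sum(1 for x1 in range(18) for x2 in range(18 - x1))
```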
EXAMPLE:
How many nonnegative integer solutions are there to the inequality
x1 + x2 + x3 ≤ 17 ?
SOLUTION:
Introduce an auxiliary variable (or slack variable)
x4 ≡ 17 − (x1 + x2 + x3) ≥ 0.
Then
x1 + x2 + x3 + x4 = 17.
Use seventeen 1s separated by 3 bars to indicate the possible values of x1, x2, x3, and x4, e.g.,
111|11111111|1111|11 .
The total number of positions is
17 + 3 = 20,
so the total number of nonnegative solutions is
C(20, 3) = 20! / ((20−3)! 3!) = (20 · 19 · 18)/(3 · 2) = 1140.
EXAMPLE:
How many positive integer solutions are there to the equation
x1 + x2 + x3 = 17 ?
SOLUTION: Substitute
x1 = x̃1 + 1, x2 = x̃2 + 1, x3 = x̃3 + 1,
with x̃1, x̃2, x̃3 ≥ 0. Then x̃1 + x̃2 + x̃3 = 14, and the count is
C(16, 2) = 16! / ((16−2)! 2!) = (16 · 15)/2 = 120.
EXAMPLE:
What is the probability the sum is 9 in three rolls of a die?
SOLUTION: The number of such sequences of three rolls with sum 9 is the number of integer solutions of
x1 + x2 + x3 = 9,
with
1 ≤ x1 ≤ 6, 1 ≤ x2 ≤ 6, 1 ≤ x3 ≤ 6.
Let
x1 = x̃1 + 1, x2 = x̃2 + 1, x3 = x̃3 + 1.
Then the problem becomes
x̃1 + x̃2 + x̃3 = 6, with 0 ≤ x̃1, x̃2, x̃3 ≤ 5.
Without the upper bounds, the equation x̃1 + x̃2 + x̃3 = 6 (e.g., 1|111|11) has
C(8, 2) = 28 solutions,
but three of these violate the upper bound 5, namely
(6, 0, 0), (0, 6, 0), (0, 0, 6), i.e., 111111|| , |111111| , ||111111 .
This leaves 28 − 3 = 25 admissible solutions, so the probability is
25/216 ≈ 0.116.
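The count of 25 favorable sequences can be verified by brute force; a sketch in Python:

```python
from itertools import product

rolls = list(product(range(1, 7), repeat=3))    # all 6^3 = 216 sequences
favorable = [r for r in rolls if sum(r) == 9]   # sequences with sum 9
prob = len(favorable) / len(rolls)              # 25/216
```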
EXAMPLE: (continued)
The 25 outcomes of the event "the sum of the rolls is 9" are
{ 126, 135, 144, 153, 162,
  216, 225, 234, 243, 252, 261,
  315, 324, 333, 342, 351,
  414, 423, 432, 441,
  513, 522, 531,
  612, 621 }.
EXERCISE:
How many integer solutions are there to the inequality
x1 + x2 + x3 ≤ 17,
if we require that
x1 ≥ 1, x2 ≥ 2, x3 ≥ 3 ?
EXERCISE:
What is the probability that the sum is less than or equal to 9 in three rolls of a die?
CONDITIONAL PROBABILITY
Giving more information can change the probability of an event.
EXAMPLE:
If a coin is tossed two times then what is the probability of two Heads?
ANSWER: 1/4.
EXAMPLE:
If a coin is tossed two times then what is the probability of two Heads, given that the first toss gave Heads?
ANSWER: 1/2.
NOTE:
Several examples will be about playing cards.
A standard deck of playing cards consists of 52 cards:
Four suits:
Hearts, Diamonds (red), and Spades, Clubs (black).
Each suit has 13 cards, whose denominations are
2, 3, ..., 10, Jack, Queen, King, Ace.
The Jack, Queen, and King are called face cards.
EXERCISE:
Suppose we draw a card from a shuffled set of 52 playing cards.
DEFINITION: The conditional probability of E given F is defined as
P(E|F) ≡ P(EF) / P(F),
or, equivalently,
P(EF) = P(E|F) P(F),
(assuming that P(F) is not zero).
[Figure: Venn diagram of E and F in S, illustrating P(E|F) = P(EF)/P(F).]
EXAMPLE: Toss a coin two times. Let E be the event "two Heads" and F the event "the first toss gives Heads". Then
S = {HH, HT, TH, TT},
F = {HH, HT},
EF = {HH} = E    (since E ⊂ F).
We have
P(E|F) = P(EF)/P(F) = P(E)/P(F) = (1/4)/(2/4) = 1/2.
EXAMPLE:
Suppose we draw a card from a shuffled set of 52 playing cards.
What is the probability of drawing a Queen, given that the card drawn is of suit Hearts?
ANSWER:
P(Q|H) = P(QH)/P(H) = (1/52)/(13/52) = 1/13.
What is the probability of drawing a Queen, given that the card drawn is a face card F?
ANSWER:
P(Q|F) = P(QF)/P(F) = P(Q)/P(F)    (Here Q ⊂ F, so that QF = Q.)
= (4/52)/(12/52) = 1/3.
The conditional probability gives the rule of total probability:
P(E) = P(E(F ∪ F^c))    (Why?)
= P(EF ∪ EF^c) = P(EF) + P(EF^c),    (Why?)
where
P(EF) = P(E|F) P(F), and P(EF^c) = P(E|F^c) P(F^c).
EXAMPLE:
An insurance company has these data:
The probability of an insurance claim in a period of one year is
4 percent for persons under age 30,
2 percent for persons over age 30,
and it is known that
30 percent of the targeted population is under age 30.
What is the probability of an insurance claim in a period of one year for a randomly chosen person from the targeted population?
SOLUTION:
Let the sample space S be all persons under consideration.
Let C be the event (subset of S) of persons filing a claim.
Let U be the event (subset of S) of persons under age 30.
Then U^c is the event (subset of S) of persons over age 30.
Thus
P(C) = P(C|U) P(U) + P(C|U^c) P(U^c)
= (4/100)(3/10) + (2/100)(7/10)
= 26/1000 = 2.6% .
EXAMPLE:
Two balls are drawn from a bag with 2 white and 3 black balls.
There are 20 outcomes (sequences) in S.    (Why?)
EXAMPLE: (continued)
Is it surprising that P(S) = P(F)?
ANSWER: Not really, if one considers the sample space S:
{ w1w2, w1b1, w1b2, w1b3,
  w2w1, w2b1, w2b2, w2b3,
  b1w1, b1w2, b1b2, b1b3,
  b2w1, b2w2, b2b1, b2b3,
  b3w1, b3w2, b3b1, b3b2 }.
EXAMPLE:
Suppose we draw two cards from a shuffled set of 52 playing cards.
What is the probability that the second card is a Queen?
ANSWER:
P(2nd card Q) = P(2nd card Q | 1st card Q) P(1st card Q)
+ P(2nd card Q | 1st card not Q) P(1st card not Q)
= (3/51)(4/52) + (4/51)(48/52)
= 204/(51 · 52) = 4/52 = 1/13.
Bayes' formula:
P(F|E) = P(EF)/P(E) = P(E|F) P(F) / P(E)
= P(E|F) P(F) / [ P(E|F) P(F) + P(E|F^c) P(F^c) ].
With D = "has the disease", H = "healthy", and + = "the test is positive", suppose
P(D) = 0.001,  P(+|D) = 0.99,  P(+|H) = 0.05.
By Bayes' formula
P(D|+) = P(+|D) P(D) / [ P(+|D) P(D) + P(+|H) P(H) ]
= (0.99 · 0.001) / (0.99 · 0.001 + 0.05 · 0.999)
≈ 0.0194    (!)
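The perhaps surprising value 0.0194 is quick to reproduce; a sketch in Python:

```python
p_D  = 0.001    # prior P(D), prevalence of the disease
sens = 0.99     # P(+|D)
fpr  = 0.05     # P(+|H), false-positive rate

# Total probability of a positive test, then Bayes' formula.
p_pos = sens * p_D + fpr * (1 - p_D)
p_D_given_pos = sens * p_D / p_pos
```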
EXERCISE :
Suppose 1 in 100 products has a certain defect.
A test detects the defect in 95 % of defective products.
The test also detects the defect in 10 % of non-defective products.
With what probability does a positive test diagnose a defect?
EXERCISE :
Suppose 1 in 2000 persons has a certain disease.
A test detects the disease in 90 % of diseased persons.
The test also detects the disease in 5 % of healthy persons.
With what probability does a positive test diagnose the disease?
More generally, if the sample space is a union of disjoint events,
S = F1 ∪ F2 ∪ ... ∪ Fn,
then for any event E
P(Fi|E) = P(E|Fi) P(Fi) / [ P(E|F1) P(F1) + P(E|F2) P(F2) + ... + P(E|Fn) P(Fn) ].
EXERCISE:
Machines M1, M2, M3 produce these proportions of an article:
Production:  M1: 10 %,  M2: 30 %,  M3: 60 %.
Probability the article is defective:  M1: 4 %,  M2: 3 %,  M3: 2 %.
What is the probability that a randomly chosen defective article was produced by machine M1?
Independent Events
Two events E and F are independent if
P(EF) = P(E) P(F).
In this case
P(E|F) = P(EF)/P(F) = P(E) P(F)/P(F) = P(E),
(assuming P(F) ≠ 0).
EXAMPLE: Draw one card from a deck of 52 playing cards. Then
P(Face Card) = 12/52 = 3/13,
P(Hearts) = 13/52 = 1/4,
P(Face Card and Hearts) = 3/52,
P(Face Card|Hearts) = 3/13.
We see that
P(Face Card and Hearts) = P(Face Card) P(Hearts) (= 3/52).
Thus the events "Face Card" and "Hearts" are independent.
Therefore we also have
P(Face Card|Hearts) = P(Face Card) (= 3/13).
EXERCISE:
For outcomes {i, j} define Y({i, j}) = |i − j|.    (More soon!)
PROPERTY: If E and F are independent, then so are E and F^c.
PROOF:
E = E(F ∪ F^c) = EF ∪ EF^c, where EF and EF^c are disjoint.
Thus
P(EF^c) = P(E) − P(EF)
= P(E) − P(E) P(F)
= P(E) (1 − P(F))
= P(E) P(F^c).
EXERCISE:
Prove that if E and F are independent then so are E^c and F^c.
NOTE: If E and F are disjoint then
P(EF) = P(∅) = 0.
Thus if E and F are both independent and disjoint, then P(E) P(F) = 0, i.e., one of them has zero probability!
EXERCISE:
A machine M consists of three independent parts, M1, M2, and M3.
Suppose that
M1 functions properly with probability 9/10,
M2 functions properly with probability 9/10,
M3 functions properly with probability 8/10,
and that
the machine M functions if and only if its three parts function.
What is the probability for the machine M to function?
What is the probability for the machine M to malfunction?
RANDOM VARIABLES
DEFINITION:
A random variable X is a function X : S → ℝ, i.e., it assigns a real number X(s) to every outcome s ∈ S.
(One usually writes X instead of X(s).)
EXAMPLE: Toss a coin 3 times, and let X denote the number of Heads and Y the index of the first Heads (e.g., Y(TTH) = 3).
For X we have
P(0 < X ≤ 2) = pX(1) + pX(2) = 3/8 + 3/8 = 6/8,
where
pX(0) = P({TTT}) = 1/8,
pX(1) = P({HTT, THT, TTH}) = 3/8,
pX(2) = P({HHT, HTH, THH}) = 3/8,
pX(3) = P({HHH}) = 1/8.    (Why?)
[Figure: graphical representation of X, mapping the outcomes HHH, ..., TTT to the values 0, 1, 2, 3 via the events E0, E1, E2, E3.]
The events E0, E1, E2, E3 are disjoint since X(s) is a function!
(X : S → ℝ must be defined for all s ∈ S and must be single-valued.)
[Figure: the graph of pX.]
DEFINITION:
pX(x) ≡ P(X = x) is the probability mass function of X,
FX(x) ≡ P(X ≤ x) is the (cumulative) probability distribution function of X.
We have
FX(−∞) = 0 and FX(∞) = 1,    (Why?)
and FX is non-decreasing.    (Why?)
EXAMPLE: For X = the number of Heads in three coin tosses,
p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8,
so that
P(X ≤ 0) = 1/8,
P(X ≤ 1) = 4/8,
P(X ≤ 2) = 7/8,
P(X ≤ 3) = 8/8 = 1,
and, for example,
P(0 < X ≤ 2) = P(X ≤ 2) − P(X ≤ 0) = 7/8 − 1/8 = 6/8.
EXAMPLE: Toss a coin until it lands Heads up, and let X be the number of tosses needed. Here the sample space is
S = {H, TH, TTH, TTTH, ...},
and, e.g., X(TH) = 2 and X(TTH) = 3. We have
p(1) = 1/2, p(2) = 1/4, p(3) = 1/8, ...,
and
F(n) = P(X ≤ n) = Σ_{k=1}^{n} p(k) = Σ_{k=1}^{n} 1/2^k = 1 − 1/2^n,    (Why?)
so that
Σ_{k=1}^{∞} p(k) = lim_{n→∞} Σ_{k=1}^{n} p(k) = lim_{n→∞} (1 − 1/2^n) = 1.
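The closed form F(n) = 1 − 1/2^n can be checked against the partial sums numerically; a sketch in Python:

```python
def F(n):
    # partial sum of p(k) = (1/2)^k for k = 1..n
    return sum(0.5 ** k for k in range(1, n + 1))
```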
NOTE: For a fixed number of tosses n, say n = 4, the sample space is
S4 = { HHHH, HHHT, HHTH, HHTT, HTHH, HTHT, HTTH, HTTT,
       THHH, THHT, THTH, THTT, TTHH, TTHT, TTTH, TTTT },
where for each outcome the first Heads is marked as the value of X.
Each outcome in S4 has equal probability 2^{−n} (here 2^{−4} = 1/16), and
pX(1) = 1/2, pX(2) = 1/4, pX(3) = 1/8, pX(4) = 1/16,
independent of n.
Joint distributions
The probability mass function and the probability distribution function can also be functions of more than one variable.
EXAMPLE: Toss a coin 3 times in sequence. For the sample space
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
we let
X = the number of Heads, and Y = the index of the first H (0 for TTT).
Then, with
pX,Y(x, y) = P(X = x, Y = y),
we have, for example,
pX,Y(2, 1) = P(X = 2, Y = 1) = P(2 Heads, 1st toss is Heads) = 2/8 = 1/4.
EXAMPLE: (continued)
For
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
X(s) = number of Heads, and Y(s) = index of the first H,
we can list the values of pX,Y(x, y):

Joint probability mass function pX,Y(x, y)

          x=0    x=1    x=2    x=3  |  pY(y)
   y=0    1/8     0      0      0   |   1/8
   y=1     0     1/8    2/8    1/8  |   4/8
   y=2     0     1/8    1/8     0   |   2/8
   y=3     0     1/8     0      0   |   1/8
  pX(x)   1/8    3/8    3/8    1/8  |    1

NOTE:
The marginal probability pX is the probability mass function of X.
The marginal probability pY is the probability mass function of Y.
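The marginals in the table above are just row and column sums of the joint mass function; a sketch in Python using exact fractions:

```python
from fractions import Fraction as Fr

# Joint pmf of X = number of Heads, Y = index of the first H (three tosses).
p = {(0, 0): Fr(1, 8),
     (1, 1): Fr(1, 8), (1, 2): Fr(1, 8), (1, 3): Fr(1, 8),
     (2, 1): Fr(2, 8), (2, 2): Fr(1, 8),
     (3, 1): Fr(1, 8)}

# Marginals: sum over the other variable.
pX = {x: sum(v for (a, _), v in p.items() if a == x) for x in range(4)}
pY = {y: sum(v for (_, b), v in p.items() if b == y) for y in range(4)}
```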
EXAMPLE: (continued)
X(s) = number of Heads, and Y(s) = index of the first H.

          x=0    x=1    x=2    x=3  |  pY(y)
   y=0    1/8     0      0      0   |   1/8
   y=1     0     1/8    2/8    1/8  |   4/8
   y=2     0     1/8    1/8     0   |   2/8
   y=3     0     1/8     0      0   |   1/8
  pX(x)   1/8    3/8    3/8    1/8  |    1

For example,
X = 2 corresponds to the event {HHT, HTH, THH}.
Y = 1 corresponds to the event {HHH, HHT, HTH, HTT}.
(X = 2 and Y = 1) corresponds to the event {HHT, HTH}.
QUESTION: Are the events X = 2 and Y = 1 independent?
[Figure: the events Exy determined by the values of X(s) and Y(s).]
DEFINITION:
pX,Y(x, y) ≡ P(X = x, Y = y) is the joint probability mass function.
DEFINITION:
FX,Y(x, y) ≡ P(X ≤ x, Y ≤ y) is the joint (cumulative) probability distribution function.
EXAMPLE: (continued)

Joint probability mass function pX,Y(x, y)

          x=0    x=1    x=2    x=3  |  pY(y)
   y=0    1/8     0      0      0   |   1/8
   y=1     0     1/8    2/8    1/8  |   4/8
   y=2     0     1/8    1/8     0   |   2/8
   y=3     0     1/8     0      0   |   1/8
  pX(x)   1/8    3/8    3/8    1/8  |    1

Joint distribution function FX,Y(x, y)

          x=0    x=1    x=2    x=3  |  FY(y)
   y=0    1/8    1/8    1/8    1/8  |   1/8
   y=1    1/8    2/8    4/8    5/8  |   5/8
   y=2    1/8    3/8    6/8    7/8  |   7/8
   y=3    1/8    4/8    7/8    8/8  |   8/8
  FX(x)   1/8    4/8    7/8    8/8  |    1
QUESTION: Why is
P(1 < X ≤ 3, 1 < Y ≤ 3) = F(3, 3) − F(1, 3) − F(3, 1) + F(1, 1) ?
EXERCISE :
Roll a four-sided die (tetrahedron) two times.
(The sides are marked 1 , 2 , 3 , 4 .)
Suppose each of the four sides is equally likely to end facing down.
Suppose the outcome of a single roll is the side that faces down ( ! ).
Define the random variables X and Y as
X = result of the first roll
EXERCISE :
Three balls are selected at random from a bag containing
2 red , 3 green , 4 blue balls .
Define the random variables
R(s) = the number of red balls drawn,
and
G(s) = the number of green balls drawn .
List the values of
the joint probability mass function pR,G (r, g) .
the marginal probability mass functions pR (r) and pG (g) .
the joint distribution function FR,G (r, g) .
the marginal distribution functions FR (r) and FG (g) .
NOTE: Random variables X and Y are independent if for all x and y the events Ex and Ey are independent, i.e.,
P(Ex Ey) = P(Ex) P(Ey),
where
X^{-1}({x}) ≡ {s ∈ S : X(s) = x}.
[Figure: the events Exy determined by the values of X(s) and Y(s).]
RECALL:
X(s) and Y(s) are independent if for all x and y:
pX,Y(x, y) = pX(x) pY(y).
EXERCISE:
Roll a die two times in a row.
Let
X be the result of the 1st roll,
and
Y the result of the 2nd roll.
Are X and Y independent, i.e., is
pX,Y(k, ℓ) = pX(k) pY(ℓ), for all 1 ≤ k, ℓ ≤ 6 ?
EXERCISE:
Are these random variables X and Y independent?

          x=0    x=1    x=2    x=3  |  pY(y)
   y=0    1/8     0      0      0   |   1/8
   y=1     0     1/8    2/8    1/8  |   4/8
   y=2     0     1/8    1/8     0   |   2/8
   y=3     0     1/8     0      0   |   1/8
  pX(x)   1/8    3/8    3/8    1/8  |    1
PROPERTY:
The joint distribution function of independent random variables X and Y satisfies
FX,Y(x, y) = FX(x) FY(y), for all x, y.
PROOF:
FX,Y(xk, yℓ) = P(X ≤ xk, Y ≤ yℓ)
= Σ_{i≤k} Σ_{j≤ℓ} pX,Y(xi, yj)
= Σ_{i≤k} Σ_{j≤ℓ} pX(xi) pY(yj)    (by independence)
= { Σ_{i≤k} pX(xi) } { Σ_{j≤ℓ} pY(yj) }
= FX(xk) FY(yℓ).
Conditional distributions
Let X and Y be discrete random variables with joint probability mass function pX,Y(x, y).
For given x and y, let
Ex = X^{-1}({x}) and Ey = Y^{-1}({y}).
Then the conditional probability mass function of X given Y is
pX|Y(x|y) ≡ P(X = x | Y = y) = P(Ex Ey)/P(Ey) = pX,Y(x, y)/pY(y),
and similarly
pY|X(y|x) = pX,Y(x, y)/pX(x).
EXAMPLE:

Joint probability mass function pX,Y(x, y)

          y=1    y=2    y=3  |  pX(x)
   x=1    1/3    1/12   1/12 |   1/2
   x=2    2/9    1/18   1/18 |   1/3
   x=3    1/9    1/36   1/36 |   1/6
  pY(y)   2/3    1/6    1/6  |    1

Conditional probability mass function pX|Y(x|y) = pX,Y(x, y)/pY(y)

          y=1    y=2    y=3
   x=1    1/2    1/2    1/2
   x=2    1/3    1/3    1/3
   x=3    1/6    1/6    1/6
           1      1      1
Expectation
The expected value of a discrete random variable X is
E[X] ≡ Σk xk P(X = xk) = Σk xk pX(xk).
We also have
E[aX] = a E[X], E[aX + b] = a E[X] + b.
EXAMPLE: The expected value of one roll of a die is
E[X] = Σ_{k=1}^{6} k · (1/6) = 7/2.
EXAMPLE: Toss a coin until it lands Heads up, and let X be the number of tosses needed, so that
S = {H, TH, TTH, TTTH, ...}.
Then
E[X] = 1 · 1/2 + 2 · 1/4 + 3 · 1/8 + ... = 2.
Indeed, the partial sums Σ_{k=1}^{n} k/2^k converge to 2:

  n     Σ_{k=1}^{n} k/2^k
  1     0.50000000
  2     1.00000000
  3     1.37500000
  10    1.98828125
  40    1.99999999

REMARK:
Perhaps using Sn = {all sequences of n tosses} is better.
More generally, the expected value of a function of X is
E[g(X)] = Σk g(xk) p(xk).
EXAMPLE:
The pay-off of rolling a die is $k², where k is the side facing up.
What should the entry fee be for the betting to break even?
SOLUTION:
E[X²] = Σ_{k=1}^{6} k² · (1/6) = (1/6) · (6(6+1)(2·6+1)/6) = 91/6 ≈ $15.17.
EXAMPLE: For the joint probability mass function

          y=1    y=2    y=3  |  pX(x)
   x=1    1/3    1/12   1/12 |   1/2
   x=2    2/9    1/18   1/18 |   1/3
   x=3    1/9    1/36   1/36 |   1/6
  pY(y)   2/3    1/6    1/6  |    1

we have
E[X] = 1 · 1/2 + 2 · 1/3 + 3 · 1/6 = 5/3,
E[Y] = 1 · 2/3 + 2 · 1/6 + 3 · 1/6 = 3/2,
E[XY] = 1 · 1/3 + 2 · 1/12 + 3 · 1/12
      + 2 · 2/9 + 4 · 1/18 + 6 · 1/18
      + 3 · 1/9 + 6 · 1/36 + 9 · 1/36 = 5/2.
(So?)
PROPERTY: If X and Y are independent then
E[XY] = E[X] E[Y].
PROOF:
E[XY] = Σk Σℓ xk yℓ pX,Y(xk, yℓ)
= Σk Σℓ xk yℓ pX(xk) pY(yℓ)    (by independence)
= { Σk xk pX(xk) } { Σℓ yℓ pY(yℓ) }
= E[X] E[Y].
PROPERTY:
E[X + Y] = E[X] + E[Y].    (Always!)
PROOF:
E[X + Y] = Σk Σℓ (xk + yℓ) pX,Y(xk, yℓ)
= Σk xk Σℓ pX,Y(xk, yℓ) + Σℓ yℓ Σk pX,Y(xk, yℓ)
= Σk xk pX(xk) + Σℓ yℓ pY(yℓ)
= E[X] + E[Y].
EXERCISE :
Show that
E[X] = 2 ,
E[Y ] = 8 ,
E[XY ] = 16
Thus if
E[XY ] = E[X] E[Y ] ,
then it does not necessarily follow that X and Y are independent !
Variance
The variance of a discrete random variable X with mean μ = E[X] is
Var(X) ≡ E[(X − μ)²] = Σk (xk − μ)² p(xk).
We have
Var(X) = E[X² − 2μX + μ²]
= E[X²] − 2μE[X] + μ²
= E[X²] − 2μ² + μ²
= E[X²] − μ².
The standard deviation is
σ ≡ sqrt(Var(X)) = sqrt(E[(X − μ)²]) = sqrt(E[X²] − μ²).
EXAMPLE: For one roll of a die (with μ = 7/2):
Var(X) = Σ_{k=1}^{6} [k² · (1/6)] − μ²
= (1/6) · (6(6+1)(2·6+1)/6) − (7/2)² = 91/6 − 49/4 = 35/12.
The standard deviation is
σ = sqrt(35/12) ≈ 1.70.
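These values can be computed exactly with rational arithmetic; a sketch in Python:

```python
from fractions import Fraction as Fr

outcomes = range(1, 7)
mu  = sum(Fr(k, 6) for k in outcomes)         # E[X]  = 7/2
ex2 = sum(Fr(k * k, 6) for k in outcomes)     # E[X²] = 91/6
var = ex2 - mu ** 2                           # Var(X) = 35/12
```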
Covariance
Let X and Y be random variables with mean
E[X] = μX, E[Y] = μY.
The covariance of X and Y is
Cov(X, Y) ≡ E[(X − μX)(Y − μY)] = Σ_{k,ℓ} (xk − μX)(yℓ − μY) p(xk, yℓ).
We have
Cov(X, Y) = E[XY − μX Y − μY X + μX μY]
= E[XY] − μX μY − μY μX + μX μY
= E[XY] − E[X] E[Y].
PROPERTY:
If X and Y are independent then Cov(X, Y) = 0.
PROOF:
We have already shown (with μX ≡ E[X] and μY ≡ E[Y]) that
Cov(X, Y) ≡ E[(X − μX)(Y − μY)] = E[XY] − E[X] E[Y],
and that if X and Y are independent then
E[XY] = E[X] E[Y],
from which the result follows.
Recall the earlier exercise, where
E[X] = 2, E[Y] = 8, E[XY] = 16, so that Cov(X, Y) = 0.
Thus if
Cov(X, Y) = 0,
then it does not necessarily follow that X and Y are independent!
PROPERTY :
If X and Y are independent then
V ar(X + Y ) = V ar(X) + V ar(Y ) .
PROOF :
We have already shown (in an exercise !) that
V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 Cov(X, Y ) ,
and that if X and Y are independent then
Cov(X, Y ) = 0 ,
from which the result follows.
EXERCISE:
Compute
E[X], E[Y], E[X²], E[Y²],
E[XY], Var(X), Var(Y), Cov(X, Y)
for
Joint probability mass function pX,Y(x, y)

          y=0    y=1    y=2    y=3  |  pX(x)
   x=0    1/8     0      0      0   |   1/8
   x=1     0     3/8     0      0   |   3/8
   x=2     0     1/8    2/8     0   |   3/8
   x=3     0      0      0     1/8  |   1/8
  pY(y)   1/8    4/8    2/8    1/8  |    1
EXERCISE:
Compute
E[X], E[Y], E[X²], E[Y²],
E[XY], Var(X), Var(Y), Cov(X, Y)
for
Joint probability mass function pX,Y(x, y)

          y=1    y=2    y=3  |  pX(x)
   x=1    1/3    1/12   1/12 |   1/2
   x=2    2/9    1/18   1/18 |   1/3
   x=3    1/9    1/36   1/36 |   1/6
  pY(y)   2/3    1/6    1/6  |    1
The Bernoulli Random Variable
A Bernoulli random variable X takes two values:
P(X = 1) = p, P(X = 0) = 1 − p.
We have
E[X] = 1 · p + 0 · (1 − p) = p,
E[X²] = 1² · p + 0² · (1 − p) = p,
Var(X) = E[X²] − E[X]² = p − p² = p(1 − p).
EXAMPLES:
When p = 1/2 : E[X] = p = 1/2 and Var(X) = p(1 − p) = 1/4.
When p = 1/6 : E[X] = 1/6 and Var(X) = p(1 − p) = 5/36.
When p = 0.01 : Var(X) = 0.0099 ≈ 0.01.
The Binomial Random Variable
Perform a sequence of n independent Bernoulli trials (here n = 12), and record the outcome as, e.g.,
100011001010,
where 1 denotes a success and 0 a failure. If each trial succeeds with probability p, this particular outcome has probability
P(100011001010) = p⁵ (1−p)⁷.    (Why?)
Let X count the number of successes, so X(100011001010) = 5. Then
P(X = 5) = C(12, 5) p⁵ (1−p)⁷.    (Why?)
For n = 12 and p = 1/2 the probability mass function takes the values:

  k     pX(k)
  0        1 / 4096
  1       12 / 4096
  2       66 / 4096
  3      220 / 4096
  4      495 / 4096
  5      792 / 4096
  6      924 / 4096
  7      792 / 4096
  8      495 / 4096
  9      220 / 4096
  10      66 / 4096
  11      12 / 4096
  12       1 / 4096
The corresponding distribution function values (n = 12, p = 1/2):

  k     FX(k)
  0        1 / 4096
  1       13 / 4096
  2       79 / 4096
  3      299 / 4096
  4      794 / 4096
  5     1586 / 4096
  6     2510 / 4096
  7     3302 / 4096
  8     3797 / 4096
  9     4017 / 4096
  10    4083 / 4096
  11    4095 / 4096
  12    4096 / 4096
For n = 12 and p = 1/6, (0 ≤ k ≤ n):

  k     pX(k)           FX(k)
  0     0.1121566221    0.112156
  1     0.2691758871    0.381332
  2     0.2960935235    0.677426
  3     0.1973956972    0.874821
  4     0.0888280571    0.963649
  5     0.0284249838    0.992074
  6     0.0066324966    0.998707
  7     0.0011369995    0.999844
  8     0.0001421249    0.999986
  9     0.0000126333    0.999998
  10    0.0000007580    0.999999
  11    0.0000000276    0.999999
  12    0.0000000005    1.000000
EXAMPLE:
In 12 rolls of a die write the outcome as, for example,
100011001010,
where
1 denotes the roll resulted in a six, and
0 denotes the roll did not result in a six.
From the preceding table (n = 12, p = 1/6):
P(X ≤ 5) = 99.2 %.
NOTE: Successive binomial probabilities are related by
P(X = k+1) = ck P(X = k), where ck = ((n−k)/(k+1)) · (p/(1−p)), k = 0, 1, ..., n−1.
The mean of the binomial random variable is
E[X] = Σ_{k=0}^{n} k P(X = k) = Σ_{k=0}^{n} k C(n, k) p^k (1−p)^{n−k}.
This sum is most easily evaluated by writing
X = X1 + X2 + ... + Xn,
where the Xk are independent Bernoulli random variables (Xk = 0 or 1).
We know that
E[Xk] = p,
so
E[X] = E[X1 + X2 + ... + Xn] = np.
The Poisson Random Variable
The Poisson probability mass function approximates the binomial:
P(X = k) = C(n, k) p^k (1−p)^{n−k} ≈ e^{−λ} λ^k / k!,
when we take
λ = n p (the average number of successes).
This approximation is accurate if n is large and p small.
Recall that for the Binomial random variable
E[X] = n p, and Var(X) = np(1−p) ≈ np when p is small.
Indeed, for the Poisson random variable we will show that
E[X] = λ and Var(X) = λ.
For the Poisson probability mass function
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, ...,
successive values are related by
P(X = k+1) = e^{−λ} λ^{k+1} / (k+1)! = (λ/(k+1)) P(X = k), k = 0, 1, 2, ... .
The probabilities sum to 1:
Σ_{k=0}^{∞} e^{−λ} λ^k / k! = e^{−λ} Σ_{k=0}^{∞} λ^k / k! = e^{−λ} e^{λ} = 1.
The Poisson mass function
P(X = k) = e^{−λ} λ^k / k!, k = 0, 1, 2, ...,
models the probability of k successes in a given time interval.
EXAMPLE: With λ = 6:
P(X = 0) = e^{−6} 6⁰/0! ≈ 0.0024,
P(X = 1) = e^{−6} 6¹/1! ≈ 0.0148,
P(X = 2) = e^{−6} 6²/2! ≈ 0.0446.
Compare
pBinomial(k) = C(n, k) p^k (1−p)^{n−k} and pPoisson(k) = e^{−λ} λ^k / k!.
EXAMPLE: λ = 6 customers/hour.
For the Binomial take n = 12, p = 0.5 (0.5 customers/5 minutes), so that indeed np = λ.

  k    pBinomial   pPoisson   FBinomial   FPoisson
  0     0.0002      0.0024     0.0002      0.0024
  1     0.0029      0.0148     0.0031      0.0173
  2     0.0161      0.0446     0.0192      0.0619
  3     0.0537      0.0892     0.0729      0.1512
  4     0.1208      0.1338     0.1938      0.2850
  5     0.1933      0.1606     0.3872      0.4456
  6     0.2255      0.1606     0.6127      0.6063
  7     0.1933      0.1376     0.8061      0.7439
  8     0.1208      0.1032     0.9270      0.8472
  9     0.0537      0.0688     0.9807      0.9160
  10    0.0161      0.0413     0.9968      0.9573
  11    0.0029      0.0225     0.9997      0.9799
  12    0.0002      0.0112     1.0000      0.9911
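Tables like the one above are straightforward to regenerate; a sketch in Python (the helper names are ours):

```python
from math import comb, exp, factorial

def p_binom(k, n, p):
    # binomial probability mass function C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def p_poisson(k, lam):
    # Poisson probability mass function e^(-lam) lam^k / k!
    return exp(-lam) * lam ** k / factorial(k)

# Peak row of the table: k = 6, n = 12, p = 0.5, lam = 6.
row6 = (p_binom(6, 12, 0.5), p_poisson(6, 6.0))
```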
Again, compare
pBinomial(k) = C(n, k) p^k (1−p)^{n−k} and pPoisson(k) = e^{−λ} λ^k / k!.
EXAMPLE: λ = 6 customers/hour.
For the Binomial take n = 60, p = 0.1 (0.1 customers/minute), so that indeed np = λ.

  k    pBinomial   pPoisson   FBinomial   FPoisson
  0     0.0017      0.0024     0.0017      0.0024
  1     0.0119      0.0148     0.0137      0.0173
  2     0.0392      0.0446     0.0530      0.0619
  3     0.0843      0.0892     0.1373      0.1512
  4     0.1335      0.1338     0.2709      0.2850
  5     0.1662      0.1606     0.4371      0.4456
  6     0.1692      0.1606     0.6064      0.6063
  7     0.1451      0.1376     0.7515      0.7439
  8     0.1068      0.1032     0.8583      0.8472
  9     0.0685      0.0688     0.9269      0.9160
  10    0.0388      0.0413     0.9657      0.9573
  11    0.0196      0.0225     0.9854      0.9799
  12    0.0089      0.0112     0.9943      0.9911
Comparison of mean, variance, and standard deviation, using
Var(X) = np(1−p) for the Binomial and Var(X) = λ = np for the Poisson (accurate when p is small):

n = 12, p = 1/2, λ = 6:
            Binomial   Poisson
  E[X]       6.0000     6.0000
  Var[X]     3.0000     6.0000
  σ[X]       1.7321     2.4495

n = 60, p = 0.1, λ = 6:
            Binomial   Poisson
  E[X]       6.0000     6.0000
  Var[X]     5.4000     6.0000
  σ[X]       2.3238     2.4495

(Similarly one can compare, e.g., n = 200, p = 0.01, λ = 2.)
The Moment Generating Function
The moment generating function of a random variable X is
ψ(t) ≡ E[e^{tX}] = E[1 + tX + t²X²/2! + t³X³/3! + ...]
= 1 + t E[X] + (t²/2!) E[X²] + (t³/3!) E[X³] + ... .
It follows that
ψ′(0) = E[X] and ψ″(0) = E[X²],    (Why?)
so that
μ = E[X] = ψ′(0), and Var(X) = E[X²] − μ² = ψ″(0) − ψ′(0)².
EXAMPLE: For the Poisson random variable,
ψ(t) = Σ_{k=0}^{∞} e^{tk} P(X = k) = Σ_{k=0}^{∞} e^{tk} e^{−λ} λ^k / k!
= e^{−λ} Σ_{k=0}^{∞} (λ e^t)^k / k! = e^{−λ} e^{λ e^t} = e^{λ(e^t − 1)}.
Here
ψ′(t) = λ e^t e^{λ(e^t − 1)},
ψ″(t) = ((λ e^t)² + λ e^t) e^{λ(e^t − 1)},    (Check!)
so that
E[X] = ψ′(0) = λ,
E[X²] = ψ″(0) = λ(λ + 1) = λ² + λ,
Var(X) = E[X²] − E[X]² = λ.
EXERCISE :
Defects in a certain wire occur at the rate of one per 10 meters.
Assume the defects have a Poisson distribution.
What is the probability that :
EXERCISE :
Customers arrive at a counter at the rate of 8 per hour.
Assume the arrivals have a Poisson distribution.
What is the probability that :
CONTINUOUS RANDOM VARIABLES
EXAMPLE: Spin a pointer, and let the random variable X ∈ (0, 1] be the fraction of a full turn at which it comes to rest, so that X : S → ℝ.
Natural questions are then, e.g.,
P(1/3 < X ≤ 1/2), and P(X = 1/2).
Thus
FX(x) ≡ P(X ≤ x),
and
FX(b) − FX(a) = P(a < X ≤ b).
We must have
FX(−∞) = 0 and FX(∞) = 1,
i.e.,
lim_{x→−∞} FX(x) = 0 and lim_{x→∞} FX(x) = 1.    (Why?)
NOTE: All the above is the same as for discrete random variables!
[Figure: the distribution function F(theta) = theta of the pointer position, with the values 1/3 and 1/2 marked.]
Note that
F(1/3) ≡ P(X ≤ 1/3) = 1/3, F(1/2) ≡ P(X ≤ 1/2) = 1/2,
so
P(1/3 < X ≤ 1/2) = F(1/2) − F(1/3) = 1/2 − 1/3 = 1/6.
QUESTION: What is P(1/3 ≤ X ≤ 1/2)?
DEFINITION: The probability density function is the derivative of the distribution function:
fX(x) ≡ F′X(x) = dFX(x)/dx.
EXAMPLE: In the pointer example
FX(x) = { 0, x ≤ 0 ;  x, 0 < x ≤ 1 ;  1, 1 < x }.
Thus
fX(x) = F′X(x) = { 0, x ≤ 0 ;  1, 0 < x ≤ 1 ;  0, 1 < x }.
EXAMPLE: (continued)
F(x) = { 0, x ≤ 0 ;  x, 0 < x ≤ 1 ;  1, 1 < x },
f(x) = { 0, x ≤ 0 ;  1, 0 < x ≤ 1 ;  0, 1 < x }.
[Figures: the distribution function F(theta) and the density function f(theta), with 1/3 and 1/2 marked.]
NOTE:
P(1/3 < X ≤ 1/2) = ∫_{1/3}^{1/2} f(x) dx = 1/6.
In general, from
f(x) = F′(x), with F(−∞) = 0 and F(∞) = 1,
we have
∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} F′(x) dx = F(∞) − F(−∞) = 1.
EXAMPLE: The exponential distribution has
F(x) = { 0, x ≤ 0 ;  1 − e^{−x}, x > 0 },
f(x) = F′(x) = { 0, x ≤ 0 ;  e^{−x}, x > 0 }.
Then
F(−∞) = 0, F(∞) = 1, F(x) = ∫_{0}^{x} f(x̄) dx̄ for x > 0, and ∫_{−∞}^{∞} f(x) dx = 1.
For example,
P(X > 1) = 1 − F(1) = e^{−1} ≈ 0.37.
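A quick numerical check of this tail probability (our sketch, for the unit exponential only):

```python
from math import exp

def F(x):
    # distribution function of the unit exponential
    return 1 - exp(-x) if x > 0 else 0.0

p_gt_1 = 1 - F(1)    # P(X > 1) = e^(-1)
```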
EXERCISE: For a density fn(x) that vanishes outside [0, 1]:
Determine P(0 ≤ X ≤ 1/2) in terms of n.
Determine P(9/10 ≤ X ≤ 1) in terms of n.
What happens to P(9/10 ≤ X ≤ 1) when n becomes large?
Joint distributions
A joint probability density function fX,Y(x, y) must satisfy
∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x, y) dx dy = 1    (Volume = 1).
By Calculus we have
∂²FX,Y(x, y)/∂x∂y = fX,Y(x, y).
Also,
P(a < X ≤ b, c < Y ≤ d) = ∫_{c}^{d} ∫_{a}^{b} fX,Y(x, y) dx dy.
EXAMPLE:
If
fX,Y(x, y) = { 1 for 0 < x ≤ 1, 0 < y ≤ 1 ;  0 otherwise },
then, for 0 < x, y ≤ 1,
FX,Y(x, y) = ∫_{0}^{y} ∫_{0}^{x} 1 dx̄ dȳ = xy.
For example,
P(X ≤ 1/3, Y ≤ 1/2) = FX,Y(1/3, 1/2) = 1/6.
[Figures: the density fX,Y = 1 on the unit square, and the distribution function FX,Y(x, y) = xy.]
Also,
P(1/3 ≤ X ≤ 1/2, 1/4 ≤ Y ≤ 3/4) = ∫_{1/4}^{3/4} ∫_{1/3}^{1/2} f(x, y) dx dy = 1/12.
The marginal density functions are
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy, fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx,
with corresponding distribution functions
FX(x) ≡ P(X ≤ x) = ∫_{−∞}^{x} ∫_{−∞}^{∞} fX,Y(x̄, y) dy dx̄,
FY(y) ≡ P(Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{∞} fX,Y(x, ȳ) dx dȳ.
By Calculus we have
dFX(x)/dx = fX(x), dFY(y)/dy = fY(y).
EXAMPLE: If
fX,Y(x, y) = { 1 for 0 < x ≤ 1, 0 < y ≤ 1 ;  0 otherwise },
then, for 0 < x, y ≤ 1,
fX(x) = ∫_{0}^{1} fX,Y(x, y) dy = ∫_{0}^{1} 1 dy = 1,
fY(y) = ∫_{0}^{1} fX,Y(x, y) dx = ∫_{0}^{1} 1 dx = 1,
and
FX(x) = P(X ≤ x) = ∫_{0}^{x} fX(x̄) dx̄ = x,
FY(y) = P(Y ≤ y) = ∫_{0}^{y} fY(ȳ) dȳ = y.
For example,
P(X ≤ 1/3) = FX(1/3) = 1/3, P(Y ≤ 1/2) = FY(1/2) = 1/2.
EXERCISE:
Let FX,Y(x, y) = { (1 − e^{−x})(1 − e^{−y}) for x ≥ 0 and y ≥ 0 ;  0 otherwise }.
Verify that
fX,Y(x, y) = ∂²F/∂x∂y = { e^{−x−y} for x ≥ 0 and y ≥ 0 ;  0 otherwise }.
[Figures: the joint distribution function FX,Y(x, y) = (1 − e^{−x})(1 − e^{−y}) and the density fX,Y(x, y) = e^{−x−y}.]
EXERCISE: (continued)
For fX,Y(x, y) = e^{−x−y}, x, y ≥ 0, verify:
F(0, 0) = 0, F(∞, ∞) = 1,
∫_{0}^{∞} ∫_{0}^{∞} fX,Y(x, y) dx dy = 1,
fX(x) = ∫_{0}^{∞} e^{−x−y} dy = e^{−x},
fY(y) = ∫_{0}^{∞} e^{−x−y} dx = e^{−y}.    (So?)
EXERCISE: (continued)
FX,Y(x, y) = (1 − e^{−x})(1 − e^{−y}), for x, y ≥ 0.
Verify:
FY(y) = ∫_{0}^{y} fY(ȳ) dȳ = ∫_{0}^{y} e^{−ȳ} dȳ = 1 − e^{−y},    (So?)
P(1 < X ≤ 2, 0 < Y ≤ 1) = ∫_{0}^{1} ∫_{1}^{2} e^{−x−y} dx dy
= (e^{−1} − e^{−2})(1 − e^{−1}) ≈ 0.15.
Continuous random variables X and Y are independent if
P(X ∈ IX, Y ∈ IY) = P(X ∈ IX) P(Y ∈ IY)
for all allowable sets IX and IY (typically intervals), where
X^{-1}(IX) ≡ {s ∈ S : X(s) ∈ IX}, Y^{-1}(IY) ≡ {s ∈ S : Y(s) ∈ IY}.
NOTE: In the example above
fX,Y(x, y) = e^{−x−y} = e^{−x} e^{−y} = fX(x) fY(y),
and
FX,Y(x, y) = FX(x) FY(y).
PROPERTY:
For independent continuous random variables X and Y we have
FX,Y(x, y) = FX(x) FY(y), for all x, y.
PROOF:
FX,Y(x, y) = P(X ≤ x, Y ≤ y)
= ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(x̄, ȳ) dȳ dx̄
= ∫_{−∞}^{x} ∫_{−∞}^{y} fX(x̄) fY(ȳ) dȳ dx̄    (by independence)
= ∫_{−∞}^{x} [ fX(x̄) ∫_{−∞}^{y} fY(ȳ) dȳ ] dx̄
= [ ∫_{−∞}^{x} fX(x̄) dx̄ ] [ ∫_{−∞}^{y} fY(ȳ) dȳ ]
= FX(x) FY(y).
REMARK: Note how the proof parallels that for the discrete case!
Conditional distributions
Let X and Y be continuous random variables.
The conditional probability density function of X given Y is
fX|Y(x|y) ≡ fX,Y(x, y) / fY(y).
EXAMPLE: The random variables with joint density
fX,Y(x, y) = { e^{−x−y} for x > 0, y > 0 ;  0 otherwise },
have (by a previous exercise) the marginal density functions
fX(x) = e^{−x}, fY(y) = e^{−y}.
Thus
fX|Y(x|y) = fX,Y(x, y) / fY(y) = e^{−x−y} / e^{−y} = e^{−x} = fX(x),
as expected, since X and Y are independent here.
Expectation
The expected value of a continuous random variable X is
E[X] = ∫_{−∞}^{∞} x fX(x) dx.
EXAMPLE:
For the pointer experiment
fX(x) = { 0, x ≤ 0 ;  1, 0 < x ≤ 1 ;  0, 1 < x },
we have
E[X] = ∫_{−∞}^{∞} x fX(x) dx = ∫_{0}^{1} x dx = [x²/2]_{0}^{1} = 1/2,
and
E[X²] = ∫_{−∞}^{∞} x² fX(x) dx = ∫_{0}^{1} x² dx = [x³/3]_{0}^{1} = 1/3.
EXAMPLE: For the joint density
fX,Y(x, y) = { e^{−x−y} for x > 0 and y > 0 ;  0 otherwise },
with marginals fX(x) = e^{−x} and fY(y) = e^{−y}, we have
E[X] = ∫_{0}^{∞} x e^{−x} dx = [−(x+1)e^{−x}]_{0}^{∞} = 1,    (Check!)
E[Y] = ∫_{0}^{∞} y e^{−y} dy = 1,
and
E[XY] = ∫_{0}^{∞} ∫_{0}^{∞} xy e^{−x−y} dy dx = 1.    (Check!)
EXERCISE:
Prove the following for continuous random variables:
E[aX] = a E[X], E[aX + b] = a E[X] + b.
PROPERTY: If X and Y are independent then
E[XY] = E[X] E[Y].
PROOF:
E[XY] = ∫∫ x y fX,Y(x, y) dy dx
= ∫∫ x y fX(x) fY(y) dy dx    (by independence)
= ∫ [ x fX(x) ∫ y fY(y) dy ] dx
= [ ∫ x fX(x) dx ] [ ∫ y fY(y) dy ]
= E[X] E[Y].
REMARK: Note how the proof parallels that for the discrete case!
EXAMPLE: For
fX,Y(x, y) = { e^{−x−y} for x > 0 and y > 0 ;  0 otherwise },
we already found
fX(x) = e^{−x}, fY(y) = e^{−y},
so that
fX,Y(x, y) = fX(x) fY(y),
i.e., X and Y are independent.
Variance
Let
μ = E[X] = ∫_{−∞}^{∞} x fX(x) dx.
Then, as in the discrete case,
Var(X) ≡ E[(X − μ)²]
= E[X² − 2μX + μ²]
= E[X²] − 2μE[X] + μ²
= E[X²] − μ².
EXAMPLE: For the exponential density
f(x) = { e^{−x}, x > 0 ;  0, x ≤ 0 },
we have
μ = E[X] = ∫_{0}^{∞} x e^{−x} dx = 1    (already done!),
E[X²] = ∫_{0}^{∞} x² e^{−x} dx = [−(x² + 2x + 2)e^{−x}]_{0}^{∞} = 2,
Var(X) = E[X²] − μ² = 2 − 1² = 1,
σ = sqrt(Var(X)) = 1.
EXERCISE : For the density function

f(x) = 0 , x ≤ −1 ;   c , −1 < x ≤ 1 ;   0 , x > 1 :

Determine the value of c
Draw the graph of f(x)
Determine the distribution function F(x)
Draw the graph of F(x)
Determine E[X]
Compute Var(X) and σ(X)
Determine P(X ≤ 1/2)
Determine P(|X| ≤ 1/2)
EXERCISE : For the density function

f(x) = x + 1 , −1 < x ≤ 0 ;   1 − x , 0 < x ≤ 1 ;   0 , otherwise :

Draw the graph of f(x) .
Verify that ∫ f(x) dx = 1 .

EXERCISE : For the density function

f(x) = (3/4)(1 − x²) , −1 < x ≤ 1 ;   0 , otherwise :

Determine E[X] .
Determine E[X²] .

EXERCISE : For the density function

f_n(x) = c xⁿ (1 − xⁿ) , 0 ≤ x ≤ 1 ;   0 , otherwise ,

where c = (n+1)(2n+1)/n :

Determine E[X] .
Determine E[X²] .
Covariance

Let X and Y be continuous random variables with means

E[X] = μ_X ,   E[Y] = μ_Y .

The covariance of X and Y is

Cov(X,Y) ≡ ∫∫ (x − μ_X)(y − μ_Y) f_{X,Y}(x,y) dy dx

 = E[ (X − μ_X)(Y − μ_Y) ]

 = E[ XY − μ_X Y − μ_Y X + μ_X μ_Y ]

 = E[XY] − E[X] E[Y] .
EXAMPLE : For

f_{X,Y}(x,y) = e^{-x-y} for x > 0 and y > 0 , (0 otherwise) ,

we already found

f_X(x) = e^{-x} ,   f_Y(y) = e^{-y} ,

so that

f_{X,Y}(x,y) = f_X(x) f_Y(y) ,

i.e., X and Y are independent , and hence Cov(X,Y) = 0 .
EXERCISE :

Verify the following properties :

Var(cX + d) = c² Var(X) ,

Cov(X,Y) = Cov(Y,X) ,

Cov(cX,Y) = c Cov(X,Y) ,

Cov(X,cY) = c Cov(X,Y) .

EXERCISE :

For the random variables X , Y with joint density function

f(x,y) = 45 x y² (1 − x)(1 − y²) , 0 ≤ x ≤ 1 , 0 ≤ y ≤ 1 , (0 otherwise) :

Verify that ∫_0^1 ∫_0^1 f(x,y) dy dx = 1 .
Markov's inequality :

For a continuous nonnegative random variable X , and c > 0 ,
we have

P(X ≥ c) ≤ E[X] / c .

PROOF :

E[X] = ∫_0^∞ x f(x) dx = ∫_0^c x f(x) dx + ∫_c^∞ x f(x) dx

     ≥ ∫_c^∞ x f(x) dx ≥ c ∫_c^∞ f(x) dx        ( Why ? )

     = c P(X ≥ c) .

EXAMPLE : For the exponential density

f(x) = e^{-x} , x > 0 , (0 otherwise) ,

we have

E[X] = ∫_0^∞ x e^{-x} dx = 1 .   ( already done ! )
Markov's inequality then gives, for c = 1 and c = 10 :

P(X ≥ 1) ≤ E[X]/1 = 1   (!)

P(X ≥ 10) ≤ E[X]/10 = 0.1 ,

while the exact values are

P(X ≥ 1) = ∫_1^∞ e^{-x} dx = e^{-1} ≈ 0.37 ,

P(X ≥ 10) = ∫_10^∞ e^{-x} dx = e^{-10} ≈ 0.000045 .
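The comparison above is easy to reproduce; the following is a minimal sketch for the exponential distribution with λ = 1, comparing the Markov bound E[X]/c to the exact tail e^{−c}.

```python
import math

def markov_bound(mean, c):
    # Markov's inequality: P(X >= c) <= E[X] / c for nonnegative X
    return mean / c

def exact_tail_exponential(c, lam=1.0):
    # P(X >= c) = e^{-lam c} for the exponential distribution
    return math.exp(-lam * c)

for c in (1.0, 10.0):
    bound = markov_bound(1.0, c)          # E[X] = 1 when lam = 1
    exact = exact_tail_exponential(c)
    print(f"c = {c:5.1f}   bound = {bound:.3f}   exact = {exact:.6f}")
```

As the slides note, the bound is correct but can be very pessimistic: at c = 10 it gives 0.1 while the exact tail is about 0.000045.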
Chebyshev's inequality :

P( |X − μ| ≥ kσ ) ≤ 1/k² ,

where μ = E[X] is the mean, σ = √Var(X) the standard deviation.

PROOF : Let Y ≡ (X − μ)² , which is nonnegative.

By Markov's inequality

P(Y ≥ c) ≤ E[Y] / c .

Taking c = k²σ² we have

P( |X − μ| ≥ kσ ) = P( (X − μ)² ≥ k²σ² ) = P( Y ≥ k²σ² )

 ≤ E[Y] / (k²σ²) = Var(X) / (k²σ²) = σ² / (k²σ²) = 1/k² .   QED !
EXERCISE : What does Chebyshev's inequality say about the probability
that X lies between μ − 2σ and μ + 2σ ?
EXERCISE :

The score of students taking an examination is a random variable
with mean μ = 65 and standard deviation σ = 5 .

What is the probability a student scores between 55 and 75 ?

How many students would have to take the examination so that
the probability that their average grade is between 60 and 70
is at least 80 % ?

HINT : Defining X̄ = (1/n)(X₁ + X₂ + ⋯ + Xₙ) , we have

μ_X̄ = E[X̄] = μ = 65 ,

and, assuming independence,

Var(X̄) = n σ²/n² = σ²/n = 25/n ,   and   σ_X̄ = 5/√n .
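A minimal sketch of the two Chebyshev computations in this exercise (the exact thresholds follow from the hint above):

```python
# Chebyshev: P(|X - mu| >= k*sigma) <= 1/k**2.
mu, sigma = 65.0, 5.0

# P(55 <= X <= 75): here the half-width is 10 = k*sigma, so k = 2
k = 10.0 / sigma
p_between = 1.0 - 1.0 / k**2            # at least 3/4
print("P(55 <= score <= 75) >=", p_between)

# The average of n scores has standard deviation sigma / sqrt(n);
# require P(|Xbar - 65| >= 5) <= 1/k'**2 <= 0.2.
def min_n_for(prob_fail, half_width):
    n = 1
    while True:
        k2 = (half_width / (sigma / n**0.5)) ** 2   # here k'^2 = n
        if 1.0 / k2 <= prob_fail:
            return n
        n += 1

n_needed = min_n_for(0.2, 5.0)
print("n needed (Chebyshev bound):", n_needed)
```

Note that Chebyshev gives a guaranteed but conservative answer; the Central Limit Theorem (later in these notes) gives a much smaller n.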
186
ba , a < x b
0, xa
xa
, a<xb
f (x) =
, F (x) =
ba
0 , otherwise
1, x>b
F(x)
f(x)
1
___
ba
F(x1)
F(x2)
x
x
a
x1
x2
x1
x2
EXERCISE :

Show that the uniform density function

f(x) = 1/(b−a) , a < x ≤ b , (0 otherwise) ,

has mean

μ = (a+b)/2 ,

and standard deviation

σ = (b−a)/(2√3) .
The joint uniform distribution on a rectangle has distribution function

F(x,y) = (x−a)(y−c) / ((b−a)(d−c)) ,  a < x ≤ b ,  c < y ≤ d .

[Figure : the joint uniform density and distribution function.]
EXERCISE :

Consider the joint uniform density function

f(x,y) = c for x² + y² ≤ 4 , (0 otherwise) .

What is P(X < 0) ?

What is f( x | y = 1 ) ?
The exponential distribution :

f(x) = λ e^{−λx} , x > 0 , (0 for x ≤ 0) ,

F(x) = 1 − e^{−λx} , x > 0 , (0 for x ≤ 0) ,

with

E[X] = ∫_0^∞ x λ e^{−λx} dx = 1/λ ,   ( Check ! )

E[X²] = ∫_0^∞ x² λ e^{−λx} dx = 2/λ² ,   ( Check ! )

Var(X) = E[X²] − μ² = 2/λ² − 1/λ² = 1/λ² ,

σ(X) = √Var(X) = 1/λ .
[Figure : the exponential density f(x) = λe^{−λx} and distribution function
F(x) = 1 − e^{−λx} ,
for λ = 0.25, 0.50, 0.75, 1.00 (blue), 1.25, 1.50, 1.75, 2.00 (red ).]
PROPERTY : From

F(x) = 1 − e^{−λx} ,   ( for x > 0 ) ,

we have

P(X > x) = 1 − P(X ≤ x) = 1 − (1 − e^{−λx}) = e^{−λx} ,

and hence the memoryless property

P( X > x + Δx | X > x ) = e^{−λ(x+Δx)} / e^{−λx} = e^{−λΔx} .
EXAMPLE :

Let the density function f(t) model failure of a device,

f(t) = e^{−t} ,   ( taking λ = 1 ) ,

with

F(t) = 1 − e^{−t} ,   F(0) = 0 ,   F(∞) = 1 .

EXAMPLE : ( continued )

Let E_t be the event that the device still works at time t :

P(E_t) = 1 − F(t) = e^{−t} ,   P(E_{t+1}) = 1 − F(t+1) = e^{−(t+1)} .

Then

P(E_{t+1} | E_t) = P(E_{t+1} ∩ E_t) / P(E_t) = P(E_{t+1}) / P(E_t)

               = e^{−(t+1)} / e^{−t} = e^{−1} ,

which is independent of t !

QUESTION : Is such an exponential distribution realistic if
the device is your heart, and time t is measured in decades ?
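The memoryless property can be checked empirically; this sketch simulates exponential lifetimes and compares P(X > t+1 | X > t) with the unconditional P(X > 1).

```python
import random, math

random.seed(1)
N = 200_000
# exponential lifetimes with rate lambda = 1
lifetimes = [random.expovariate(1.0) for _ in range(N)]

t = 1.0
survivors = [x for x in lifetimes if x > t]
# conditional survival: P(X > t + 1 | X > t)
p_cond = sum(1 for x in survivors if x > t + 1.0) / len(survivors)
# unconditional survival: P(X > 1)
p_uncond = sum(1 for x in lifetimes if x > 1.0) / N

print(p_cond, p_uncond, math.exp(-1.0))   # all close to e^{-1}
```

Both estimates agree with e^{−1} ≈ 0.368, illustrating that having survived to time t gives the exponential device no "memory" of its age.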
The standard normal distribution :

f(x) = (1/√(2π)) e^{−x²/2} ,   −∞ < x < ∞ ,

with mean

μ = ∫ x f(x) dx = 0 .   ( Check ! )

Since

E[X²] = ∫ x² f(x) dx = 1 ,   ( more difficult )

we have

Var(X) = E[X²] − μ² = 1 ,
[Figure : the standard normal density f(x) = (1/√(2π)) e^{−x²/2} .]
The standard normal distribution function is

Φ(x) ≡ F(x) = (1/√(2π)) ∫_{−∞}^x e^{−x²/2} dx .

[Figure : the standard normal distribution function F(x) .]
A table of values of Φ(z) :

  z      Φ(z)        z      Φ(z)
 0.0    .5000      -1.2    .1151
-0.1    .4602      -1.4    .0808
-0.2    .4207      -1.6    .0548
-0.3    .3821      -1.8    .0359
-0.4    .3446      -2.0    .0228
-0.5    .3085      -2.2    .0139
-0.6    .2743      -2.4    .0082
-0.7    .2420      -2.6    .0047
-0.8    .2119      -2.8    .0026
-0.9    .1841      -3.0    .0013
-1.0    .1587      -3.2    .0007
EXERCISE :

Suppose the random variable X has the standard normal distribution.

What are the values of

P( X ≤ 0.5 )
P( X ≥ 0.5 )
P( |X| ≤ 0.5 )
P( |X| ≥ 0.5 )
P( −1 ≤ X ≤ 1 )
P( −1 ≤ X ≤ 0.5 )
The general normal distribution :

f(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} ,

with mean μ and variance σ² = E[(X − μ)²] .   ( Why ? )

To verify that E[X] = μ , consider

E[X − μ] = ∫ (x − μ) f(x) dx

 = (1/(σ√(2π))) ∫ (x − μ) e^{−(x−μ)²/(2σ²)} dx

 = (1/(σ√(2π))) [ −σ² e^{−(x−μ)²/(2σ²)} ]_{−∞}^{∞} = 0 ,

so that indeed E[X] = μ .

Let y ≡ (x − μ)/σ , so that

x = μ + σy ,   (x − μ)²/σ² = y² ,   dx = σ dy .

Then

P( (X − μ)/σ ≤ c ) = (1/√(2π)) ∫_{−∞}^c e^{−y²/2} dy = Φ(c) .

EXERCISE :

Suppose X is normal with mean μ and standard deviation σ .
What are the values of

P( X ≤ μ + 0.5σ )
P( X ≥ μ + 0.5σ )
P( |X − μ| ≤ 0.5σ )
P( |X − μ| ≥ 0.5σ )
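The standardization step above, P((X−μ)/σ ≤ c) = Φ(c), is all that is needed to compute general normal probabilities; a minimal sketch using the standard-library error function:

```python
import math

def Phi(z):
    # standard normal distribution function:
    # Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# example (illustrative values): X normal with mu = 5, sigma = 2
mu, sigma = 5.0, 2.0
x = 7.0
print(Phi((x - mu) / sigma))   # P(X <= 7) = Phi(1)
```

The printed value matches the table entry 1 − Φ(−1) = 1 − 0.1587 = 0.8413.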
The chi-square distribution :

Suppose the random variable χ²_n has the chi-square distribution
with n degrees of freedom. Then

E[χ²_n] = n ,   Var(χ²_n) = 2n ,   σ(χ²_n) = √(2n) .

NOTE :

[Figure : the chi-square density f(x) and distribution function F(x) .]
If n = 1 then

χ²₁ = X₁² ,  where X ≡ X₁ is standard normal .

The moment generating function of χ²₁ is

E[e^{tχ²₁}] = (1/√(2π)) ∫ e^{tx²} e^{−x²/2} dx = (1/√(2π)) ∫ e^{−x²(1−2t)/2} dx .

Let

1 − 2t = 1/σ̃² ,  or equivalently ,  σ̃ = 1/√(1−2t) .

Then

E[e^{tχ²₁}] = (1/√(2π)) ∫ e^{−x²/(2σ̃²)} dx = σ̃ = 1/√(1−2t) .

Thus

ψ(t) ≡ E[e^{tχ²₁}] = (1−2t)^{−1/2} ,

E[χ²₁] = ψ'(0) = 1 ,   ( Check ! )

E[(χ²₁)²] = ψ''(0) = 3 ,   ( Check ! )

Var(χ²₁) = E[(χ²₁)²] − E[χ²₁]² = 3 − 1 = 2 .
We found that

E[χ²₁] = 1 ,   Var(χ²₁) = 2 .

Since χ²_n is the sum of n independent copies of χ²₁ , it follows that

E[χ²_n] = n ,   Var(χ²_n) = 2n ,   σ(χ²_n) = √(2n) .

[Figure : the χ²_n density functions f(x) for several values of n.]
The χ²_n - Table : values x for which P(χ²_n ≥ x) = α .

  n    α = 0.975   α = 0.95   α = 0.05   α = 0.025
  5      0.83        1.15      11.07      12.83
  6      1.24        1.64      12.59      14.45
  7      1.69        2.17      14.07      16.01
  8      2.18        2.73      15.51      17.54
  9      2.70        3.33      16.92      19.02
 10      3.25        3.94      18.31      20.48
 11      3.82        4.58      19.68      21.92
 12      4.40        5.23      21.03      23.34
 13      5.01        5.89      22.36      24.74
 14      5.63        6.57      23.69      26.12
 15      6.26        7.26      25.00      27.49
Recall that

χ²_n = X̃₁ + X̃₂ + ⋯ + X̃ₙ ,

where

X̃ᵢ = Xᵢ² ,

and Xᵢ is standard normal, i = 1, 2, ⋯, n .
RECALL :

If X₁ , X₂ , ⋯ , Xₙ are independent, identically distributed ,
each having mean μ , variance σ² , standard deviation σ ,
then

S ≡ X₁ + X₂ + ⋯ + Xₙ

has

mean :  E[S] = nμ   ( Why ? )

variance :  Var(S) = nσ²   ( Why ? )

standard deviation :  σ_S = σ√n .

By the Central Limit Theorem, for large n , S is approximately
normal , with mean nμ and standard deviation σ√n .

NOTE : Thus χ²_n has mean n and standard deviation √(2n) ,
and for large n it is approximately normal.
The Table below illustrates the accuracy of the approximation

P( χ²_n ≤ 0 ) ≈ Φ( (0 − n)/√(2n) ) = Φ( −√(n/2) ) :

  n    √(n/2)    Φ(−√(n/2))
  2      1        0.1587
  8      2        0.0228
 18      3        0.0013

QUESTION : What is the exact value of P(χ²_n ≤ 0) ?   (!)
EXERCISE :

Use the approximation

P( χ²_n ≤ x ) ≈ Φ( (x − n)/√(2n) )

to estimate

P( χ²₃₂ ≤ 24 ) ,

P( χ²₃₂ ≥ 40 ) ,

P( |χ²₃₂ − 32| ≤ 8 ) .
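A sketch of the first of these estimates, checked against a Monte Carlo simulation of χ²₃₂ as a sum of 32 squared standard normals:

```python
import math, random

random.seed(2)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, x = 32, 24.0
# normal approximation: P(chi2_n <= x) ~ Phi((x - n) / sqrt(2n))
approx = Phi((x - n) / math.sqrt(2.0 * n))

# Monte Carlo: chi2_n is a sum of n squared standard normals
N = 20_000
count = 0
for _ in range(N):
    s = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n))
    if s <= x:
        count += 1
mc = count / N

print(approx, mc)
```

The approximation Φ(−1) ≈ 0.1587 is close to, but not identical with, the simulated probability, consistent with the moderate n.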
RECALL :

If X₁ , X₂ , ⋯ , Xₙ are independent, identically distributed ,
each having mean μ , variance σ² , standard deviation σ ,
then

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ)

has

mean :  μ_X̄ = E[X̄] = μ   ( Why ? )

variance :  σ²_X̄ = (1/n²) nσ² = σ²/n   ( Why ? )

standard deviation :  σ_X̄ = σ/√n .

By the Central Limit Theorem, for large n , X̄ is approximately normal .

NOTE : Thus

(X̄ − μ) / (σ/√n)  is approximately standard normal .
EXAMPLE : For the uniform random variable on [−1, 1] , with
μ = 0 and σ = 1/√3 , the sample mean

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ) ,

with

mean μ_X̄ = 0 ,  standard deviation σ_X̄ = 1/√(3n) ,

is approximately normal , so that

P(X̄ ≤ x) ≈ Φ( (x − 0)/(1/√(3n)) ) = Φ( √(3n) x ) .

For example

P(X̄ ≤ 1) ≈ Φ(√(3n)) .

( What is the exact value of P(X̄ ≤ 1) ? ! )

For n = 12 find the approximate value of P(X̄ ≤ 0.1) .

For n = 12 find the approximate value of P(X̄ ≥ 0.5) .
EXPERIMENT : ( continued )

Letting x̃ᵢ = 2xᵢ − 1 gives uniform random values in [−1, 1] .
EXPERIMENT : ( continued )

[Table : 120 sample values of the uniform random variable on [−1, 1] ,
e.g. −0.737 , 0.869 , −0.985 , 0.054 , ⋯ ]
EXPERIMENT : ( continued )

Divide [−1, 1] into M subintervals of equal size Δx = 2/M ,
and let m_k be the number of the N random values that fall in
subinterval k . Then

f(x_k) ≈ m_k / (N Δx) ,

and indeed

∫_{−1}^{1} f(x) dx ≈ Σ_{k=1}^{M} f(x_k) Δx = Σ_{k=1}^{M} (m_k/(NΔx)) Δx = 1 ,

while

F(x_ℓ) = ∫_{−1}^{x_ℓ} f(x) dx ≈ Σ_{k=1}^{ℓ} f(x_k) Δx .
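The histogram estimate described above can be sketched directly (smaller N and M than in the slides, for speed):

```python
import random

random.seed(3)
N, M = 100_000, 20
dx = 2.0 / M                     # subinterval width on [-1, 1]
counts = [0] * M

for _ in range(N):
    x = 2.0 * random.random() - 1.0          # uniform on [-1, 1]
    k = min(int((x + 1.0) / dx), M - 1)      # subinterval index
    counts[k] += 1

f = [m / (N * dx) for m in counts]           # f(x_k) ~ m_k / (N dx)
F = [sum(counts[: k + 1]) / N for k in range(M)]

print(f[0], F[-1])   # each f(x_k) is near 1/2, and F(1) = 1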
EXPERIMENT : ( continued )

Interval   Frequency      Sum      f(x)    F(x)
    1        50013       50013    0.500   0.067
    2        50033      100046    0.500   0.133
    3        50104      150150    0.501   0.200
    4        49894      200044    0.499   0.267
    5        50242      250286    0.502   0.334
    6        49483      299769    0.495   0.400
    7        50016      349785    0.500   0.466
    8        50241      400026    0.502   0.533
    9        50261      450287    0.503   0.600
   10        49818      500105    0.498   0.667
   11        49814      549919    0.498   0.733
   12        50224      600143    0.502   0.800
   13        49971      650114    0.500   0.867
   14        49873      699987    0.499   0.933
   15        50013      750000    0.500   1.000

( N = 750,000 values , M = 15 intervals )
EXPERIMENT : ( continued )

[Figure : the approximate density function f(x_k) = m_k/(NΔx) ,
for N = 5,000,000 random numbers, and M = 200 intervals.]
EXPERIMENT : ( continued )

[Figure : the approximate density function f(x) and distribution
function F(x) , for N = 5,000,000 random numbers, and M = 200
intervals.]   ( Why ? )
EXPERIMENT : ( continued )

Next ( still for the uniform random variable in [−1, 1] ) :

Generate n random numbers ( n relatively small ) .

Take the average of the n random numbers.

Do the above N times, where (as before) N is very large .

Thus we deal with a random variable

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ) .
EXPERIMENT : ( continued )

[Table : 120 sample values of the average X̄ ,
e.g. −0.047 , 0.198 , −0.197 , −0.164 , ⋯ ]
EXPERIMENT : ( continued )

For sample size n , (n = 1, 2, 5, 10, 25) , and M = 200 intervals :

f̄_n(x_k) ≡ m_k / (N Δx) .

Now f̄_n(x_k) approximates the density function of X̄ .
EXPERIMENT : ( continued )

Interval   Frequency      Sum       f(x)      F(x)
    1           0            0    0.00000   0.00000
    2           0            0    0.00000   0.00000
    3           0            0    0.00000   0.00000
    4          11           11    0.00011   0.00001
    5        1283         1294    0.01283   0.00173
    6       29982        31276    0.29982   0.04170
    7      181209       212485    1.81209   0.28331
    8      325314       537799    3.25314   0.71707
    9      181273       719072    1.81273   0.95876
   10       29620       748692    0.29620   0.99826
   11        1294       749986    0.01294   0.99998
   12          14       750000    0.00014   1.00000
   13           0       750000    0.00000   1.00000
   14           0       750000    0.00000   1.00000
   15           0       750000    0.00000   1.00000
EXPERIMENT : ( continued )

[Figure : the approximate density function f(x) and distribution
function F(x) of X̄ ( M = 200 intervals ).]
EXPERIMENT : ( continued )

By the Central Limit Theorem

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ)

is approximately normal , with

mean μ_X̄ = 0 ,  standard deviation σ_X̄ = 1/√(3n) .

The normalized variable x̃ = √(3n) x then has approximate density

f̃_n(x̃) = f̄_n(x) / √(3n) .

[Figure : the normalized density functions f̃_n(x) , for n = 1, 2, 5, 10, 25 .
( N = 5,000,000 values of X̄ , M = 200 intervals )]
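A scaled-down version of this experiment, confirming that the averages have mean ≈ 0 and standard deviation ≈ 1/√(3n):

```python
import random, statistics

random.seed(4)
n, N = 12, 50_000

# N averages of n uniform values on [-1, 1]
means = [sum(2.0 * random.random() - 1.0 for _ in range(n)) / n
         for _ in range(N)]

m = statistics.mean(means)
s = statistics.pstdev(means)
print(m, s)   # expect m ~ 0 and s ~ 1/sqrt(3n) = 0.1667 for n = 12
```

A histogram of `means` (as in the slides' figures) would show the familiar bell shape even for this modest n.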
EXERCISE : Suppose

X₁ , X₂ , ⋯ , X₁₂ ,   (n = 12) ,

are independent uniform random variables on [0, 1] , with

μ = 1/2 ,  σ = 1/(2√3) ,

and let

X̄ = (1/12)(X₁ + X₂ + ⋯ + X₁₂) .

Use the Central Limit Theorem to estimate

P( X̄ ≤ 1/3 ) ,

P( X̄ ≥ 2/3 ) ,

P( |X̄ − 1/2| ≤ 1/3 ) .
EXERCISE : Suppose

X₁ , X₂ , ⋯ , X₉ ,   (n = 9) ,

are independent, identically distributed, with μ = 1 and σ = 1 ,
and let

X̄ = (1/9)(X₁ + X₂ + ⋯ + X₉) .

Estimate

P( X̄ ≤ 0.4 ) ,

P( X̄ ≥ 1.6 ) ,

P( |X̄ − 1| ≤ 0.6 ) .
EXERCISE : Suppose

X₁ , X₂ , ⋯ , Xₙ ,

are identical, independent, normal random variables, with

mean μ = 7 ,  standard deviation σ = 4 .

Let

X̄ = (1/n)(X₁ + X₂ + ⋯ + Xₙ) .

How large must n be so that

P( |X̄ − μ| ≤ 1 ) ≥ 90 % ?
[Figure : the binomial probability mass functions p(x) for
n = 10 , p = 0.3 and for n = 100 , p = 0.3 .]
EXAMPLE : ( continued )

We already know that if X is binomial then

μ(X) = np   and   σ(X) = √(np(1 − p)) .

Thus, for n = 100 , p = 0.3 , we have

μ(X) = 30   and   σ(X) = √21 ≈ 4.58 .

By the Central Limit Theorem

P(X ≤ 26) ≈ Φ( (26 − 30)/4.58 ) = Φ(−0.87) ≈ 19.2 % .

The exact binomial value is

P(X ≤ 26) = Σ_{k=0}^{26} C(n,k) pᵏ (1−p)^{n−k} ≈ 22.4 % .
EXAMPLE : ( continued )

We found the exact binomial value

P(X ≤ 26) ≈ 22.4 % ,

and the CLT approximation

P(X ≤ 26) ≈ Φ( (26 − 30)/4.58 ) = Φ(−0.87) ≈ 19.2 % .

It is better to spread P(X = 26) over the interval [25.5 , 26.5] .   ( Why ? )

With this continuity correction

P(X ≤ 26) ≈ Φ( (26.5 − 30)/4.58 ) = Φ(−0.764) ≈ 22.2 % .
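The effect of the continuity correction is easy to verify numerically for the example above:

```python
import math
from math import comb

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

n, p, k = 100, 0.3, 26
mu = n * p
sigma = math.sqrt(n * p * (1.0 - p))

# exact binomial tail P(X <= 26)
exact = sum(comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k + 1))
plain = Phi((k - mu) / sigma)            # no continuity correction
corrected = Phi((k + 0.5 - mu) / sigma)  # spread P(X = 26) over [25.5, 26.5]

print(f"exact = {exact:.4f}  plain = {plain:.4f}  corrected = {corrected:.4f}")
```

The corrected value lies much closer to the exact 22.4 % than the plain approximation does.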
EXERCISE :

Consider the Binomial distribution with n = 676 and p = 1/26 .

[Figure : the binomial probability mass function p(x) ,
for n = 676 , p = 1/26 .]
EXPERIMENT :

Compare the accuracy of the Poisson and the adjusted Normal
approximations to the Binomial, for different values of n .

   k     n    Binomial   Poisson   Normal
   2     4    0.6875     0.6767    0.6915
   4     8    0.6367     0.6288    0.6382
   8    16    0.5982     0.5925    0.5987
  16    32    0.5700     0.5660    0.5702
  32    64    0.5497     0.5468    0.5497
  64   128    0.5352     0.5332    0.5352

Any conclusions ?
EXPERIMENT : ( continued )

Compare the accuracy of the Poisson and the adjusted Normal
approximations to the Binomial, for different values of n .

   k     n    Binomial   Poisson   Normal
   0     4    0.6561     0.6703    0.5662
   0     8    0.4305     0.4493    0.3618
   1    16    0.5147     0.5249    0.4668
   3    32    0.6003     0.6025    0.5702
   6    64    0.5390     0.5423    0.5166
  12   128    0.4805     0.4853    0.4648
  25   256    0.5028     0.5053    0.4917
  51   512    0.5254     0.5260    0.5176

Any conclusions ?
EXPERIMENT : ( continued )

P(X ≤ k) , where k = np , with p = 0.01 :

    n    Binomial   Poisson   Normal
    4    0.9606     0.9608    0.9896
    8    0.9227     0.9231    0.9322
   16    0.8515     0.8521    0.8035
   32    0.7250     0.7261    0.6254
   64    0.5256     0.5273    0.4302
  128    0.6334     0.6339    0.5775
  256    0.5278     0.5285    0.4850
  512    0.5948     0.5949    0.5670
 1024    0.5529     0.5530    0.5325
 2048    0.5163     0.5165    0.5018
 4096    0.4814     0.4817    0.4712

Any conclusions ?
SAMPLE STATISTICS

The most important statistics are :

The sample mean :

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ) .

The sample variance :

S² ≡ (1/n) Σ_{k=1}^n (Xₖ − X̄)²   ( to be discussed in detail ) .

The sample standard deviation :  S ≡ √(S²) .

The sample median : the middle observation (if n is odd).

The sample range : the difference between the largest and the
smallest observation.
EXAMPLE :

For the 8 observations

−0.737 , 0.511 , −0.083 , 0.066 , −0.562 , −0.906 , 0.358 , 0.359 ,

the sample mean is X̄ = −0.124 .

Sample variance :

S² = (1/8) { (−0.737 − X̄)² + (0.511 − X̄)² + (−0.083 − X̄)²
           + (0.066 − X̄)² + (−0.562 − X̄)² + (−0.906 − X̄)²
           + (0.358 − X̄)² + (0.359 − X̄)² } = 0.26 .

Sample standard deviation :

S = √0.26 = 0.51 .

EXAMPLE : ( continued )

For the 8 observations we also have

The order statistic :

−0.906 , −0.737 , −0.562 , −0.083 , 0.066 , 0.358 , 0.359 , 0.511 .

The sample median : the average of the two middle observations ,
(−0.083 + 0.066)/2 ≈ −0.0085 .
The sample mean

X̄ ≡ (1/n)(X₁ + X₂ + ⋯ + Xₙ)

has mean

E[X̄] = E[ (1/n)(X₁ + X₂ + ⋯ + Xₙ) ] = μ ,

and variance

σ²_X̄ ≡ Var(X̄) = σ²/n .

Standard deviation of X̄ :

σ_X̄ = σ/√n .

How well does the sample mean approximate the population mean μ ?
It follows that

P( |X̄ − μ|/(σ/√n) ≤ z ) = P( |X̄ − μ| ≤ zσ/√n )

 = P( μ ∈ [ X̄ − zσ/√n , X̄ + zσ/√n ] )

 = 1 − 2Φ(−z) .

EXAMPLE : Suppose σ = 3 , n = 25 , and the sample mean is X̄ = 4.5 .

Taking z = 2 , we have

P( μ ∈ [ 4.5 − 2·3/√25 , 4.5 + 2·3/√25 ] ) = P( μ ∈ [ 3.3 , 5.7 ] )

 = 1 − 2Φ(−2) ≈ 95 % .

We call [ 3.3 , 5.7 ] the 95 % confidence interval estimate of μ .
EXERCISE :

As in the preceding example, μ is unknown, σ = 3 , X̄ = 4.5 .

Use the formula

P( μ ∈ [ X̄ − zσ/√n , X̄ + zσ/√n ] ) = 1 − 2Φ(−z) ,

to determine

The 50 % confidence interval estimate of μ when n = 25 .

The 50 % confidence interval estimate of μ when n = 100 .

The 95 % confidence interval estimate of μ when n = 100 .
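A minimal sketch of the confidence interval formula, reproducing the example above:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_interval(xbar, sigma, n, z):
    # [Xbar - z sigma/sqrt(n), Xbar + z sigma/sqrt(n)], confidence 1 - 2 Phi(-z)
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half, 1.0 - 2.0 * Phi(-z)

lo, hi, conf = confidence_interval(4.5, 3.0, 25, 2.0)
print(f"[{lo:.1f}, {hi:.1f}] with confidence {100*conf:.1f} %")
```

With z = 2 this recovers the interval [3.3, 5.7] at about 95 % confidence; the exercise's other cases follow by changing z and n.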
The sample variance

S² ≡ (1/n) Σ_{k=1}^n (Xₖ − X̄)² = Σ_{k=1}^n [ (1/n)(Xₖ − X̄)² ]

resembles the population variance

σ² = Σ_k [ (Xₖ − μ)² p(Xₖ) ] ,

but S² uses X̄ in place of μ , and equal weights 1/n in place of p(Xₖ) .

Nevertheless, we will show that for large n their values are close !
FACT 1 :

X̄ = (1/n) Σ_{k=1}^n Xₖ   implies   Σ_{k=1}^n Xₖ = n X̄ .

FACT 2 : From

σ² = Var(X) = E[(X − μ)²] = E[X²] − μ² ,

we (obviously) have

E[X²] = σ² + μ² .

FACT 3 : Similarly

E[X̄²] = σ²_X̄ + μ² = σ²/n + μ² .
FACT 4 :

S² ≡ (1/n) Σ_{k=1}^n (Xₖ − X̄)² = (1/n) [ Σ_{k=1}^n Xₖ² ] − X̄² .

PROOF :

(1/n) Σ_{k=1}^n (Xₖ − X̄)²

 = (1/n) Σ_{k=1}^n ( Xₖ² − 2Xₖ X̄ + X̄² )

 = (1/n) [ Σ_{k=1}^n Xₖ² − 2X̄ Σ_{k=1}^n Xₖ + n X̄² ]

 = (1/n) [ Σ_{k=1}^n Xₖ² − 2n X̄² + n X̄² ]

 = (1/n) [ Σ_{k=1}^n Xₖ² ] − X̄² .   QED !

It follows that

E[S²] = (1/n) Σ_{k=1}^n E[Xₖ²] − E[X̄²]

      = (σ² + μ²) − (σ²/n + μ²)        ( Facts 2 and 3 )

      = σ² (1 − 1/n) .

REMARK : Thus lim_{n→∞} E[S²] = σ² .
For this reason one often uses the adjusted sample variance

Ŝ² ≡ (1/(n−1)) Σ_{k=1}^n (Xₖ − X̄)² ,

for which

E[Ŝ²] = σ² .

EXAMPLE : For the 120 observations of the uniform random variable
on [−1, 1] we found

S² = (1/120) Σ_{k=1}^{120} (Xₖ − X̄)² = 0.335 ,   S = √(S²) = 0.579 ,

while

μ = 0 ,

σ² = ∫_{−1}^{1} (x − μ)² (1/2) dx = 1/3 ,

σ = 1/√3 ≈ 0.577 .

What do you say ?
EXAMPLE :

Generate 50 uniform random numbers in [−1, 1] .

Compute their average.

Do the above 500 times.

Call the results X̄ₖ , k = 1, 2, ⋯, 500 .

Thus each X̄ₖ is the average of 50 random numbers.

Compute the sample statistics X̄ and S of these 500 values.

Can you predict the values of X̄ and S ?

EXAMPLE : ( continued )

Results :

sample mean of the X̄ₖ :  (1/500) Σ_{k=1}^{500} X̄ₖ = 0.00136 ,

S² = (1/500) Σ_{k=1}^{500} (X̄ₖ − X̄)² = 0.00664 ,

S = √(S²) = 0.08152 .

EXERCISE :

What is the value of E[X̄] ?  Compare the result to E[X̄] .

What is the value of Var(X̄) ?  Compare S² to Var(X̄) .
FACT 5 :

Σ_{k=1}^n (Xₖ − μ)² = Σ_{k=1}^n (Xₖ − X̄)² + n(X̄ − μ)² .

PROOF :

LHS = Σ_{k=1}^n { (Xₖ − X̄) + (X̄ − μ) }²

    = Σ_{k=1}^n { (Xₖ − X̄)² + 2(Xₖ − X̄)(X̄ − μ) + (X̄ − μ)² }

    = Σ_{k=1}^n (Xₖ − X̄)² + 2(X̄ − μ) Σ_{k=1}^n (Xₖ − X̄) + n(X̄ − μ)²

    = Σ_{k=1}^n (Xₖ − X̄)² + n(X̄ − μ)²        ( since Σ(Xₖ − X̄) = 0 )

    = RHS .   QED !
Rewrite Fact 5 ,

Σ_{k=1}^n (Xₖ − μ)² = Σ_{k=1}^n (Xₖ − X̄)² + n(X̄ − μ)² ,

as

Σ_{k=1}^n ((Xₖ − μ)/σ)² = Σ_{k=1}^n (Xₖ − X̄)²/σ² + ((X̄ − μ)/(σ/√n))² ,

and then as

Σ_{k=1}^n Zₖ² = (n/σ²) S² + Z̄² ,

where

Zₖ ≡ (Xₖ − μ)/σ   and   Z̄ ≡ (X̄ − μ)/(σ/√n)

are standard normal (for normal Xₖ).   ( Why ? )

Here the left hand side has the χ²_n distribution and Z̄² has the
χ²₁ distribution . In fact, one can show that

(n/σ²) S² = ((n−1)/σ²) Ŝ²   has the   χ²_{n−1} distribution !
EXAMPLE :

Suppose the lifetime of light bulbs is normal with σ = 100 hours,
and we test a sample of n = 16 bulbs. Then (n−1)Ŝ²/σ² has the
χ²₁₅ distribution , and

P( Ŝ ≥ 129 ) = P( 15 Ŝ²/100² ≥ 15 · 129²/100² )

             = P( χ²₁₅ ≥ 24.96 ) = 5 %   ( from the χ² Table ) .

[Figure : the χ²₁₅ density function f(x) .]
EXERCISE :

In the preceding example, also compute

P( χ²₁₅ ≥ 24.96 )

using the standard normal approximation .

EXERCISE :

Consider the same shipment of light bulbs :

The lifetime of the bulbs has a normal distribution .

The mean lifetime is not given.

The standard deviation is claimed to be σ = 100 hours.

Suppose we test the lifetime of only 6 bulbs .

For what value of s is P(Ŝ ≥ s) = 5 % ?
SOLUTION : We have n = 16 ,  X̄ = 0.00575 ,  and

Ŝ² = (1/(n−1)) Σ_{k=1}^n (Xₖ − X̄)² = 0.02278 .

Estimate the population standard deviation σ :

ANSWER : σ̂ = Ŝ = √0.02278 = 0.15095 .

Compute a 95 percent confidence interval for σ² :

ANSWER : From the Chi-Square Table :

P( χ²₁₅ ≤ 6.26 ) = 0.025   and   P( χ²₁₅ ≥ 27.49 ) = 0.025 ,

so with 95 % confidence

6.26 ≤ (n−1)Ŝ²/σ² ≤ 27.49 ,

i.e.,

σ² ≤ (n−1)Ŝ²/6.26 = 15 · 0.02278/6.26 = 0.05458 ,

σ² ≥ (n−1)Ŝ²/27.49 = 15 · 0.02278/27.49 = 0.01243 .
EXAMPLE :

Suppose a bag contains three balls, numbered 1, 2, and 3.

A sample of two balls is drawn at random from the bag.

Recall that ( here with n = 2 ) :

X̄ = (1/n)(X₁ + X₂ + ⋯ + Xₙ) ,

S² = (1/n) Σ_{k=1}^n (Xₖ − X̄)² .

We compute E[X̄] and E[S²] .

With replacement there are 9 equally likely samples :

(1,1), (1,2), (1,3), (2,1), (2,2), (2,3), (3,1), (3,2), (3,3) .

The sample means X̄ are

1 , 3/2 , 2 , 3/2 , 2 , 5/2 , 2 , 5/2 , 3 ,

with

E[X̄] = (1/9)(1 + 3/2 + 2 + 3/2 + 2 + 5/2 + 2 + 5/2 + 3) = 2 .

The sample variances S² are

0 , 1/4 , 1 , 1/4 , 0 , 1/4 , 1 , 1/4 , 0 ,   ( Check ! )

with

E[S²] = (1/9)(0 + 1/4 + 1 + 1/4 + 0 + 1/4 + 1 + 1/4 + 0) = 1/3 .

Without replacement there are 6 equally likely samples :

(1,2), (1,3), (2,1), (2,3), (3,1), (3,2) .

The sample means X̄ are

3/2 , 2 , 3/2 , 5/2 , 2 , 5/2 ,

with

E[X̄] = (1/6)(3/2 + 2 + 3/2 + 5/2 + 2 + 5/2) = 2 .

The sample variances S² are

1/4 , 1 , 1/4 , 1/4 , 1 , 1/4 ,   ( Check ! )

with

E[S²] = (1/6)(1/4 + 1 + 1/4 + 1/4 + 1 + 1/4) = 1/2 .
EXAMPLE : ( continued )

A bag contains three balls, numbered 1, 2, and 3.

A sample of two balls is drawn at random from the bag.

We have computed E[X̄] and E[S²] :

With replacement :      E[X̄] = 2 ,   E[S²] = 1/3 .

Without replacement :   E[X̄] = 2 ,   E[S²] = 1/2 .

The population mean and variance are

μ = 1·(1/3) + 2·(1/3) + 3·(1/3) = 2 ,

σ² = (1−2)²(1/3) + (2−2)²(1/3) + (3−2)²(1/3) = 2/3 .
EXAMPLE : ( continued )

We have computed :

Population statistics :              μ = 2 ,   σ² = 2/3 .

Sampling with replacement :      E[X̄] = 2 ,   E[S²] = 1/3 .

Sampling without replacement :   E[X̄] = 2 ,   E[S²] = 1/2 .

The formula

E[S²] = (1 − 1/n) σ²

gives ( with n = 2 )

E[S²] = (1 − 1/2)(2/3) = 1/3 ,

which is correct for sampling with replacement only.

QUESTION :

Why is E[S²] wrong for sampling without replacement ?

ANSWER : Without replacement the outcomes Xₖ of a sample

X₁ , X₂ , ⋯ , Xₙ

are not independent !  For example,

P(X₂ = 1 | X₁ = 2) = 1/2 ,  while  P(X₂ = 1 | X₁ = 1) = 0 .   ( Why not ? )

NOTE :

Let N be the population size and n the sample size .

Suppose N is very large compared to n .

For example, n = 2 , and the population is

{ 1 , 2 , 3 , ⋯ , N } .

Then we still have

P(X₂ = 1 | X₁ = 1) = 0 ,

but for k ≠ 1 we have

P(X₂ = k | X₁ = 1) = 1/(N − 1) .   ( Why ? )
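The two E[S²] values in the bag example can be verified by enumerating all equally likely samples:

```python
from itertools import product, permutations

balls = [1, 2, 3]

def stats(samples):
    # E[Xbar] and E[S^2] over all equally likely samples of size 2
    xbars, s2s = [], []
    for (a, b) in samples:
        xbar = (a + b) / 2.0
        s2 = ((a - xbar) ** 2 + (b - xbar) ** 2) / 2.0
        xbars.append(xbar)
        s2s.append(s2)
    return sum(xbars) / len(xbars), sum(s2s) / len(s2s)

with_repl = list(product(balls, repeat=2))        # 9 samples
without_repl = list(permutations(balls, 2))       # 6 samples

print(stats(with_repl))      # (2.0, 1/3)
print(stats(without_repl))   # (2.0, 1/2)
```

The enumeration confirms E[S²] = 1/3 with replacement (matching (1 − 1/n)σ²) and E[S²] = 1/2 without replacement.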
The sample correlation coefficient R_{X,Y} is defined as

R_{X,Y} ≡ Σ_{i=1}^N (Xᵢ − X̄)(Yᵢ − Ȳ)
          / [ √( Σ_{i=1}^N (Xᵢ − X̄)² ) √( Σ_{i=1}^N (Yᵢ − Ȳ)² ) ] ,

and one can show that

−1 ≤ R_{X,Y} ≤ 1 .

R_{X,Y} can also be used to test for linearity of the data.

In fact,

If |R_{X,Y}| = 1 then X and Y are related linearly .

Specifically,

If R_{X,Y} = 1 then Yᵢ = cXᵢ + d , for constants c, d , with c > 0 .

If R_{X,Y} = −1 then Yᵢ = cXᵢ + d , for constants c, d , with c < 0 .

Also,

If |R_{X,Y}| ≈ 1 then X and Y are almost linear .
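A minimal sketch of the definition, checked on small illustrative data sets that are exactly linear:

```python
import math

def sample_correlation(xs, ys):
    # R_{X,Y} = sum (x - xbar)(y - ybar) / sqrt(sum (x - xbar)^2 * sum (y - ybar)^2)
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - xbar) ** 2 for x in xs)
                    * sum((y - ybar) ** 2 for y in ys))
    return num / den

# exactly linear data with positive slope (y = 2x + 1) gives R = 1
print(sample_correlation([1, 2, 3, 4], [3, 5, 7, 9]))
# reversing the slope gives R = -1
print(sample_correlation([1, 2, 3, 4], [9, 7, 5, 3]))
```

Applied to the temperature data below, this function would reproduce the R ≈ 0.98 quoted later in the notes.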
EXAMPLE :

Consider the average daily high temperature in Montreal in March.

The Table shows these averages, taken over a number of years :

Day  Temp    Day  Temp    Day  Temp    Day  Temp    Day  Temp
 1  -1.52     8  -0.52    15   2.08    22   3.39    29   6.95
 2  -1.55     9  -0.67    16   1.22    23   3.69    30   6.83
 3  -1.72    10   0.01    17   1.73    24   4.45    31   6.93
 4  -0.94    11   0.96    18   1.93    25   4.74
 5  -0.51    12   0.49    19   3.10    26   5.01
 6  -0.29    13   1.26    20   3.05    27   4.66
 7   0.02    14   1.99    21   3.32    28   6.45
EXERCISE :

The Table below shows class attendance and course grade/100.

The attendance was sampled in 18 sessions.

Attend Grade  Attend Grade  Attend Grade  Attend Grade
  11     47     13     43     15     70     17     72
  16     85     13     82     16     67     17     91
   8     62     13     71     12     56     15     81
  14     82     17     66     16     91     17     67
  11     43     17     66     18     57     18     74
  14     69     15     85     17     79     18     84
  16     61      4     46     18     70      0     29
   9     84     15     91     15     77     16     75

Attend Grade  Attend Grade  Attend Grade  Attend Grade
  18     96     14     61      5     25     17     74
  16     71     16     50     14     77     12     68
  16     69     18     93     18     77     17     48
   7     43     15     86     18     85     17     84
  13     73     15     74     18     73     17     71
  17     70     15     55     14     75     15     61
  17     82     18     82     16     82     14     68
Maximum Likelihood Estimators

EXAMPLE : Suppose a random variable has the normal density function
with zero mean,

f(x ; σ) = (1/(σ√(2π))) e^{−x²/(2σ²)} ,

with unknown standard deviation σ .

EXAMPLE : ( continued )

We know we can estimate σ² by the sample variance

S² = (1/n) Σ_{k=1}^n (Xₖ − X̄)² ,

for which

E[S²] = (1 − 1/n) σ² .

Alternatively, let X₁ , X₂ , ⋯ , Xₙ be independent,
each having density function f(x ; σ) .

Their joint density function is then

f(x₁, ⋯, xₙ ; σ) = (1/(σ√(2π)))ⁿ e^{−(1/(2σ²)) Σ_{k=1}^n xₖ²} .   ( Why ? )

The maximum likelihood estimate σ̂ maximizes this joint density,
or, equivalently, its logarithm.   ( Why equivalent ? )
EXAMPLE : ( continued )

We had

d/dσ [ −(1/(2σ²)) Σ_{k=1}^n xₖ² − n log σ ] = 0 ,

i.e.,

(1/σ³) Σ_{k=1}^n xₖ² − n/σ = 0 ,

from which

σ̂² = (1/n) Σ_{k=1}^n Xₖ² .   ( Surprise ? )
EXERCISE :

Suppose a random variable has the general normal density function

f(x ; μ, σ) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} ,

with unknown mean μ and unknown standard deviation σ .

Derive maximum likelihood estimators for both μ and σ as follows :

For the joint density function

f(x₁, x₂, ⋯, xₙ ; μ, σ) = f(x₁ ; μ, σ) f(x₂ ; μ, σ) ⋯ f(xₙ ; μ, σ) ,

set the partial derivatives with respect to μ and σ equal to zero,
and solve.

EXERCISE : ( continued )

The maximum likelihood estimators turn out to be

μ̂ = X̄   and   σ̂ = S ,

that is,

μ̂ = (1/n) Σ_{k=1}^n Xₖ ,   σ̂² = (1/n) Σ_{k=1}^n (Xₖ − X̄)² .

NOTE :

S² = (1/n) Σ_{k=1}^n (Xₖ − X̄)²   has   E[S²] = (1 − 1/n) σ² ≠ σ² .
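A numerical sketch of these estimators: draw a large normal sample with known μ and σ, and check that μ̂ = X̄ and σ̂² = S² recover them.

```python
import random, math

random.seed(5)
mu_true, sigma_true = 7.0, 4.0      # illustrative "unknown" parameters
n = 100_000
xs = [random.gauss(mu_true, sigma_true) for _ in range(n)]

mu_hat = sum(xs) / n                                 # = Xbar
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # = S^2 (the MLE, biased)

print(mu_hat, math.sqrt(sigma2_hat))
```

For large n the bias factor (1 − 1/n) is negligible, so the estimates are close to the true 7 and 4.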
EXERCISE :

A random variable has the standard exponential distribution
with density function

f(x ; λ) = λ e^{−λx} , x > 0 , (0 for x ≤ 0) .

Derive the maximum likelihood estimator λ̂ of λ .

[Figure : the density function f(x) and distribution function F(x)
of the special exponential distribution of the next example.]
EXAMPLE :

For the maximum likelihood estimator of λ , with

f(x ; λ) = λ² x e^{−λx} ,  for x > 0 ,

the joint density of n independent observations is

f(x₁, ⋯, xₙ ; λ) = λ^{2n} (x₁ ⋯ xₙ) e^{−λ(x₁ + ⋯ + xₙ)} .

EXAMPLE : ( continued )

We had to maximize the logarithm

2n log λ + Σ_{k=1}^n log xₖ − λ Σ_{k=1}^n xₖ .

Differentiating gives

2n/λ − Σ_{k=1}^n xₖ = 0 ,

from which

λ̂ = 2n / Σ_{k=1}^n Xₖ .   ( Why ? )
EXERCISE :

For the special exponential density function in the preceding example,

f(x ; λ) = λ² x e^{−λx} , x > 0 , (0 for x ≤ 0) :

Verify that ∫_0^∞ f(x ; λ) dx = 1 .

Also compute E[X] = ∫_0^∞ x f(x ; λ) dx .
NOTE :

Maximum likelihood estimates also work in the discrete case .

In such case we maximize the probability mass function .

EXAMPLE :

Find the maximum likelihood estimator of p in the Bernoulli trial

P(X = 1) = p ,   P(X = 0) = 1 − p .

The probability mass function can be written as

P(x ; p) = pˣ (1 − p)^{1−x} ,   ( x = 0, 1 )   (!)

so the joint probability mass function of n independent trials is

P(x₁, x₂, ⋯, xₙ ; p) = p^{Σₖ xₖ} (1 − p)^{n − Σₖ xₖ} .
EXAMPLE : ( continued )

We found

P(x₁, x₂, ⋯, xₙ ; p) = p^{Σₖ xₖ} (1 − p)^{n − Σₖ xₖ} ,

with logarithm

( Σ_{k=1}^n xₖ ) log p + ( n − Σ_{k=1}^n xₖ ) log(1 − p) .

Setting the derivative with respect to p equal to zero :

(1/p) Σ_{k=1}^n xₖ − (1/(1−p)) ( n − Σ_{k=1}^n xₖ ) = 0 .

Multiplying by p(1 − p) gives

(1 − p) Σ_{k=1}^n xₖ − p ( n − Σ_{k=1}^n xₖ ) = 0 ,

i.e.,

Σ_{k=1}^n xₖ = n p ,

so that

p̂ = (1/n) Σ_{k=1}^n Xₖ .
EXERCISE :

Consider the Binomial probability mass function

P(x ; p) ≡ P(X = x) = C(N,x) pˣ (1 − p)^{N−x} ,

where x is an integer, (0 ≤ x ≤ N) .

What is the joint probability mass function P(x₁, x₂, ⋯, xₙ ; p) ?

( Be sure to distinguish between N and n ! )

Determine the maximum likelihood estimator p̂ of p .

( Can you guess what p̂ will be ? )
Hypothesis Testing

EXAMPLE :

We consider ordering a large shipment of 50 watt light bulbs.

The manufacturer claims that :

The lifetime of the bulbs has a normal distribution .

The mean lifetime is μ = 1000 hours.

The standard deviation is σ = 100 hours.

We want to test the hypothesis that μ = 1000 .

We assume that :

The lifetime of the bulbs has indeed a normal distribution .

The standard deviation is indeed σ = 100 hours.

We test the lifetime of a sample of 25 bulbs .

[Figure : the density function of X , also indicating σ_X ,
(μ_X = 1000 , σ_X = 100) , and the density function of X̄ (n = 25) ,
also indicating σ_X̄ , (μ_X̄ = 1000 , σ_X̄ = 20) .]
EXAMPLE : ( continued )

We test a sample of 25 light bulbs :

We find the sample average lifetime is X̄ = 960 hours.

Do we accept the hypothesis that μ = 1000 hours ?

Using the standard normal Table we have the one-sided probability

P(X̄ ≤ 960) = Φ( (960 − 1000)/(100/√25) ) = Φ(−2.0) = 2.28 % .

EXAMPLE : ( continued )

Suppose instead the sample average lifetime is X̄ = 1040 hours.

Do we accept that μ = 1000 hours ?

Using the standard normal Table we have the one-sided probability

P(X̄ ≥ 1040) = 1 − Φ( (1040 − 1000)/(100/√25) ) = 1 − Φ(2) = Φ(−2) = 2.28 % .

Would you accept the hypothesis that the mean is 1000 hours ?

Would you accept the shipment ?
EXAMPLE : ( continued )

Suppose that we accept the hypothesis that μ = 1000 if

960 ≤ X̄ ≤ 1040 .

Then we accept the hypothesis with probability

P( |X̄ − 1000| ≤ 40 ) = P( |X̄ − 1000|/(100/√25) ≤ 2 ) = 1 − 2Φ(−2) ≈ 95 % ,

and we reject the hypothesis with probability

P( |X̄ − 1000| ≥ 40 ) = 100 % − 95 % = 5 % .
EXAMPLE : ( continued )

What is the probability of acceptance of the hypothesis if μ is
different from 1000 ?

If μ = 980 :

P(960 ≤ X̄ ≤ 1040) = Φ( (1040 − 980)/(100/√25) ) − Φ( (960 − 980)/(100/√25) )

 = Φ(3) − Φ(−1) = 1 − Φ(−3) − Φ(−1)

 ≈ (1 − 0.0013) − 0.1587 ≈ 84 % .

If μ = 1040 :

P(960 ≤ X̄ ≤ 1040) = Φ( (1040 − 1040)/(100/√25) ) − Φ( (960 − 1040)/(100/√25) )

 = Φ(0) − Φ(−4) ≈ 50 % .

[Figure : density functions of X̄ , (n = 25 , σ_X̄ = 20) :

μ = μ_X̄ = 980 :   P(accept) ≈ 84 %
μ = μ_X̄ = 1000 :  P(accept) ≈ 95 %
μ = μ_X̄ = 1040 :  P(accept) ≈ 50 % ]

QUESTION 1 : How does P(accept) change when we slide the
density function of X̄ along the X-axis , i.e., when μ changes ?

QUESTION 2 : What is the effect of increasing the sample size n ?
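The acceptance probabilities above follow from one formula, which also answers QUESTION 1 numerically; a minimal sketch:

```python
import math

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_accept(mu, sigma=100.0, n=25, lo=960.0, hi=1040.0):
    # P( lo <= Xbar <= hi ) when Xbar is normal with mean mu, sd sigma/sqrt(n)
    se = sigma / math.sqrt(n)
    return Phi((hi - mu) / se) - Phi((lo - mu) / se)

for mu in (980.0, 1000.0, 1040.0):
    print(mu, round(100.0 * p_accept(mu), 1), "%")
```

Increasing n shrinks σ/√n, making the acceptance probability drop more sharply as μ moves away from 1000 (QUESTION 2).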
EXAMPLE :

Now suppose there are two lots of light bulbs , with

(μ₁ , σ₁) = (1000, 100)   and   (μ₂ , σ₂) = (1100, 100) .

RECALL :

Type 1 error :  P( X̄ ≥ x̂ | μ = μ₁ ≡ 1000 ) .

Type 2 error :  P( X̄ ≤ x̂ | μ = μ₂ ≡ 1100 ) .

NOTE :

At x̂ the value of

max { P(Type 1 Error) , P(Type 2 Error) }

is minimized .
[Figure : the density functions of X and of X̄ (n = 25) , for
(μ₁ , σ₁) = (1000 , 100) (blue) and (μ₂ , σ₂) = (1100 , 200) (red).]
To equate P( Type 1 Error ) and P( Type 2 Error ) , set

P( X̄ ≥ x̂ | μ = μ₁ ) = P( X̄ ≤ x̂ | μ = μ₂ ) ,

i.e.,

Φ( (μ₁ − x̂)/(σ₁/√n) ) = Φ( (x̂ − μ₂)/(σ₂/√n) ) ,

so that ( by monotonicity of Φ )

(μ₁ − x̂)/(σ₁/√n) = (x̂ − μ₂)/(σ₂/√n) ,

from which ( with μ₁ < μ₂ )

x̂ = (μ₁σ₂ + μ₂σ₁) / (σ₁ + σ₂) .   ( Check ! )

With μ₁ = 1000 , σ₁ = 100 , μ₂ = 1100 , σ₂ = 200 , we have

x̂ ≈ 1033 .
EXERCISE :

Determine the optimal decision criterion x̂ that minimizes

max { P(Type 1 Error) , P(Type 2 Error) } ,

when n = 25 and

(μ₁ , σ₁) = (1000, 200) ,   (μ₂ , σ₂) = (1100, 300) .
EXAMPLE :

Test the hypothesis μ = 5.00 , given σ = 0.2 and a sample of n = 9
observations with sample mean X̄ = 4.88 , at level of significance 10 % .

Here

|X̄ − μ| = |4.88 − 5.00| = 0.12 .

Since

Z ≡ (X̄ − μ)/(σ/√n)   is standard normal ,

P( |X̄ − μ| ≥ 0.12 ) = P( |Z| ≥ 0.12/(0.2/√9) ) = P( |Z| ≥ 1.8 )

 = 2Φ(−1.8) = 7.18 % .

Thus we reject the hypothesis that μ = 5.00 at significance level 10 % .

NOTE : We would accept H₀ if the level of significance were 5 % .

( We are more tolerant when the level of significance is smaller. )
EXAMPLE :

To test the hypothesis μ = 4.8 , a sample of n = 64 observations gave

sample mean X̄ = 4.847 ,  sample standard deviation S = 0.234 .

Since σ is not known, we use the approximation σ ≈ S . We reject
the hypothesis if and only if the observed value of

Z ≈ (X̄ − μ)/(S/√n) = (4.847 − 4.8)/(0.234/8) = 1.6

is too unlikely, i.e., if the one-sided probability

P(X̄ ≥ 4.847)   is less than the level of significance .

NOTE :

If n ≤ 30 then the approximation σ ≈ S is not so accurate.

In this case better use the Student t-distribution T_{n−1} .
The Student t-Table : values t for which P(T_n ≤ t) = α .

  n    α = 0.1   α = 0.05   α = 0.01   α = 0.005
  5    -1.476    -2.015     -3.365     -4.032
  6    -1.440    -1.943     -3.143     -3.707
  7    -1.415    -1.895     -2.998     -3.499
  8    -1.397    -1.860     -2.896     -3.355
  9    -1.383    -1.833     -2.821     -3.250
 10    -1.372    -1.812     -2.764     -3.169
 11    -1.363    -1.796     -2.718     -3.106
 12    -1.356    -1.782     -2.681     -3.055
 13    -1.350    -1.771     -2.650     -3.012
 14    -1.345    -1.761     -2.624     -2.977
 15    -1.341    -1.753     -2.602     -2.947

EXAMPLE : ( continued )

With n = 16 , X̄ = 4.88 , S = 0.234 , the statistic

T = (X̄ − μ)/(S/√n) = (4.88 − 4.8)/(0.234/4) = 1.37

has the T₁₅ distribution. From the t-Table (by symmetry)

P( T₁₅ ≥ 1.341 ) = 10 % ,   P( T₁₅ ≥ 1.753 ) = 5 % ,

so the observed T = 1.37 is significant at level 10 % ,
but not at level 5 % .
RECALL : (n−1)Ŝ²/σ² has the χ²_{n−1} distribution .

EXAMPLE : ( continued )

With n = 16 , we reject the hypothesis about σ if and only if

S ≥ 2.58 σ / 2 = 1.29 σ ,

i.e., if and only if

(n−1)S²/σ² ≥ 15 · (1.29)² = 24.96 ,

and from the χ² Table

P( χ²₁₅ ≥ 25.0 ) = 5.0 % .

EXERCISE :

( Probably Yes ! )
Recall the average daily high temperature in Montreal in March :

Day  Temp    Day  Temp    Day  Temp    Day  Temp    Day  Temp
 1  -1.52     8  -0.52    15   2.08    22   3.39    29   6.95
 2  -1.55     9  -0.67    16   1.22    23   3.69    30   6.83
 3  -1.72    10   0.01    17   1.73    24   4.45    31   6.93
 4  -0.94    11   0.96    18   1.93    25   4.74
 5  -0.51    12   0.49    19   3.10    26   5.01
 6  -0.29    13   1.26    20   3.05    27   4.66
 7   0.02    14   1.99    21   3.32    28   6.45

Suppose that :

We believe that these temperatures basically increase linearly .

In fact we found the sample correlation coefficient R_{x,y} = 0.98 .

Thus we believe in a relation

T_k ≈ c₁ + c₂ k ,   k = 1, 2, ⋯, 31 .
To determine c₁ and c₂ , minimize the least squares error

Σ_{k=1}^N ( Tₖ − (c₁ + c₂ xₖ) )² .

Setting the partial derivatives with respect to c₁ and c₂ to zero :

Σ_{k=1}^N ( Tₖ − (c₁ + c₂ xₖ) ) = 0 ,

Σ_{k=1}^N xₖ ( Tₖ − (c₁ + c₂ xₖ) ) = 0 .

Solving these two equations gives

c₂ = ( Σ_{k=1}^N xₖTₖ − N x̄ T̄ ) / ( Σ_{k=1}^N xₖ² − N x̄² ) ,

c₁ = T̄ − c₂ x̄ ,

where

x̄ = (1/N) Σ_{k=1}^N xₖ ,   T̄ = (1/N) Σ_{k=1}^N Tₖ .

For the temperature data this gives

c₂ = 0.289 .
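The closed-form formulas for c₁ and c₂ are a few lines of code; a minimal sketch, checked on exactly linear data:

```python
def linear_fit(xs, ys):
    # c2 = (sum x_k y_k - N xbar ybar) / (sum x_k^2 - N xbar^2),  c1 = ybar - c2 xbar
    N = len(xs)
    xbar = sum(xs) / N
    ybar = sum(ys) / N
    c2 = (sum(x * y for x, y in zip(xs, ys)) - N * xbar * ybar) \
         / (sum(x * x for x in xs) - N * xbar * xbar)
    c1 = ybar - c2 * xbar
    return c1, c2

# check on exactly linear data y = 2 + 3x
c1, c2 = linear_fit([0, 1, 2, 3], [2, 5, 8, 11])
print(c1, c2)
```

Running `linear_fit` on the 31 temperature values above would reproduce the slope c₂ = 0.289.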
More generally, we can fit a function of the form

p(x) = Σ_{k=1}^n cₖ φₖ(x)

to data points (xᵢ , yᵢ) , i = 1, ⋯, N , by minimizing

Σ_{i=1}^N ( p(xᵢ) − yᵢ )² .

EXAMPLES :

p(x) = c₁ + c₂ x .   (Already done !)

p(x) = c₁ + c₂ x + c₃ x² .   (Quadratic approximation)
Then the least squares error is

E_L = Σ_{i=1}^N [ p(xᵢ) − yᵢ ]² = ‖ ( p(x₁), ⋯, p(x_N) )ᵀ − ( y₁, ⋯, y_N )ᵀ ‖²

    = ‖ Ac − y ‖² ,

( T denotes transpose ) , where

A = [ φ₁(x₁)  ⋯  φₙ(x₁) ]
    [   ⋮            ⋮   ]
    [ φ₁(x_N) ⋯  φₙ(x_N) ] ,

c = (c₁, ⋯, cₙ)ᵀ ,   y = (y₁, ⋯, y_N)ᵀ .
THEOREM :

For the least squares error E_L to be minimized we must have

AᵀA c = Aᵀ y .

PROOF :

E_L = ‖Ac − y‖²

    = (Ac − y)ᵀ(Ac − y)

    = (Ac)ᵀAc − (Ac)ᵀy − yᵀAc + yᵀy

    = cᵀAᵀAc − cᵀAᵀy − yᵀAc + yᵀy .

PROOF : ( continued )

We had

E_L = cᵀAᵀAc − cᵀAᵀy − yᵀAc + yᵀy .

Setting

∂E_L/∂cᵢ = 0 ,   i = 1, ⋯, n ,

gives

cᵀAᵀA + (AᵀAc)ᵀ − (Aᵀy)ᵀ − yᵀA = 0 ,   ( Check ! )

i.e.,

2cᵀAᵀA − 2yᵀA = 0 ,

or

cᵀAᵀA = yᵀA .

Transposing gives

AᵀAc = Aᵀy .   QED !
EXAMPLE : Fit a linear function p(x) = c₁ + c₂x to the data points

(0, 1) , (1, 3) , (2, 2) , (4, 3) .

SOLUTION : Here N = 4 , n = 2 , φ₁(x) = 1 , φ₂(x) = x .

Use the Theorem :

AᵀA = [ 1 1 1 1 ] [ 1 0 ]   [  4   7 ]
      [ 0 1 2 4 ] [ 1 1 ] = [  7  21 ] ,
                  [ 1 2 ]
                  [ 1 4 ]

Aᵀy = [ 1 1 1 1 ] (1, 3, 2, 3)ᵀ = (9, 19)ᵀ ,
      [ 0 1 2 4 ]

so solve

[ 4   7 ] [ c₁ ]   [  9 ]
[ 7  21 ] [ c₂ ] = [ 19 ] .

For the quadratic approximation , n = 3 , φ₃(x) = x² , and
AᵀAc = Aᵀy becomes

[  4   7  21 ] [ c₁ ]   [  9 ]
[  7  21  73 ] [ c₂ ] = [ 19 ] .
[ 21  73 273 ] [ c₃ ]   [ 59 ]
[Figure : the linear fit p(x) = c₁ + c₂x and the quadratic fit
p(x) = c₁ + c₂x + c₃x² of the data points.]
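For the 2×2 case the normal equations can be solved by hand (or by Cramer's rule); a minimal sketch for the linear fit of the example above:

```python
def solve2(a, b, c, d, e, f):
    # solve [[a, b], [c, d]] (c1, c2)^T = (e, f)^T by Cramer's rule
    det = a * d - b * c
    return (e * d - b * f) / det, (a * f - e * c) / det

# data points (0,1), (1,3), (2,2), (4,3); phi_1 = 1, phi_2 = x
xs, ys = [0.0, 1.0, 2.0, 4.0], [1.0, 3.0, 2.0, 3.0]
ata = (len(xs), sum(xs), sum(xs), sum(x * x for x in xs))   # [[4,7],[7,21]]
aty = (sum(ys), sum(x * y for x, y in zip(xs, ys)))         # (9, 19)

c1, c2 = solve2(ata[0], ata[1], ata[2], ata[3], aty[0], aty[1])
print(c1, c2)   # the least squares line p(x) = c1 + c2 x
```

This yields p(x) = 1.6 + (13/35)x, the line shown in the figure.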
EXAMPLE : The average monthly high temperatures in Montreal :

Month   1   2   3   4   5   6   7   8   9  10  11  12
Temp   -5  -3   3  11  19  24  26  25  20  13   6  -2

Source : http://weather.uk.msn.com

[Figure : the monthly temperatures versus month.]
EXAMPLE : ( continued )

The graph suggests using a 3-term least squares approximation
of the form

p(x) ≈ c₁ + c₂ sin(πx/6) + c₃ cos(πx/6) .

QUESTIONS :

Solving the normal equations gives

c₁ ≈ 11.4 ,   c₂ = −8.66 ,   c₃ = −12.8 .

[Figure : the monthly temperatures and the least squares fit.]
EXAMPLE :

Consider the following experimental data :

[Figure : the data points (xᵢ , yᵢ).]

EXAMPLE : ( continued )

Suppose we are given that the functional relationship has the form

y ≈ c₁ x^{c₂} e^{−c₃ x} .

NOTE : This is not of the form p(x) = Σₖ cₖ φₖ(x) ,
because c₂ and c₃ appear nonlinearly.

What to do ?
EXAMPLE : ( continued )

Fortunately, in this example we can take the logarithm :

log y = log( c₁ x^{c₂} e^{−c₃x} ) = log c₁ + c₂ log x − c₃ x .

This gives a linear relationship

log y ≈ c̃₁ φ₁(x) + c₂ φ₂(x) − c₃ φ₃(x) ,

where

φ₁(x) = 1 ,   φ₂(x) = log x ,   φ₃(x) = x ,

and c̃₁ = log c₁ .

EXAMPLE : ( continued )

[Figure : the transformed data points (xᵢ , log yᵢ).]

EXAMPLE : ( continued )

Solving the linear least squares problem for (x , log y) gives

c₁ = e^{c̃₁} ≈ 0.995 ,   c₂ = 2.04 ,   c₃ = 1.01 .

EXAMPLE : ( continued )

[Figure : the least squares fit of the transformed data (x , log y) ,
and the resulting fit y = c₁ x^{c₂} e^{−c₃x} of the original data.]
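The log-transform trick can be sketched end to end. The data below are hypothetical, generated exactly from y = x²e^{−x} (so c₁ = 1, c₂ = 2, c₃ = 1); the normal equations are solved by a small Gaussian elimination.

```python
import math

def lstsq(A, y):
    # solve the normal equations (A^T A) c = A^T y by Gaussian elimination
    n = len(A[0])
    M = [[sum(A[i][r] * A[i][c] for i in range(len(A))) for c in range(n)]
         for r in range(n)]
    b = [sum(A[i][r] * y[i] for i in range(len(A))) for r in range(n)]
    for r in range(n):                       # forward elimination
        for rr in range(r + 1, n):
            f = M[rr][r] / M[r][r]
            M[rr] = [m - f * v for m, v in zip(M[rr], M[r])]
            b[rr] -= f * b[r]
    c = [0.0] * n
    for r in range(n - 1, -1, -1):           # back substitution
        c[r] = (b[r] - sum(M[r][j] * c[j] for j in range(r + 1, n))) / M[r][r]
    return c

# hypothetical data from y = x^2 e^{-x}
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [x ** 2 * math.exp(-x) for x in xs]

# fit log y = log(c1) + c2 log(x) - c3 x
A = [[1.0, math.log(x), -x] for x in xs]
logc1, c2, c3 = lstsq(A, [math.log(y) for y in ys])
print(math.exp(logc1), c2, c3)   # recovers (1, 2, 1) up to rounding
```

Since the transformed problem is linear in (log c₁, c₂, c₃), the fit is exact here; with noisy data the recovered parameters are least squares estimates in the log scale, as in the slides' example.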
Consider first the iteration

xₖ₊₁ = α xₖ ,   k = 1, 2, ⋯ ,

so that xₖ = αᵏ x₀ .

Thus

If 0 < α < 1 then xₖ → 0 as k → ∞   (extinction) .

If α > 1 then xₖ → ∞ as k → ∞ .   (Prove this !)

The logistic map is

xₖ₊₁ = α xₖ (1 − xₖ) ,   k = 1, 2, ⋯ ,

where

α is given, (0 ≤ α ≤ 4) ,

x₀ is given, (0 ≤ x₀ ≤ 1) .

EXERCISE :

For the logistic map

xₖ₊₁ = α xₖ (1 − xₖ) ,   k = 1, 2, ⋯ ,

compute xₖ for k = 1, 2, ⋯ 50 .
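The iteration in this exercise is a one-liner per step; a minimal sketch, with illustrative parameter values:

```python
def logistic_orbit(alpha, x0, k_max=50):
    # iterate x_{k+1} = alpha * x_k * (1 - x_k)
    xs = [x0]
    for _ in range(k_max):
        xs.append(alpha * xs[-1] * (1.0 - xs[-1]))
    return xs

# for 0 <= alpha <= 1 the orbit decays to 0 (extinction)
print(logistic_orbit(0.8, 0.77)[-1])
# for alpha = 3.2 the orbit settles into a period-2 cycle
print(logistic_orbit(3.2, 0.77)[-2:])
```

Varying α between 0 and 4 produces the fixed points, cycles, and chaotic behavior illustrated in the figures that follow.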
[Figures : graphical interpretation of the logistic map iterates ,
x₀ = 0.77 , 50 iterations , and the approximate density p(x) of
the iterates , 200 subintervals .]
EXERCISE :

xk+1 = α xk ( 1 - xk ) ,   k = 1, 2, ⋯ .
[Figures: iterates , cobweb diagrams , and normalized histograms p(x) for further values of α ( 200 subintervals ) .]
CONCLUSIONS :
Nature is complex !
Suppose we want random numbers xk , k = 1, 2, 3, ⋯ , from , for example ,

a uniform distribution ,

a normal distribution .

QUESTION : How can a computer generate such random numbers xk , k = 1, 2, 3, ⋯ ?

THEOREM :

Let p be prime and suppose p ∤ n ( p does not divide n ) . Then the function from { 0, 1, 2, ⋯ , p-1 } to { 0, 1, 2, ⋯ , p-1 } , given by

f (x) = ( n x ) mod p ,

is one-to-one ( and hence onto , a bijection , and invertible ) .
p prime , p ∤ n   ⟹   ( n x ) mod p is 1-1

EXAMPLE :   p = 7 and n = 12 .

x          :  0   1   2   3   4   5   6
12x        :  0  12  24  36  48  60  72
12x mod 7  :  0   5   3   1   6   4   2

Invertible !

NOTE : The values of 12x mod 7 look somewhat random !
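The table can be checked mechanically (a quick sketch):

```python
# Check that x -> (12 x) mod 7 permutes {0, ..., 6}, as the theorem asserts.
p, n = 7, 12
image = [(n * x) % p for x in range(p)]
print(image)                              # [0, 5, 3, 1, 6, 4, 2], as in the table
print(sorted(image) == list(range(p)))    # True: one-to-one, hence invertible
```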
p prime , p ∤ n   ⟹   ( n x ) mod p is 1-1

EXAMPLE :   p = 6 and n = 2 .

x         :  0  1  2  3  4   5
2x        :  0  2  4  6  8  10
2x mod 6  :  0  2  4  0  2   4

Not Invertible .
p prime , p ∤ n   ⟹   ( n x ) mod p is 1-1

EXAMPLE :   p = 6 and n = 13 .

x          :  0   1   2   3   4   5
13x        :  0  13  26  39  52  65
13x mod 6  :  0   1   2   3   4   5

Invertible . ( So ? )

NOTE : The numbers in the right hand column don't look random !
p prime , p ∤ n   ⟹   ( n x ) mod p is 1-1

PROOF : Suppose x1 ≠ x2 , but

( n x1 ) mod p = k   and   ( n x2 ) mod p = k .

Then n ( x1 - x2 ) is divisible by p . ( Why ? )

Since p is prime and p ∤ n , p must divide x1 - x2 , while | x1 - x2 | < p .

Contradiction !

One-to-one maps of this kind can be used to generate sequences of random-looking numbers :

xk = ( n xk-1 ) mod p ,   k = 1, 2, ⋯ , p-1 .
For example, with p = 7 and n = 12 :

x          :  0   1   2   3   4   5   6
12x        :  0  12  24  36  48  60  72
12x mod 7  :  0   5   3   1   6   4   2

Starting from x0 = 1 , this map generates the sequence 1 , 5 , 4 , 6 , 2 , 3 , 1 , ⋯ .
f (x)        sequence   cycle period

5x mod 7     1546231    6
6x mod 7     1616161    2
8x mod 7     1111111    1
9x mod 7     1241241    3
10x mod 7    1326451    6
11x mod 7    1421421    3
12x mod 7    1546231    6
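The sequence table can be reproduced by iterating each map from x0 = 1 (a sketch; seven terms per map, as in the table):

```python
# Print seven terms of x_{k+1} = (n * x_k) mod 7, starting from x_0 = 1,
# together with the cycle period (first return to the starting value).
p = 7
for n in (5, 6, 8, 9, 10, 11, 12):
    seq, x = [], 1
    for _ in range(7):
        seq.append(x)
        x = (n * x) % p
    period = seq[1:].index(seq[0]) + 1
    print(f"{n}x mod {p}: {''.join(map(str, seq))}  period {period}")
```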
k = 1, 2, ⋯ , 100 . Result :

 71  96  16  70  79  30   5  85  31  22
 46  75  63  61  27  55  26  38  40  74
 17  87  65  95 100  84  14  36   6   1
 48   8  35  90  15  53  93  66  11  86
 88  82  81  64  78  13  19  20  37  23
 94  83  98  50  42   7  18   3  51  59
  4  68  45  58  77  97  33  56  43  24
 41  91  32  39  57  60  10  69  62  44
 92  49  25  21  54   9  52  76  80  47
 34  73  29  89  99  67  28  72  12   2
The scaled values xk / 100 , k = 1, 2, ⋯ , 100 :

0.710 0.960 0.160 0.700 0.790 0.300 0.050 0.850 0.310 0.220
0.460 0.750 0.630 0.610 0.270 0.550 0.260 0.380 0.400 0.740
0.170 0.870 0.650 0.950 1.000 0.840 0.140 0.360 0.060 0.010
0.480 0.080 0.350 0.900 0.150 0.530 0.930 0.660 0.110 0.860
0.880 0.820 0.810 0.640 0.780 0.130 0.190 0.200 0.370 0.230
0.940 0.830 0.980 0.500 0.420 0.070 0.180 0.030 0.510 0.590
0.040 0.680 0.450 0.580 0.770 0.970 0.330 0.560 0.430 0.240
0.410 0.910 0.320 0.390 0.570 0.600 0.100 0.690 0.620 0.440
0.920 0.490 0.250 0.210 0.540 0.090 0.520 0.760 0.800 0.470
0.340 0.730 0.290 0.890 0.990 0.670 0.280 0.720 0.120 0.020
k = 1, 2, ⋯ , 100 . Result :

 75  61  55  38  74  75  61  55  38  74
 35  15  93  11  48  35  15  93  11  48
 50   7   3  59  83  50   7   3  59  83
 57  10  62  41  32  57  10  62  41  32
 67  72   2  73  89  67  72   2  73  89
 38  74  75  61  55  38  74  75  61  55
 11  48  35  15  93  11  48  35  15  93
 59  83  50   7   3  59  83  50   7   3
 41  32  57  10  62  41  32  57  10  62
 73  89  67  72   2  73  89  67  72   2

( Only 25 distinct values occur : the sequence cycles. )
k = 1, 2, ⋯ , 100 . Result :

 49  21   9  76  47  49  21   9  76  47
 70  30  85  22  96  70  30  85  22  96
100  14   6  17  65 100  14   6  17  65
 13  20  23  82  64  13  20  23  82  64
 33  43   4  45  77  33  43   4  45  77
 76  47  49  21   9  76  47  49  21   9
 22  96  70  30  85  22  96  70  30  85
 17  65 100  14   6  17  65 100  14   6
 82  64  13  20  23  82  64  13  20  23
 45  77  33  43   4  45  77  33  43   4

( Again only 25 distinct values : another short cycle. )

QUESTIONS : Why does the sequence cycle ? How should n and p be chosen to get a long period ?
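One answer: starting from x0 = 1, the period of xk = ( n xk-1 ) mod p is the smallest k with n^k mod p = 1, i.e. the multiplicative order of n modulo p, which always divides p - 1. A sketch (p = 101 is an assumed modulus here, chosen because the full-period run above produced 100 distinct values):

```python
# The period of x_k = (n * x_{k-1}) mod p, with x_0 = 1, equals the
# multiplicative order of n modulo p: the smallest k >= 1 with n^k mod p = 1.
def order(n, p):
    k, y = 1, n % p
    while y != 1:
        y = (y * n) % p
        k += 1
    return k

print(order(12, 7))   # 6, matching the cycle table for 12x mod 7

# With p = 101 every period divides p - 1 = 100, so only full cycles
# (length 100) or short cycles can occur:
print(sorted({order(n, 101) for n in range(2, 101)}))   # [2, 4, 5, 10, 20, 25, 50, 100]
```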
RECALL :

Let f (x) be a density function on an interval [a, b] , with distribution function

F (x) = ∫_a^x f (s) ds .

Let Y be uniformly distributed on [0, 1] , and let

X = F⁻¹(Y ) ,

with

x1 = F⁻¹(y1)   and   x2 = F⁻¹(y2) .

Then for x1 ≤ x2 we have y1 ≤ y2 , and

P (x1 ≤ X ≤ x2) = P (y1 ≤ Y ≤ y2) = y2 - y1 = F (x2) - F (x1) .   ( Why ? )

Thus X has density f (x) .
EXAMPLE :

Recall that the exponential density function

f (x) =  { e^-x ,  x > 0
         { 0 ,     x ≤ 0

has distribution function

F (x) =  { 1 - e^-x ,  x > 0
         { 0 ,         x ≤ 0

with inverse

F⁻¹(y) = - log( 1 - y ) ,   0 ≤ y < 1 .   ( Check ! )
[Figure: the distribution function F (x) = 1 - e^-x .]
EXAMPLE : ( continued )

( Using the inverse method to get exponential random numbers. )

Divide [0, 12] into 100 subintervals of equal size Δx = 0.12 .

Let Ik denote the kth interval , with midpoint x̄k .

Use the inverse method to get N exponential random numbers .

Let mk be the frequency count ( # of random values in Ik ) .

Let

f̂ (x̄k) = mk / ( N Δx ) ≅ (1/Δx) ∫_{Ik} f (x) dx .

Then

Σ_{k=1}^{100} f̂ (x̄k) Δx = 1 .
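A sketch of this experiment (the sampler X = -log(1 - U) follows from the inverse formula above; N and the random seed are arbitrary choices):

```python
import math
import random

# Inverse method: if U ~ Uniform(0,1), then X = -log(1 - U) has the
# exponential density e^{-x}.  (N and the seed are arbitrary choices.)
random.seed(1)
N = 50_000
samples = [-math.log(1 - random.random()) for _ in range(N)]

# Frequency counts on [0, 12], 100 subintervals of width dx = 0.12.
dx = 0.12
m = [0] * 100
for s in samples:
    k = int(s / dx)
    if k < 100:
        m[k] += 1

# Density estimate f_hat(x_k) = m_k / (N dx); the first bin has midpoint
# 0.06, so f_hat there should be close to f(0.06) = e^{-0.06} ~ 0.94.
f_hat0 = m[0] / (N * dx)
print(round(f_hat0, 2))
```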
[Figures: the histogram estimates f̂ for N = 500 and N = 50,000 , compared with f (x) = e^-x .]
The triangular density

f (x) =  { x + 1 ,  -1 < x ≤ 0
         { 1 - x ,   0 < x ≤ 1
         { 0 ,       otherwise

has inverse distribution function

F⁻¹(y) =  { -1 + √(2y) ,      0 ≤ y ≤ 1/2
          {  1 - √(2 - 2y) ,  1/2 < y ≤ 1
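Sampling from this density with the inverse method can be sketched as follows (the sample size and seed are arbitrary):

```python
import math
import random

# Inverse of the triangular distribution function given above:
#   F^{-1}(y) = -1 + sqrt(2 y)      for 0   <= y <= 1/2
#   F^{-1}(y) =  1 - sqrt(2 - 2 y)  for 1/2 <  y <= 1
def f_inv(y):
    return -1 + math.sqrt(2 * y) if y <= 0.5 else 1 - math.sqrt(2 - 2 * y)

random.seed(2)
samples = [f_inv(random.random()) for _ in range(100_000)]

# The triangular density on [-1, 1] is symmetric about 0, so the
# sample mean should be close to 0.
mean = sum(samples) / len(samples)
print(round(mean, 3))
```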
[Figures: the density f (x) and the distribution function F (x) of the triangular density .]
[Figures: histogram estimates of the triangular density for N = 500 and N = 50,000 .]
SUMMARY TABLES and FORMULAS
Discrete                              Continuous

p(xi) = P (X = xi)                    f (x) Δx ≅ P ( x - Δx/2 < X < x + Δx/2 )

Σi p(xi) = 1                          ∫ f (x) dx = 1

F (xk) = Σ_{i ≤ k} p(xi)              F (x) = ∫_{-∞}^x f (s) ds ,   f (x) = F ′(x)

E[X] = Σi xi p(xi)                    E[X] = ∫ x f (x) dx

E[g(X)] = Σi g(xi) p(xi)              E[g(X)] = ∫ g(x) f (x) dx

E[XY ] = Σ_{i,j} xi yj p(xi , yj)     E[XY ] = ∫ ∫ x y f (x, y) dy dx
Name          General Formula

Mean          μ = E[X]

Variance      σ² = E[ (X - μ)² ] = E[X²] - μ²

Covariance    Cov(X, Y ) = E[ (X - μX) (Y - μY) ]

Markov        P (X ≥ c) ≤ E[X] / c

Chebyshev     P ( | X - μ | ≥ kσ ) ≤ 1 / k²

Moments       E[X^n]
Name        Distribution                                     Domain

Bernoulli   P (X = 1) = p ,  P (X = 0) = 1 - p               0, 1

Binomial    P (X = k) = ( n choose k ) p^k (1-p)^(n-k)       0 ≤ k ≤ n

Poisson     P (X = k) = e^-λ λ^k / k!                        k = 0, 1, 2, ⋯
Name        Mean    Variance

Bernoulli   p       p(1 - p)

Binomial    np      np(1 - p)

Poisson     λ       λ
Name           Density                              Distribution      Domain

Uniform        1 / (b - a)                          (x - a)/(b - a)   x ∈ (a, b]

Exponential    λ e^-λx                              1 - e^-λx         x ∈ (0, ∞)

Std. Normal    (1/√(2π)) e^(-x²/2)                  Φ(x)              x ∈ (-∞, ∞)

Normal         (1/(σ√(2π))) e^(-(x-μ)²/(2σ²))      Φ((x-μ)/σ)        x ∈ (-∞, ∞)
Name              Mean         Standard Deviation

Uniform           (a + b)/2    (b - a) / (2√3)

Exponential       1/λ          1/λ

Standard Normal   0            1

General Normal    μ            σ

Chi-Square χ²n    n            √(2n)
.5000
.4602
.4207
.3821
.3446
.3085
.2743
.2420
.2119
.1841
.1587
z
-1.2
-1.4
-1.6
-1.8
-2.0
-2.2
-2.4
-2.6
-2.8
-3.0
-3.2
(z)
.1151
.0808
.0548
.0359
.0228
.0139
.0082
.0047
.0026
.0013
.0007
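The table entries can be checked against the error function, since Φ(z) = ( 1 + erf( z / √2 ) ) / 2 (a quick sketch):

```python
import math

# Phi(z) = P(Z <= z) for the standard normal, via the error function:
# Phi(z) = (1 + erf(z / sqrt(2))) / 2.
def phi(z):
    return (1 + math.erf(z / math.sqrt(2))) / 2

# Spot-check three entries of the table above.
print(round(phi(0.0), 4), round(phi(-1.0), 4), round(phi(-2.0), 4))   # 0.5 0.1587 0.0228
```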
The χ²n - Table

 n    α = 0.975   α = 0.95   α = 0.05   α = 0.025
 5      0.83        1.15      11.07      12.83
 6      1.24        1.64      12.59      14.45
 7      1.69        2.17      14.07      16.01
 8      2.18        2.73      15.51      17.54
 9      2.70        3.33      16.92      19.02
10      3.25        3.94      18.31      20.48
11      3.82        4.58      19.68      21.92
12      4.40        5.23      21.03      23.34
13      5.01        5.89      22.36      24.74
14      5.63        6.57      23.69      26.12
15      6.26        7.26      25.00      27.49

( Table entries are the values x for which P ( χ²n > x ) = α . )
The tn - Table

 n    α = 0.1   α = 0.05   α = 0.01   α = 0.005
 5    -1.476    -2.015     -3.365     -4.032
 6    -1.440    -1.943     -3.143     -3.707
 7    -1.415    -1.895     -2.998     -3.499
 8    -1.397    -1.860     -2.896     -3.355
 9    -1.383    -1.833     -2.821     -3.250
10    -1.372    -1.812     -2.764     -3.169
11    -1.363    -1.796     -2.718     -3.106
12    -1.356    -1.782     -2.681     -3.055
13    -1.350    -1.771     -2.650     -3.012
14    -1.345    -1.761     -2.624     -2.977
15    -1.341    -1.753     -2.602     -2.947

( Table entries are the values t for which P ( Tn ≤ t ) = α . )