
MSCI 431 (Stochastic Models and Methods)

Summary of Lectures, Winter 2014


by Hossein Abouee Mehrizi, University of Waterloo

Remark: These are point-form summaries of the lectures for MSCI 431. There is no
guarantee of completeness or accuracy, and therefore they should not be regarded as a
substitute for attending course lectures. Lectures are based on the book titled Introduction to Probability Models by Sheldon M. Ross.
1. Introduction to Probability Theory:
Lecture 1:
Probability
Experiment: any process whose outcome is not known in advance.
Flip a coin
Roll a die
Sample space: set of all possible outcomes.
Example. Flipping a coin: S = {H, T }
Example. Rolling a die: S = {1, 2, ..., 6}
Example. Flipping two coins: S = {(H, H), (H, T ), (T, H), (T, T )}
Example. Rolling two dice: S = {(m, n) : 1 ≤ m, n ≤ 6}
Event: subset of the sample space.
Example. Flipping a coin: E = {H}, the event that a head appears.
Example. Rolling a die: E = {2, 4, 6}, the event that an even number appears.
Union of events E and F (E ∪ F): all outcomes that are either in E, or in F, or in both.
Example. Flipping a coin: E = {H}, F = {T}. Then, E ∪ F = {H, T}.
Intersection of events E and F (E ∩ F): all outcomes that are in both E and F.
Example. Rolling a die: E = {1, 3, 5}, F = {1, 2, 3}. Then, E ∩ F = {1, 3}.
Consider the events E1, E2, .... Then,
Union of these events, ∪_{i=1}^∞ Ei, is a new event that includes all outcomes that are in En for at least one value of n = 1, 2, ....
Intersection of these events, ∩_{i=1}^∞ Ei, is a new event that includes all outcomes that are in all En for n = 1, 2, ....
Complement of E (E^c): outcomes that are in the sample space S and are not in E.
Probability: Consider an experiment with the sample space S. For an event E ⊆ S,
we assume that P(E) is defined and satisfies
(i) 0 ≤ P(E) ≤ 1,
(ii) P(S) = 1,
(iii) For a sequence of events E1, E2, ... that are mutually exclusive (En ∩ Em = ∅, n ≠ m),
P(∪_{i=1}^∞ Ei) = ∑_{i=1}^∞ P(Ei).

Example. Flipping a fair coin: P ({H}) = P ({T }) = 1/2.


Example. Rolling a die: P({1, 3, 5}) = P({1}) + P({3}) + P({5}) = 1/2.
Two basic properties of probability:
P(E^c) = 1 - P(E).
P(E ∪ F) = P(E) + P(F) - P(E ∩ F).
Example. Flipping two coins:
E = {(H, H), (H, T)}: head appears on the first coin
F = {(H, H), (T, H)}: head appears on the second coin
P(E ∪ F) = 1/2 + 1/2 - 1/4 = 3/4.
Conditional Probability
Example. Rolling two dice: Suppose we observe that 4 has appeared on the first die.
What is the probability that the sum is 6?
E: event that 4 appears on the first die
F: event that the sum of the two dice is 6
P(F|E): the probability that the sum of the two dice is 6 given that 4 has appeared on the first die.
The sample space reduces to {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)} given that 4 has appeared on the first die. Therefore, P(F|E) = 1/6.
In general,
P(E|F) = P(E ∩ F) / P(F).

Example. Suppose cards are numbered 1 to 10 and they are placed in a hat and one
of them is drawn. We are told that the number on the drawn card is at least 5. What
is the probability that the number on the drawn card is 10?
E: event that number on the drawn card is 10
F : event that number on the drawn card is at least 5
P(E|F) = P(E ∩ F) / P(F) = (1/10) / (6/10) = 1/6.
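The arithmetic in this example can be checked exactly with Python's fractions module (variable names are my own):

```python
from fractions import Fraction

# Cards numbered 1..10; F: drawn card is at least 5, E: drawn card is 10.
p_e_and_f = Fraction(1, 10)  # only card 10 lies in both E and F
p_f = Fraction(6, 10)        # cards 5, 6, 7, 8, 9, 10
p_e_given_f = p_e_and_f / p_f
print(p_e_given_f)  # 1/6
```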

Lecture 2:
Conditional Probability (continued)
Example. A family has two children. What is the probability that both are boys given
that at least one of them is a boy?
E: event that both are boys.
F : event that at least one of them is a boy.
P(E|F) = P(E ∩ F) / P(F) = (1/4) / (3/4) = 1/3.
Example. Suppose that an urn contains 7 black balls and 5 white balls. We draw two
balls from the urn without replacement. What is the probability that both balls are
black?
E: event that the second one is black.
F : event that the first one is black.
P(E ∩ F) = P(F)P(E|F) = (7/12)(6/11) = 42/132 = 7/22.
Example. Bev can take a course in computers or chemistry. If Bev takes the computer
course, then she will receive an A grade with probability 1/2. If she takes the chemistry
course, then she will receive an A grade with probability 1/3. Bev decides to base her
decision on the flip of a fair coin. What is the probability that Bev will get an A in
chemistry?
E: event that she receives an A.
F : event that she takes chemistry.
P(E ∩ F) = P(F)P(E|F) = (1/2)(1/3) = 1/6.
Example. Suppose that each of three men at a party throws his hat into the center of
the room. Each man randomly selects a hat. What is the probability that none of the
three men selects his hat?
Remark. P(E1 ∪ E2 ∪ E3) = P(E1) + P(E2) + P(E3) - P(E1 ∩ E2) - P(E1 ∩ E3) - P(E2 ∩ E3) + P(E1 ∩ E2 ∩ E3).
Ei: event that the ith man selects his own hat.
P(E1 ∪ E2 ∪ E3): probability that at least one of them selects his own hat.
1 - P(E1 ∪ E2 ∪ E3): probability that none of them selects his own hat.
P(Ei) = 1/3, i = 1, 2, 3,
since each man is equally likely to select any of the three hats.
P(E1 ∩ E2) = P(E1)P(E2|E1) = (1/3)(1/2) = 1/6,
since given that the first man has selected his own hat, there remain two hats that the second man may select. In general, P(Ei ∩ Ej) = (1/3)(1/2) = 1/6, i ≠ j.
P(E1 ∩ E2 ∩ E3) = P(E1 ∩ E2)P(E3|E1 ∩ E2) = (1/6)(1) = 1/6.
P(E3|E1 ∩ E2) = 1: since given that the first two men get their own hats, it follows that the third man must also get his own hat.
P(E1 ∪ E2 ∪ E3) = 1/3 + 1/3 + 1/3 - 1/6 - 1/6 - 1/6 + 1/6 = 2/3.
1 - P(E1 ∪ E2 ∪ E3) = 1 - 2/3 = 1/3.
Independent Events
Two events E and F are independent if P(E ∩ F) = P(E)P(F).
This implies P(E|F) = P(E) and P(F|E) = P(F).
Example. Suppose we toss two fair dice.
E1: event that the sum of the two dice is 6.
F: event that the first die is 4.
P(E1 ∩ F) = P({(4, 2)}) = 1/36.
P(E1)P(F) = (5/36)(1/6) = 5/216.
P(E1 ∩ F) ≠ P(E1)P(F)!

Lecture 3:
Independent Events
Two events E and F are independent if P(E ∩ F) = P(E)P(F).
This implies P(E|F) = P(E) and P(F|E) = P(F).
Example. Suppose we toss two fair dice.
E1: event that the sum of the two dice is 6.
F: event that the first die is 4.
P(E1 ∩ F) = P({(4, 2)}) = 1/36.
P(E1)P(F) = (5/36)(1/6) = 5/216.
P(E1 ∩ F) ≠ P(E1)P(F)!
Since our chance of getting a total of six depends on the outcome of the first die.
E2: event that the sum of the dice is 7.
P(E2 ∩ F) = P({(4, 3)}) = 1/36.
P(E2)P(F) = (1/6)(1/6) = 1/36.
P(E2 ∩ F) = P(E2)P(F)!
Bayes' Formula
Let E and F be events. Then,
E = (E ∩ F) ∪ (E ∩ F^c),
since a point is in E if it is either in both E and F, or in E and not in F.
P(E) = P(E ∩ F) + P(E ∩ F^c) = P(F)P(E|F) + P(F^c)P(E|F^c).
Example. Consider 2 urns. The first contains 2 white and 7 black balls, and the second
contains 5 white and 6 black balls. We flip a fair coin and then draw a ball from the
first urn or second urn depending on whether the outcome was heads or tails. What
is the probability that the outcome of the toss was heads given that a white ball was
selected?
W: event that a white ball is drawn.
H: event that the coin comes up heads.
P(H|W): probability that the outcome of the coin is heads given that a white ball was selected.
P(H|W) = P(H ∩ W) / P(W).
P(H ∩ W) = P(H)P(W|H) = (1/2)(2/9).
P(W) = P(H)P(W|H) + P(H^c)P(W|H^c) = (1/2)(2/9) + (1/2)(5/11).
P(H|W) = P(H ∩ W) / P(W) = (1/2)(2/9) / [(1/2)(2/9) + (1/2)(5/11)] = 22/67.
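The Bayes computation can be verified exactly with Python's fractions module (variable names are my own):

```python
from fractions import Fraction

# urn 1: 2 white, 7 black -> P(W|H) = 2/9; urn 2: 5 white, 6 black -> P(W|H^c) = 5/11
p_h = Fraction(1, 2)
p_w_given_h = Fraction(2, 9)
p_w_given_hc = Fraction(5, 11)

p_w = p_h * p_w_given_h + (1 - p_h) * p_w_given_hc  # law of total probability
p_h_given_w = (p_h * p_w_given_h) / p_w             # Bayes' formula

print(p_h_given_w)  # 22/67
```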
Example. In answering a question on a five-choice test, a student knows the correct
answer with probability 1/2 and guesses with probability 1/2. Assume that a student
who guesses at the answer will be correct with probability 1/5. What is the probability
that a student knew the correct answer given that she answered it correctly?
C: event that the student answers correctly
K: event that the student knows the answer
P(K|C) = P(K ∩ C) / P(C) = P(K)P(C|K) / [P(K)P(C|K) + P(K^c)P(C|K^c)]
= (1/2)(1) / [(1/2)(1) + (1/2)(1/5)] = 5/6.
2. Random Variables:
In performing an experiment we are often interested in some function of the outcome
as opposed to the outcome itself.
Example. Rolling two dice: we are interested in the sum of the two dice.
Random variables: real-valued functions defined on the sample space.
Example. Flipping two fair coins: Let Y denote the number of heads appearing.
Y: a random variable taking on one of the values 0, 1, 2.
P(Y = 0) = P({(T, T)}) = 1/4.
P(Y = 1) = P({(T, H), (H, T)}) = 2/4.
P(Y = 2) = P({(H, H)}) = 1/4.
P(Y = 0) + P(Y = 1) + P(Y = 2) = 1.

Example. Suppose that we flip a coin having a probability p of coming up heads
until the first head appears. Let N denote the number of flips required, and
assume that the outcomes of successive flips are independent.
N: a random variable taking on one of the values 1, 2, 3, ...
P(N = 1) = P({H}) = p.
P(N = 2) = P({T, H}) = (1 - p)p.
P(N = 3) = P({T, T, H}) = (1 - p)^2 p.
P(N = n) = P({T, T, ..., T, H}) = (1 - p)^{n-1} p, n ≥ 1.

Cumulative Distribution Function (CDF) or distribution function F(·) of the random
variable X is defined, for any real number b, -∞ < b < ∞, by F(b) = P(X ≤ b).
F(b) denotes the probability that the random variable X takes on a value less
than or equal to b.
Example. Flipping two fair coins: Let Y denote the number of heads.
Y: a random variable taking on one of the values 0, 1, 2.
F(0) = P(Y ≤ 0) = P(Y = 0) = 1/4.
F(1) = P(Y ≤ 1) = P(Y = 0) + P(Y = 1) = 3/4.
F(2) = P(Y ≤ 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = 1.

Discrete Random Variables:
A random variable that can take on at most a countable number of possible values is
said to be discrete.
P(a) = P(X = a): probability mass function of X.
Example. Flipping two fair coins: Let Y denote the number of heads.
Probability mass function of Y: P(0) = 1/4, P(1) = 2/4, P(2) = 1/4.
CDF of Y: F(0) = 1/4, F(1) = 3/4, F(2) = 1.

Lecture 4:
Discrete Random Variables:
Discrete random variables are often classified according to their probability mass functions.
The Bernoulli Random Variable
Consider a trial, or an experiment, whose outcome can be classified as either a
success or a failure.
Let X equal 1 if the outcome is a success and 0 if the outcome is a failure.
Let 0 ≤ p ≤ 1 denote the probability that the trial is a success.
The probability mass function of X is
P(0) = P(X = 0) = 1 - p,
P(1) = P(X = 1) = p.
X is a Bernoulli random variable with parameter p.
Example. Flipping a fair coin: consider heads as a success and tails as a failure.
P(0) = 1/2, P(1) = 1/2.
The Binomial Random Variable
Suppose that n independent trials, each of which results in a success with
probability p and in a failure with probability 1 - p, are to be performed.
Let X represent the number of successes that occur in the n trials.
X is a binomial random variable with parameters (n, p).
Probability mass function of a binomial random variable with parameters (n, p):
P(i) = (n choose i) p^i (1 - p)^{n-i} = [n! / ((n - i)! i!)] p^i (1 - p)^{n-i}, i = 0, 1, ..., n.
Example. Suppose that each patient in a hospital is discharged on day t with
probability p. What is the distribution of the number of discharged patients on
day t given that there are n patients in the hospital on that day?
Consider discharge of a patient as a success.
X: number of successes (discharged patients).
There are n trials and each of them is a success with probability p.
Therefore, the distribution of the number of discharged patients is binomial
with parameters (n, p): P(i) = (n choose i) p^i (1 - p)^{n-i}, i = 0, 1, ..., n.
Example. It is known that any item produced by a certain machine will be
defective with probability 0.1, independently of any other item. What is the
probability that in a sample of three items, at most one item will be defective?
X: number of defective items in the sample.
X is a binomial random variable with parameters (3, 0.1).
Probability of at most one defective in the sample: P(X = 0) + P(X = 1).
P(X = 0) + P(X = 1) = (3 choose 0)(0.1)^0 (0.9)^3 + (3 choose 1)(0.1)^1 (0.9)^2 = 0.972.
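The binomial pmf is easy to evaluate in Python; this sketch (helper name is my own) checks the 0.972 answer:

```python
from math import comb

def binom_pmf(i, n, p):
    """P(X = i) for a binomial random variable with parameters (n, p)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

# at most one defective among three items, each defective with probability 0.1
prob = binom_pmf(0, 3, 0.1) + binom_pmf(1, 3, 0.1)
print(prob)  # 0.972 up to floating-point rounding
```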
The Geometric Random Variable
Suppose that independent trials, each having probability p of being a success, are
performed until a success occurs.
Let X denote the number of trials required until the first success.
X is a geometric random variable with parameter p.
Probability mass function of a geometric random variable with parameter p:
P(n) = (1 - p)^{n-1} p, n = 1, 2, ....
Example. Suppose that we flip a coin having a probability p of coming up heads
until the first head appears. Let N denote the number of flips required, and
assume that the outcomes of successive flips are independent.
N: a geometric random variable with parameter p.
Therefore, P(N = n) = (1 - p)^{n-1} p, n ≥ 1.

The Poisson Random Variable
A random variable X taking on one of the values 0, 1, 2, ... is said to be a Poisson
random variable with parameter λ if, for some λ > 0,
P(i) = P(X = i) = e^{-λ} λ^i / i!, i = 0, 1, ....
Example. Pedestrian deaths (true story): In January 2010 there were 7 pedestrian
deaths in Toronto (14 in the GTA). On average there are 2.66 pedestrian deaths per
month in Toronto.
Suppose that the distribution of the number of pedestrian deaths in Toronto is
Poisson with parameter 2.66. What is the probability of having 7 pedestrian
deaths in a month in Toronto?
P(X = 7) = e^{-2.66} (2.66)^7 / 7! = 0.013077, or 1.3%.
Probability of having 7 or more pedestrian deaths in a month in Toronto:
P(X ≥ 7) = ∑_{j=7}^∞ e^{-2.66} (2.66)^j / j! = 0.019.
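Both numbers can be reproduced with a few lines of Python (helper name is my own choice; the tail is computed as one minus the head of the distribution):

```python
from math import exp, factorial

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson random variable with parameter lam."""
    return exp(-lam) * lam ** i / factorial(i)

lam = 2.66
p7 = poisson_pmf(7, lam)                                 # P(X = 7)
p_tail = 1 - sum(poisson_pmf(j, lam) for j in range(7))  # P(X >= 7)
print(round(p7, 6), round(p_tail, 3))
```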
A Poisson random variable may be used to approximate a binomial random variable when
the binomial parameter n is large, and
the binomial parameter p is small.
The Poisson random variable that approximates a binomial random variable
with parameters (n, p) has parameter λ = np.
Example. Number of fires in Toronto per day.
This is driven by a very large number n of buildings.
Each building has a very small probability p of having a fire.
Then, the number of fires per day can be approximated by a Poisson
random variable with parameter λ = np.
Suppose the number of buildings is n = 100000 and each has a probability
2.5/100000 of having a fire. Let N denote the number of fires. Then λ = np = 2.5 and
P(N = k) = e^{-2.5} (2.5)^k / k!.
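The quality of the approximation can be seen by comparing the exact binomial pmf with the Poisson pmf for the fire example (variable names are my own):

```python
from math import comb, exp, factorial

n = 100_000
p = 2.5 / 100_000
lam = n * p  # λ = np = 2.5

def binom_pmf(k):
    # exact binomial probability of k fires
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k):
    # Poisson approximation with parameter λ = np
    return exp(-lam) * lam ** k / factorial(k)

for k in range(4):
    print(k, binom_pmf(k), poisson_pmf(k))  # the two columns nearly agree
```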
k!
Continuous Random Variables:
X is a continuous random variable if there exists a nonnegative function f(x), defined
for all real x ∈ (-∞, ∞), having the property that for any set B of real numbers
P(X ∈ B) = ∫_B f(x)dx.
f(x): probability density function.
P(X ∈ (-∞, ∞)) = ∫_{-∞}^∞ f(x)dx = 1.
Let B = [a, b]. Then, P(a ≤ X ≤ b) = ∫_a^b f(x)dx.
P(X = a) = ∫_a^a f(x)dx = 0.
F(a) = P(X ∈ (-∞, a]) = ∫_{-∞}^a f(x)dx.

Lecture 5:
Several Important Continuous Random Variables:
The Uniform Random Variable
A random variable is said to be uniformly distributed over the interval (0, 1) if its
probability density function (pdf) is
f(x) = 1 for 0 < x < 1, and f(x) = 0 otherwise.
∫_{-∞}^∞ f(x)dx = ∫_0^1 (1)dx = 1.
X is just as likely to be near any value in (0, 1) as any other value:
P(a ≤ X ≤ b) = ∫_a^b f(x)dx = ∫_a^b (1)dx = b - a.
The probability that X is in any particular subinterval of (0, 1) equals the
length of that subinterval.
In general, X is a uniform random variable on the interval (α, β) if its probability
density function (pdf) is
f(x) = 1/(β - α) for α < x < β, and f(x) = 0 otherwise.
Example. Calculate the cumulative distribution function (cdf) of a random variable uniformly distributed over (α, β).
F(a) = P(X ∈ (-∞, a]) = ∫_{-∞}^a f(x)dx = [1/(β - α)] ∫_α^a (1)dx. Then,
F(a) = 0 for a ≤ α,
F(a) = (a - α)/(β - α) for α < a < β,
F(a) = 1 for a ≥ β.
Example. If X is uniformly distributed over (0, 10), calculate the probability that
X < 3, X > 7, 1 < X < 6.
P(X < 3) = (1/10) ∫_0^3 (1)dx = 3/10.
P(X > 7) = (1/10) ∫_7^{10} (1)dx = 3/10.
P(1 < X < 6) = (1/10) ∫_1^6 (1)dx = 5/10.
The Exponential Random Variable
A continuous random variable whose probability density function is given, for
some λ > 0, by
f(x) = λe^{-λx} for x ≥ 0, and f(x) = 0 for x < 0,
is said to be an exponential random variable with parameter λ.
Cumulative distribution function (cdf): F(a) = ∫_0^a λe^{-λx}dx = 1 - e^{-λa}, a ≥ 0.
Example. The dollar amount of damage involved in a car accident is an exponential random variable with parameter 1/1000. Of this, the insurance company
only pays the amount exceeding (the deductible amount of) 400. What is the
probability that the insurance company pays at least 400 to a person who has an
accident?
Since the company pays the amount exceeding 400, the damage must be at
least 800 in order for the company to pay at least 400. Then,
P(X ≥ 800) = 1 - P(X < 800) = 1 - F(800) = 1 - (1 - e^{-800/1000}) = e^{-800/1000} = 0.4493.
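A quick numerical check of this tail probability (function name is my own choice):

```python
from math import exp

lam = 1 / 1000  # parameter of the exponential damage distribution

def exp_cdf(a):
    """F(a) = 1 - e^(-lam * a) for a >= 0."""
    return 1 - exp(-lam * a)

# the company pays at least 400 iff the damage is at least 800
prob = 1 - exp_cdf(800)
print(round(prob, 4))  # 0.4493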
Expectation of a Random Variable:
The Discrete Case
Consider a discrete random variable X with probability mass function P(x).
Then, the expected value of X is defined by
E[X] = ∑_{x:P(x)>0} x P(x).
The expected value of X is a weighted average of the possible values that X
can take on.
Example. Find E[X] where X is the outcome when we roll a fair die.
E[X] = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 7/2.
Example. Calculate E[X] when X is a Bernoulli random variable with parameter p.
E[X] = 0(1 - p) + 1(p) = p.
Example. Calculate E[X] when X is a binomial random variable with parameters (n, p).
E[X] = ∑_{i=0}^n i P(i) = ∑_{i=0}^n i (n choose i) p^i (1 - p)^{n-i} = ∑_{i=0}^n i [n! / ((n - i)! i!)] p^i (1 - p)^{n-i} = np.
Example. Calculate E[X] when X is a geometric random variable with parameter p.
E[X] = ∑_{i=1}^∞ i P(i) = ∑_{i=1}^∞ i (1 - p)^{i-1} p = p ∑_{i=1}^∞ i (1 - p)^{i-1} = 1/p.
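The geometric mean 1/p can be verified numerically by truncating the series (the choice p = 0.3 and the truncation point are mine; the tail beyond n = 200 is negligible):

```python
# Numerically verify E[X] = 1/p for a geometric random variable by
# truncating the series sum_{n>=1} n (1 - p)^(n-1) p.
p = 0.3
expected = sum(n * (1 - p) ** (n - 1) * p for n in range(1, 200))
print(expected, 1 / p)  # the two values agree to many decimal places
```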
Example. Calculate E[X] when X is a Poisson random variable with parameter λ.
E[X] = ∑_{i=0}^∞ i P(i) = ∑_{i=0}^∞ i e^{-λ} λ^i / i! = ∑_{i=1}^∞ e^{-λ} λ^i / (i - 1)! = λ ∑_{i=1}^∞ e^{-λ} λ^{i-1} / (i - 1)! = λ.

Lecture 6:
The Discrete Case (continued)
Example. Suppose that teams A and B are playing a series of games. Team A
wins each game independently with probability 2/3 and Team B wins each game
independently with probability 1/3. The winner of the series is the first team to
win 2 games. Find the expected number of games that are played.
X: number of games played.
P(X = 2) = P(A wins the first 2 games) + P(B wins the first 2 games)
= (2/3)^2 + (1/3)^2 = 5/9.
P(X = 3) = P(X = 3, A wins the series) + P(X = 3, B wins the series)
= (2 choose 1)(2/3)(1/3)(2/3) + (2 choose 1)(1/3)(2/3)(1/3) = 12/27.
E[X] = 2P(X = 2) + 3P(X = 3) = 2(15/27) + 3(12/27) = 66/27.
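An exact check of the series-length calculation with Python's fractions module (variable names are mine):

```python
from fractions import Fraction

pa = Fraction(2, 3)  # P(A wins a single game)
pb = Fraction(1, 3)  # P(B wins a single game)

p2 = pa * pa + pb * pb  # one team wins the first two games
p3 = 2 * pa * pb        # the first two games are split, forcing a third
e_games = 2 * p2 + 3 * p3
print(p2, p3, e_games)  # 5/9, 4/9, 22/9 (= 66/27)
```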
The Continuous Case
Consider a continuous random variable X with probability density function f(x).
Then, the expected value of X is defined by
E[X] = ∫_{-∞}^∞ x f(x)dx.
Example. Calculate E[X] when X is a random variable uniformly distributed over (α, β).
E[X] = ∫_α^β x [1/(β - α)]dx = (β^2 - α^2)/(2(β - α)) = (β - α)(β + α)/(2(β - α)) = (α + β)/2.
Example. Calculate E[X] when X is an exponential random variable with parameter λ.
E[X] = ∫_0^∞ x (λe^{-λx})dx.
Integrating by parts (dv = λe^{-λx}dx, u = x) yields
E[X] = ∫_0^∞ x (λe^{-λx})dx = -xe^{-λx}|_0^∞ + ∫_0^∞ e^{-λx}dx = 0 - (e^{-λx}/λ)|_0^∞ = 1/λ.
Expectation of a Function of a Random Variable:
Suppose we are interested in a function of X, say g(X).
If X is a discrete random variable with probability mass function P(x), then for any
real-valued function g(x),
E[g(X)] = ∑_{x:P(x)>0} g(x)P(x).
Example. Suppose X has the following probability mass function: P(0) = 0.2, P(1) = 0.5, P(2) = 0.3. Calculate E[X^2].
E[X^2] = (0)^2 (0.2) + (1)^2 (0.5) + (2)^2 (0.3) = 1.7.
If X is a continuous random variable with probability density function f(x), then for
any real-valued function g(x),
E[g(X)] = ∫_{-∞}^∞ g(x)f(x)dx.

Example. The dollar amount of damage involved in a car accident is an exponential random variable with expected value 1000. The insurance company pays
the whole damage if it is more than 400 and 0 otherwise. What is the expected
amount that the company pays per accident?
Define g(X) as
g(X) = 0 for 0 < X ≤ 400, and g(X) = X for X > 400.
Then,
E[g(X)] = ∫_{-∞}^∞ g(x)f(x)dx = ∫_0^{400} (0)(1/1000)e^{-x/1000}dx + ∫_{400}^∞ (x)(1/1000)e^{-x/1000}dx.
Integrating by parts (dv = (1/1000)e^{-x/1000}dx, u = x) yields
E[g(X)] = -xe^{-x/1000}|_{400}^∞ + ∫_{400}^∞ e^{-x/1000}dx = 400e^{-400/1000} + 1000e^{-400/1000}.

Remark. If a and b are constants, then
E[aX + b] = aE[X] + b.
Remark. Var(X) = E[(X - E[X])^2]: the variance of X measures the expected
square of the deviation of X from its expected value.
Var(X) = E[X^2] - (E[X])^2.
Example. Calculate Var(X) when X is the outcome of rolling a fair die.
E[X^2] = 1^2 (1/6) + 2^2 (1/6) + 3^2 (1/6) + 4^2 (1/6) + 5^2 (1/6) + 6^2 (1/6) = 91/6.
E[X] = 7/2 (it is obtained in Lecture 5).
Var(X) = E[X^2] - (E[X])^2 = 91/6 - (7/2)^2 = 35/12.
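The die variance can be confirmed in exact arithmetic (variable names are mine):

```python
from fractions import Fraction

# E[X], E[X^2], and Var(X) for one roll of a fair die
faces = range(1, 7)
e_x = sum(Fraction(x, 6) for x in faces)       # 7/2
e_x2 = sum(Fraction(x * x, 6) for x in faces)  # 91/6
var = e_x2 - e_x ** 2                          # 35/12
print(e_x, e_x2, var)
```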


Lecture 7:
Expectation of a Function of a Random Variable (continued)
Remark. If a and b are constants, then
E[aX + b] = aE[X] + b.
Remark. Var(X) = E[(X - E[X])^2]: the variance of X measures the expected square
of the deviation of X from its expected value.
Var(X) = E[X^2] - (E[X])^2.
Example. Calculate Var(X) when X is the outcome of rolling a fair die.
E[X^2] = 1^2 (1/6) + 2^2 (1/6) + 3^2 (1/6) + 4^2 (1/6) + 5^2 (1/6) + 6^2 (1/6) = 91/6.
E[X] = 7/2 (it is obtained in Lecture 5).
Var(X) = E[X^2] - (E[X])^2 = 91/6 - (7/2)^2 = 35/12.
Joint Distribution Functions:
If X and Y are discrete random variables, the joint probability mass function of X
and Y is defined by
P(x, y) = P(X = x, Y = y).
The probability mass function of X can be obtained from P(x, y) by
PX(x) = ∑_{y:P(x,y)>0} P(x, y).
The probability mass function of Y can be obtained from P(x, y) by
PY(y) = ∑_{x:P(x,y)>0} P(x, y).
Example. Suppose X and Y are discrete random variables with the probability
mass function P(x, y),
P(1, 1) = 1/4, P(1, 2) = 1/8, P(1, 3) = 1/16, P(1, 4) = 1/16,
P(2, 1) = 1/16, P(2, 2) = 1/16, P(2, 3) = 1/4, P(2, 4) = 1/8.
What is the probability that Y = 3?
PY(3) = P(Y = 3, X = 1) + P(Y = 3, X = 2) = 1/16 + 1/4 = 5/16.

Remark. For discrete random variables X and Y, and a real-valued function g(X, Y),
E[g(X, Y)] = ∑_x ∑_y g(x, y)P(x, y).

Remark. If X1, X2, ..., Xn are n independent random variables, then for any n constants a1, a2, ..., an,
E[a1 X1 + a2 X2 + ... + an Xn] = a1 E[X1] + a2 E[X2] + ... + an E[Xn].
Example. Calculate the expected sum obtained when three fair dice are rolled.
Let X denote the sum obtained.
Let Xi denote the value of the ith die.
X = X1 + X2 + X3. Thus,
E[X] = E[X1 + X2 + X3] = E[X1] + E[X2] + E[X3] = (7/2) + (7/2) + (7/2) = 21/2.
Example. Suppose that there are 20 patients of type A and 15 patients of type
B in a hospital. Each patient of type A is discharged today with probability 2/3,
independent of other patients. Also, each patient of type B is discharged today
with probability 1/3, independent of other patients. What is the expected number
of patients who are discharged today?
Let X denote the total number of discharged patients.
Xi = 1 if the ith patient of type A is discharged today, and Xi = 0 otherwise.
Yi = 1 if the ith patient of type B is discharged today, and Yi = 0 otherwise.
E[Xi] = (0)(1/3) + (1)(2/3) = 2/3.
E[Yi] = (0)(2/3) + (1)(1/3) = 1/3.
X = X1 + X2 + ... + X20 + Y1 + Y2 + ... + Y15. Thus,
E[X] = E[X1] + E[X2] + ... + E[X20] + E[Y1] + E[Y2] + ... + E[Y15]
= (20)(2/3) + (15)(1/3) = 55/3.
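Linearity of expectation over indicator variables reduces the whole calculation to two products; an exact check (variable name is mine):

```python
from fractions import Fraction

# 20 type-A patients discharged with probability 2/3,
# 15 type-B patients discharged with probability 1/3
expected_discharges = 20 * Fraction(2, 3) + 15 * Fraction(1, 3)
print(expected_discharges)  # 55/3
```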
Independent Random Variables:
The random variables X and Y are independent if for all a and b,
P(X ≤ a, Y ≤ b) = P(X ≤ a)P(Y ≤ b).
If X and Y are independent, then for any functions g(X) and h(Y),
E[g(X)h(Y)] = E[g(X)]E[h(Y)].
Example. Two fair dice are rolled. What is the expected value of the product of
their outcomes?
Let Z denote the product of their outcomes.
Let X denote the outcome of the first die.
Let Y denote the outcome of the second die.
E[Z] = E[XY] = E[X]E[Y] = (7/2)(7/2) = 49/4.


Lecture 8:
Conditional Probability: Discrete Case
Recall that for any two events E and F, the conditional probability of E given F is
defined, as long as P(F) > 0, by
P(E|F) = P(E ∩ F) / P(F).
If X and Y are discrete random variables, the conditional probability mass function of
X given that Y = y is defined by
P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y) = P(x, y) / P(y).

Example. Suppose that P(x, y), the joint probability mass function of X and Y,
is given by
P(1, 1) = 0.5, P(1, 2) = 0.1, P(2, 1) = 0.1, P(2, 2) = 0.3.
Calculate the conditional probability mass function of X given that Y = 1.
P(Y = 1) = P(1, 1) + P(2, 1) = 0.6.
P(X = 1|Y = 1) = P(X = 1, Y = 1) / P(Y = 1) = P(1, 1) / P(Y = 1) = 5/6.
P(X = 2|Y = 1) = P(X = 2, Y = 1) / P(Y = 1) = P(2, 1) / P(Y = 1) = 1/6.

The conditional cumulative distribution function of X given Y = y is defined, for all
y such that P(Y = y) > 0, by
F(x|y) = P(X ≤ x|Y = y) = ∑_{a≤x} P(a|y).
The conditional expectation of X given that Y = y is defined by
E[X|Y = y] = ∑_x x P(X = x|Y = y) = ∑_x x P(x|y).

Example. The joint probability mass function of X and Y, P(x, y), is given by
P(1, 1) = 1/9, P(2, 1) = 1/3, P(3, 1) = 1/9,
P(1, 2) = 1/9, P(2, 2) = 0, P(3, 2) = 1/18,
P(1, 3) = 0, P(2, 3) = 1/6, P(3, 3) = 1/9.
Calculate E[X|Y = 2].
P(Y = 2) = 1/9 + 0 + 1/18 = 1/6.
E[X|Y = 2] = (1)P(X = 1|Y = 2) + (2)P(X = 2|Y = 2) + (3)P(X = 3|Y = 2)
= (1) P(X = 1, Y = 2)/P(Y = 2) + (2) P(X = 2, Y = 2)/P(Y = 2) + (3) P(X = 3, Y = 2)/P(Y = 2)
= (1)(1/9)/(1/6) + (2)(0)/(1/6) + (3)(1/18)/(1/6) = 5/3.
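The conditional expectation can be computed mechanically from the joint pmf (the dictionary layout and function name are my own choices):

```python
from fractions import Fraction

# joint pmf P(x, y) from the example above
P = {
    (1, 1): Fraction(1, 9), (2, 1): Fraction(1, 3), (3, 1): Fraction(1, 9),
    (1, 2): Fraction(1, 9), (2, 2): Fraction(0),    (3, 2): Fraction(1, 18),
    (1, 3): Fraction(0),    (2, 3): Fraction(1, 6), (3, 3): Fraction(1, 9),
}

def cond_expectation_x(y):
    """E[X | Y = y] = sum_x x P(x, y) / P(Y = y)."""
    p_y = sum(p for (x, yy), p in P.items() if yy == y)  # marginal P(Y = y)
    return sum(x * p for (x, yy), p in P.items() if yy == y) / p_y

print(cond_expectation_x(2))  # 5/3
```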

Remark. If X is independent of Y, then P(X = x|Y = y) = P(X = x).
Example. If X1 and X2 are independent binomial random variables with respective parameters (5, 0.4) and (10, 0.4), calculate the conditional probability mass
function of X1 given that X1 + X2 = 8.
Remark. If X1 and X2 are independent binomial random variables with parameters (n1, p) and (n2, p), respectively, then X1 + X2 is a binomial random variable with parameters (n1 + n2, p).
P(X1 = k|X1 + X2 = 8) = P(X1 = k, X1 + X2 = 8) / P(X1 + X2 = 8) = P(X1 = k)P(X2 = 8 - k) / P(X1 + X2 = 8)
= [(5 choose k)(0.4)^k (0.6)^{5-k}][(10 choose 8-k)(0.4)^{8-k} (0.6)^{2+k}] / [(15 choose 8)(0.4)^8 (0.6)^7]
= (5 choose k)(10 choose 8-k) / (15 choose 8), 0 ≤ k ≤ 5.

Example. There are n components. On a rainy day, component i will function with
probability pi. On a nonrainy day, component i will function with probability qi,
for i = 1, ..., n. It will rain tomorrow with probability α. Calculate the conditional
expected number of components that function tomorrow given that it rains.
Define Xi and Y as
Xi = 1 if component i functions tomorrow, and Xi = 0 otherwise.
Y = 1 if it rains tomorrow, and Y = 0 otherwise.
Then,
E[∑_{i=1}^n Xi | Y = 1] = ∑_{i=1}^n E[Xi|Y = 1] = ∑_{i=1}^n pi.
The last equality holds because E[Xi|Y = 1] = pi.

Conditional Probability: Continuous Case
X and Y are jointly continuous if there exists a function f(x, y), defined for all real x
and y, having the property that for all sets A and B of real numbers
P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y)dxdy.
The function f(x, y) is called the joint probability density function of X and Y.
The probability density function of Y can be obtained from f(x, y) by
P(Y ∈ B) = P(X ∈ (-∞, ∞), Y ∈ B) = ∫_B ∫_{-∞}^∞ f(x, y)dxdy = ∫_B fY(y)dy,
where fY(y) = ∫_{-∞}^∞ f(x, y)dx.
If X and Y have a joint density function f(x, y), then the conditional probability
density function of X, given that Y = y, is defined for all values of y such that fY(y) > 0,
by
f(x|y) = f(x, y) / fY(y).

Lecture 10:
Conditional Probability: Continuous Case (continued)
X and Y are jointly continuous if there exists a function f(x, y), defined for all real x
and y, having the property that for all sets A and B of real numbers
P(X ∈ A, Y ∈ B) = ∫_B ∫_A f(x, y)dxdy.
The function f(x, y) is called the joint probability density function of X and Y.
The probability density function of Y can be obtained from f(x, y) by
P(Y ∈ B) = P(X ∈ (-∞, ∞), Y ∈ B) = ∫_B ∫_{-∞}^∞ f(x, y)dxdy = ∫_B fY(y)dy,
where fY(y) = ∫_{-∞}^∞ f(x, y)dx.

Example. Suppose the joint probability density of X and Y is given by
f(x, y) = 6xy(2 - x - y) for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.
Calculate fY(y).
fY(y) = ∫_0^1 6xy(2 - x - y)dx = y(4 - 3y).

For continuous random variables X and Y, and a real-valued function g(X, Y),
E[g(X, Y)] = ∫_{-∞}^∞ ∫_{-∞}^∞ g(x, y)f(x, y)dxdy.
Example. Suppose the joint probability density of X and Y is given by
f(x, y) = 6xy(2 - x - y) for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.
Calculate E[g(X, Y)] where g(X, Y) = Y.
E[g(X, Y)] = ∫_0^1 ∫_0^1 y · 6xy(2 - x - y)dxdy = ∫_0^1 (6x^2 y^2 - 2x^3 y^2 - 3x^2 y^3)|_0^1 dy
= ∫_0^1 (6y^2 - 2y^2 - 3y^3)dy = ((4/3)y^3 - (3/4)y^4)|_0^1 = 7/12.


If X and Y have a joint density function f(x, y), then the conditional probability
density function of X, given that Y = y, is defined for all values of y such that fY(y) > 0,
by
f(x|y) = f(x, y) / fY(y).
Example. Suppose the joint probability density of X and Y is given by
f(x, y) = 6xy(2 - x - y) for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.
Calculate the probability density function of X given that Y = y.
f(x|y) = f(x, y) / fY(y) = 6xy(2 - x - y) / ∫_0^1 6xy(2 - x - y)dx = 6xy(2 - x - y) / (y(4 - 3y)) = 6x(2 - x - y) / (4 - 3y).

The conditional expectation of X, given that Y = y, is defined for all values of y by
E[X|Y = y] = ∫_{-∞}^∞ x f(x|y)dx.
Example. Suppose the joint probability density of X and Y is given by
f(x, y) = 6xy(2 - x - y) for 0 < x < 1, 0 < y < 1, and f(x, y) = 0 otherwise.
Calculate the conditional expectation of X given that Y = y.
E[X|Y = y] = ∫_0^1 x f(x|y)dx = ∫_0^1 6x^2 (2 - x - y)/(4 - 3y) dx = (5 - 4y)/(8 - 6y).

Computing Expectations by Conditioning
Suppose X and Y are two random variables. If Y is a discrete random variable, then
the expected value of X can be obtained by
E[X] = ∑_y E[X|Y = y]P(Y = y).
If Y is a continuous random variable, then the expected value of X can be obtained by
E[X] = ∫_{-∞}^∞ E[X|Y = y]fY(y)dy.

Example. Sam will read either one chapter of his probability book or one chapter
of his history book. Suppose the number of misprints in a chapter of his probability book is Poisson distributed with mean 2 and the number of misprints in a
chapter of his history book is Poisson distributed with mean 5. Assume that Sam is equally
likely to choose either book. What is the expected number of misprints that Sam
will come across?
X: the number of misprints.
Y = 1 if Sam chooses his history book, and Y = 2 if Sam chooses his probability book.
Then,
E[X] = E[X|Y = 1]P(Y = 1) + E[X|Y = 2]P(Y = 2) = 5(1/2) + 2(1/2) = 7/2.
Example. A miner is trapped in a mine containing three doors. The first door
leads to a tunnel that takes him to safety after two hours of travel. The second
door leads to a tunnel that returns him to the mine after three hours of travel.
The third door leads to a tunnel that returns him to the mine after five hours.
Assuming that the miner is at all times equally likely to choose any one of the
doors, what is the expected length of time until the miner reaches safety?
X: the time until the miner reaches safety.
Y: the door he initially chooses.
Then,
E[X] = E[X|Y = 1]P(Y = 1) + E[X|Y = 2]P(Y = 2) + E[X|Y = 3]P(Y = 3).
E[X|Y = 1] = 2.
E[X|Y = 2] = 3 + E[X].
E[X|Y = 3] = 5 + E[X].
Therefore,
E[X] = (1/3)(2) + (1/3)(3 + E[X]) + (1/3)(5 + E[X]), which gives E[X] = 10.
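A Monte Carlo simulation agrees with the conditioning argument (function name, seed, and trial count are my own choices):

```python
import random

def simulate_miner(trials=100_000, seed=7):
    """Average escape time: door 1 -> safety in 2h, door 2 -> back after 3h,
    door 3 -> back after 5h, each door chosen with probability 1/3."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        hours = 0
        while True:
            door = rng.randrange(1, 4)
            if door == 1:
                hours += 2
                break
            hours += 3 if door == 2 else 5
        total += hours
    return total / trials

print(simulate_miner())  # close to the exact answer E[X] = 10
```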

Computing Probabilities by Conditioning
Let E denote an arbitrary event and define X as
X = 1 if E occurs, and X = 0 otherwise.
Then,
E[X] = (1)P(E) + (0)(1 - P(E)) = P(E).
Therefore, P(E) = E[X].

Lecture 11:
Computing Probabilities by Conditioning (continued)
Let E denote an arbitrary event and Y denote a discrete random variable. Then, the
probability of event E can be obtained by
P(E) = ∑_y P(E|Y = y)P(Y = y).
If Y is a continuous random variable, then the probability of event E can be obtained by
P(E) = ∫_{-∞}^∞ P(E|Y = y)fY(y)dy.
Example. Data indicate that the number of traffic accidents in Berkeley on a
rainy day is a Poisson random variable with mean 9, whereas on a dry day it
is a Poisson random variable with mean 3. Let X denote the number of traffic
accidents tomorrow. Suppose it will rain tomorrow with probability 0.6. Calculate
P(X = 0).
Let Y = 1 if it rains tomorrow, and Y = 0 otherwise. Then,
P(X = 0) = P(X = 0|Y = 1)P(Y = 1) + P(X = 0|Y = 0)P(Y = 0)
= (0.6)(e^{-9} (9)^0 / 0!) + (1 - 0.6)(e^{-3} (3)^0 / 0!) = (0.6)(e^{-9}) + (0.4)(e^{-3}).

Markov Chains
Stochastic Processes: A discrete-time stochastic process {Xn, n = 0, 1, ...} is a collection of random variables.
For each n = 0, 1, ..., Xn is a random variable.
The index n is often interpreted as time and, as a result, we refer to Xn as the
state of the process at time n.
For example,
Xn might be the total number of customers that have entered a supermarket
by time n.
Xn might be the number of customers in the supermarket at time n.
A stochastic process is a family of random variables that describes the evolution
through time of some process.
Example (Frog Example). Suppose 1000 lily pads are arranged in a circle. A frog
starts at pad number 1000. Each minute, she jumps either straight up, or one
pad clockwise, or one pad counter-clockwise, each with probability 1/3.

P(at pad #1 after 1 step) = 1/3.
P(at pad #1000 after 1 step) = 1/3.
P(at pad #999 after 1 step) = 1/3.
P (at pad # 428 after 987 steps)?
Markov Chain: a discrete-time Markov chain is a discrete-time stochastic process specified by
A state space S: any non-empty finite or countable set.
In frog example, 1000 lily pads.
Transition probabilities {Pij}, i, j ∈ S: Pij is the probability that the process will next make a transition into state j when it is in state i.
Pij ≥ 0, and Σ_j Pij = 1 for all i.
In frog example,

    Pij = 1/3  if i − j = 0,
          1/3  if i − j = 1,
          1/3  if j − i = 1,
          1/3  if i − j = 999,
          1/3  if j − i = 999,
          0    otherwise.

Let Xn be the Markov chain's state at time n. Then,

    P(Xn+1 = j | X0 = i0, X1 = i1, ..., Xn−1 = in−1, Xn = i) = P(Xn+1 = j | Xn = i) = Pij.

This is called the Markov property.
Example (Gambler's ruin). Consider a gambling game in which on any turn you win $1 with probability 0.4 or lose $1 with probability 0.6. Suppose further that you adopt the rule that you quit playing if your fortune reaches $5.
Let Xn be the amount of money you have after n plays.
State space S = {0, 1, 2, 3, 4, 5}.
Xn has the Markov property since, given the current state Xn, any other information about the past is irrelevant for predicting the next state, Xn+1.
Transition probabilities,

    P(Xn+1 = i + 1 | X0 = i0, ..., Xn = i) = P(Xn+1 = i + 1 | Xn = i) = 0.4,  0 < i < 5.
    P(Xn+1 = i − 1 | X0 = i0, ..., Xn = i) = P(Xn+1 = i − 1 | Xn = i) = 0.6,  0 < i < 5.
    P(Xn+1 = 5 | X0 = i0, ..., Xn = 5) = P(Xn+1 = 5 | Xn = 5) = 1.
    P(Xn+1 = 0 | X0 = i0, ..., Xn = 0) = P(Xn+1 = 0 | Xn = 0) = 1.
Transition matrix (rows and columns ordered 0, 1, ..., 5),

    P =
          0    1    2    3    4    5
     0    1    0    0    0    0    0
     1   0.6   0   0.4   0    0    0
     2    0   0.6   0   0.4   0    0
     3    0    0   0.6   0   0.4   0
     4    0    0    0   0.6   0   0.4
     5    0    0    0    0    0    1
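The gambler's-ruin chain is easy to set up programmatically; the sketch below builds the transition matrix row by row and checks its basic properties (variable names are mine):

```python
WIN, LOSE, GOAL = 0.4, 0.6, 5

# P[i][j] = probability of moving from fortune i to fortune j in one play.
P = [[0.0] * (GOAL + 1) for _ in range(GOAL + 1)]
P[0][0] = 1.0          # ruined: absorbing state
P[GOAL][GOAL] = 1.0    # reached $5 and quit: absorbing state
for i in range(1, GOAL):
    P[i][i + 1] = WIN
    P[i][i - 1] = LOSE

for row in P:
    assert abs(sum(row) - 1.0) < 1e-12  # each row is a probability distribution
print(P[3])  # [0.0, 0.0, 0.6, 0.0, 0.4, 0.0]
```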
Example (Inventory Chain). Consider an (s, S) inventory control policy. That is,
when the stock on hand at the end of the day falls to s or below, we order enough
to bring it back up to S. For simplicity, we assume it happens at the beginning
of the next day.


Lecture 12:
Markov Chains ...
Example (Inventory Chain). Consider an (s, S) inventory control policy. That is, when
the stock on hand at the end of the day falls to s or below, we order enough to bring it
back up to S. For simplicity, we assume it happens at the beginning of the next day.
Suppose that s = 1 and S = 5. Also, assume that the distribution of the demand on
day n + 1 is
P (Dn+1 = 0) = 0.3, P (Dn+1 = 1) = 0.4, P (Dn+1 = 2) = 0.2, P (Dn+1 = 3) = 0.1.
Xn : the amount of stock on hand at the end of day n.
State space S = {0, 1, 2, 3, 4, 5}.
Transition probabilities,
P (Xn+1 = 0|Xn = 0): when stock on hand is zero at the end of day n, 5 units
will be ordered and therefore there will be 5 units available at the beginning
of day n + 1. Since the maximum demand on day n + 1 is 3, there will be
at least 1 unit available at the end of the day n + 1. This means that given
Xn = 0, Xn+1 is greater than zero, or
P(Xn+1 = 0 | Xn = 0) = P(Dn+1 ≥ 5) = 0.
P(Xn+1 = 1 | Xn = 0) = P(Dn+1 = 4) = 0 (similar to the above discussion).
P (Xn+1 = 2|Xn = 0): when stock on hand is zero at the end of day n, 5 units
will be ordered and therefore there will be 5 units available at the beginning
of day n + 1. If the demand on day n + 1 is exactly 3, there will be 2 units
available at the end of the day n + 1.
P (Xn+1 = 2|Xn = 0) = P (Dn+1 = 3) = 0.1
Similarly,
P (Xn+1 = 3|Xn = 0) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 0) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 0) = P (Dn+1 = 0) = 0.3.
Similarly, for Xn = 1:
P(Xn+1 = 0 | Xn = 1) = P(Dn+1 ≥ 5) = 0.
P(Xn+1 = 1 | Xn = 1) = P(Dn+1 = 4) = 0.
P (Xn+1 = 2|Xn = 1) = P (Dn+1 = 3) = 0.1
P (Xn+1 = 3|Xn = 1) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 1) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 1) = P (Dn+1 = 0) = 0.3.

P (Xn+1 = 0|Xn = 2): when stock on hand is 2 at the end of day n, 0 units
will be ordered and there will be 2 units available at the beginning of day
n + 1. Therefore,
P(Xn+1 = 0 | Xn = 2) = P(Dn+1 ≥ 2) = P(Dn+1 = 2) + P(Dn+1 = 3) = 0.3.
P (Xn+1 = 1|Xn = 2) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 2|Xn = 2) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 3|Xn = 2) = 0.
P (Xn+1 = 4|Xn = 2) = 0.
P (Xn+1 = 5|Xn = 2) = 0.
Similarly, for Xn = 3:
P(Xn+1 = 0 | Xn = 3) = P(Dn+1 ≥ 3) = P(Dn+1 = 3) = 0.1.
P (Xn+1 = 1|Xn = 3) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 2|Xn = 3) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 3|Xn = 3) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 4|Xn = 3) = 0.
P (Xn+1 = 5|Xn = 3) = 0.
Similarly, for Xn = 4:
P(Xn+1 = 0 | Xn = 4) = P(Dn+1 ≥ 4) = 0.
P (Xn+1 = 1|Xn = 4) = P (Dn+1 = 3) = 0.1.
P (Xn+1 = 2|Xn = 4) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 3|Xn = 4) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 4|Xn = 4) = P (Dn+1 = 0) = 0.3.
P (Xn+1 = 5|Xn = 4) = 0.
Similarly, for Xn = 5:
P(Xn+1 = 0 | Xn = 5) = P(Dn+1 ≥ 5) = 0.
P(Xn+1 = 1 | Xn = 5) = P(Dn+1 = 4) = 0.
P (Xn+1 = 2|Xn = 5) = P (Dn+1 = 3) = 0.1.
P (Xn+1 = 3|Xn = 5) = P (Dn+1 = 2) = 0.2.
P (Xn+1 = 4|Xn = 5) = P (Dn+1 = 1) = 0.4.
P (Xn+1 = 5|Xn = 5) = P (Dn+1 = 0) = 0.3.
Transition matrix (rows and columns ordered 0, 1, ..., 5),

    P =
          0    1    2    3    4    5
     0    0    0   0.1  0.2  0.4  0.3
     1    0    0   0.1  0.2  0.4  0.3
     2   0.3  0.4  0.3   0    0    0
     3   0.1  0.2  0.4  0.3   0    0
     4    0   0.1  0.2  0.4  0.3   0
     5    0    0   0.1  0.2  0.4  0.3
Example (Repair Chain). A machine has three critical parts that are subject to failure,
but can function as long as two of these parts are working. When two are broken, they
are replaced and the machine is back to working order the next day. Assume that
parts 1, 2, and 3 fail with probabilities 0.01, 0.02, and 0.04, but no two parts fail on
the same day. Formulate the system as a Markov chain.
Xn : the parts that are broken.
State space S = {0, 1, 2, 3, 12, 13, 23}.
Transition probabilities:
P(Xn+1 = 0 | Xn = 0) = 1 − 0.01 − 0.02 − 0.04 = 0.93.
P (Xn+1 = 1|Xn = 0) = 0.01.
P (Xn+1 = 2|Xn = 0) = 0.02.
P (Xn+1 = 3|Xn = 0) = 0.04.
If we continue, we get the transition matrix (rows and columns ordered 0, 1, 2, 3, 12, 13, 23) as

    P =
           0     1     2     3     12    13    23
     0   0.93  0.01  0.02  0.04    0     0     0
     1    0    0.94   0     0    0.02  0.04    0
     2    0     0    0.95   0    0.01    0    0.04
     3    0     0     0    0.97    0    0.01  0.02
     12   1     0     0     0      0     0     0
     13   1     0     0     0      0     0     0
     23   1     0     0     0      0     0     0

Multistep Transition Probabilities:
P(Xn+1 = j | Xn = i) = Pij gives the probability of going from i to j in one step. What is the probability of going from i to j in m > 1 steps,

    P(Xn+m = j | Xn = i) = P^m_ij = ?

Chapman-Kolmogorov Equations:

    P^{n+m}_ij = Σ_{k=0}^{∞} P^n_ik P^m_kj,   n, m ≥ 0, all i, j.

To go from i to j in n + m steps, we have to go from i to some state k in n steps and then from k to j in m steps.
Theorem. The m-step transition probability P(Xn+m = j | Xn = i) is the (i, j) entry of the mth power of the transition matrix P,

    P(Xn+m = j | Xn = i) = P^m_ij = (P^m)ij.
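The Chapman-Kolmogorov sum is exactly the rule for matrix multiplication, which a few lines of code make concrete (a sketch on a small two-state chain; function names are mine):

```python
def mat_mul(A, B):
    """(A B)_ij = sum_k A_ik B_kj — the Chapman-Kolmogorov sum in matrix form."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, m):
    """m-step transition matrix P^m via repeated multiplication."""
    result = [[float(i == j) for j in range(len(P))] for i in range(len(P))]  # identity
    for _ in range(m):
        result = mat_mul(result, P)
    return result

P = [[0.7, 0.3], [0.4, 0.6]]  # a two-state chain (e.g., rain / no rain)
P2 = mat_pow(P, 2)
P4 = mat_pow(P, 4)
ck = mat_mul(P2, P2)          # Chapman-Kolmogorov: P^{2+2} = P^2 · P^2
assert all(abs(P4[i][j] - ck[i][j]) < 1e-12 for i in range(2) for j in range(2))
print(round(P4[0][0], 4))  # 0.5749
```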

Lecture 13:
Multistep Transition Probabilities ...
Theorem. The m-step transition probability P(Xn+m = j | Xn = i) is the (i, j) entry of the mth power of the transition matrix P,

    P(Xn+m = j | Xn = i) = P^m_ij = (P^m)ij.
Example. Suppose that if it rains today, then it will rain tomorrow with probability 0.7; and if it does not rain today, then it will rain tomorrow with probability
0.4. Calculate the probability that it will rain two days from today given that it
is raining today. Also, calculate the probability that it will rain four days from
today given that it is raining today.
We model the problem as a Markov chain.
State space S = {0, 1} where 0 denotes that it rains and 1 denotes that it
does not rain.
Transition matrix

    P = [ 0.7  0.3 ]
        [ 0.4  0.6 ]

Then,

    P^2 = [ 0.61  0.39 ]
          [ 0.52  0.48 ]

The desired probability is P^2_00 = 0.61.
To calculate the probability that it will rain four days from today given that it is raining today, we consider

    P^4 = [ 0.5749  0.4251 ]
          [ 0.5668  0.4332 ]

The desired probability is P^4_00 = 0.5749.
To obtain the mth power of a matrix, you can use www.wolframalpha.com. For example, enter {{0.7, 0.3}, {0.4, 0.6}}^4 in this website to get P^4.
What about P^8_00?

    P^8 = [ 0.5714  0.4286 ]
          [ 0.5714  0.4286 ]

The desired probability is P^8_00 = 0.5714.
What about P^10_00?

    P^10 = [ 0.5714  0.4286 ]
           [ 0.5714  0.4286 ]

Rows are identical! It says that the probability that it will rain in 10 days, or 20 days, ... is 0.5714. The desired probability is P^10_00 = 0.5714.

Example. Consider an (s, S) inventory control policy with s = 1 and S = 5. Assume that the distribution of the demand on day n + 1 is

    P(Dn+1 = 0) = 0.3, P(Dn+1 = 1) = 0.4, P(Dn+1 = 2) = 0.2, P(Dn+1 = 3) = 0.1.

Suppose that today is day 0. What is the probability of having 3 units on hand at the end of day 20 given that there are 2 units available at the end of today?
From last session we have

    P =
          0    1    2    3    4    5
     0    0    0   0.1  0.2  0.4  0.3
     1    0    0   0.1  0.2  0.4  0.3
     2   0.3  0.4  0.3   0    0    0
     3   0.1  0.2  0.4  0.3   0    0
     4    0   0.1  0.2  0.4  0.3   0
     5    0    0   0.1  0.2  0.4  0.3

We are looking for P^20_23. Therefore,

    P^20 =
           0       1       2       3       4       5
         0.0909  0.1556  0.2310  0.2156  0.2012  0.1057   (all six rows are identical)

Rows are identical! It says that the probability that there will be 3 units of inventory on hand in 20 days, or 25 days, ... is 0.2156.
The desired probability is P^20_23 = 0.2156.
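Instead of a web tool, P^20 can be computed with a short script (a sketch; the matrix is the one derived above):

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [
    [0.0, 0.0, 0.1, 0.2, 0.4, 0.3],  # state 0: reorder up to 5 overnight
    [0.0, 0.0, 0.1, 0.2, 0.4, 0.3],  # state 1: reorder up to 5 overnight
    [0.3, 0.4, 0.3, 0.0, 0.0, 0.0],  # state 2: no reorder
    [0.1, 0.2, 0.4, 0.3, 0.0, 0.0],  # state 3: no reorder
    [0.0, 0.1, 0.2, 0.4, 0.3, 0.0],  # state 4: no reorder
    [0.0, 0.0, 0.1, 0.2, 0.4, 0.3],  # state 5: no reorder
]

Pm = P
for _ in range(19):  # Pm = P^20 after 19 further multiplications
    Pm = mat_mul(Pm, P)

print(round(Pm[2][3], 4))  # P^20_23, about 0.2156
```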
Classification of States:
State j is said to be accessible from state i if P^n_ij > 0 for some n ≥ 0.
Example. Consider a Markov chain with the following transition matrix.

P =

1
2

1
2

0.2 0.8
0 1.0

2 is accessible from 1, but 1 is not accessible from 2.


Example. Consider a Markov chain with the following transition matrix.
    P =
          1     2     3
     1   0.1   0.8   0.1
     2    0    0.9   0.1
     3   0.4    0    0.6

2 and 3 are accessible from 1. 1 is accessible from 2 since with probability 0.1 we
can go from 2 to 3, and with probability 0.4 we can go from 3 to 1. Similarly, 2
is accessible from 3.
Two states i and j that are accessible to each other are said to communicate, and we write i ↔ j.
Example. Consider a Markov chain with the following transition matrix.

P =

1
2

1
2

0.2 0.8
0 1.0

1 and 2 does not communicate, 1 = 2.


Example. Consider a Markov chain with the following transition matrix.
    P =
          1     2     3
     1   0.1   0.8   0.1
     2    0    0.9   0.1
     3   0.4    0    0.6

1, 2, and 3 communicate with each other: 1 ↔ 2, 1 ↔ 3, 2 ↔ 3.


Two states that communicate are said to be in the same class.
The Markov chain is said to be irreducible if there is only one class, that is, if all states
communicate with each other.
Example. Consider a Markov chain with the following transition matrix.

P =

1
2

1
2

0.2 0.8
0 1.0

The Markov chain has two classes, {1} and {2}. Therefore, it is not irreducible
or it is reducible.
Example. Consider a Markov chain with the following transition matrix.
    P =
          1     2     3
     1   0.1   0.8   0.1
     2    0    0.9   0.1
     3   0.4    0    0.6

The Markov chain has one class, {1, 2, 3}. Therefore, the Markov chain is irreducible.
State i is said to be recurrent if starting in state i the process will ever reenter state i
with probability 1. Otherwise, state i is called transient.

State i is said to have period d if P^n_ii = 0 whenever n is not divisible by d, and d is the largest integer with this property.
For instance, starting in i, it may be possible for the process to enter state i only at times 2, 4, 6, 8, ..., in which case state i has period 2.
A state with period 1 is said to be aperiodic.


Lecture 14:
Multistep Transition Probabilities ...
State i is said to be recurrent if starting in state i the process will ever reenter state i
with probability 1. Otherwise, state i is called transient.
Suppose state i is recurrent. Then it is positive recurrent if, starting in i, the expected
time until the process returns to state i is finite.
Remark. Every irreducible Markov chain with a finite state space is positive recurrent.
State i is said to have period d if P^n_ii = 0 whenever n is not divisible by d, and d is the largest integer with this property.
For instance, starting in i, it may be possible for the process to enter state i only at times 2, 4, 6, 8, ..., in which case state i has period 2.
A state with period 1 is said to be aperiodic.
Remark. An irreducible Markov chain is aperiodic if there is a state i for which
Pii > 0.
Example. Consider a MC with the following transition matrix (rows and columns ordered 0, 1, ..., 5).

    P =
           0     1     2     3     4     5
     0     0    0.25    0    0.25  0.25  0.25
     1     0     0     0.5    0    0.5    0
     2    0.3    0     0.4    0    0.3    0
     3     0    0.3     0    0.4    0    0.3
     4     0     0     0.5    0    0.5    0
     5    0.5    0      0     0     0    0.5

Is this MC irreducible?
All states communicate with each other. Therefore, the MC is irreducible.
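Accessibility and irreducibility can also be checked mechanically: j is accessible from i exactly when j is reachable in the directed graph with an edge i → j whenever Pij > 0 (a sketch; function names are mine):

```python
from collections import deque

def reachable(P, i):
    """Set of states accessible from i (including i itself, n = 0 steps)."""
    seen, queue = {i}, deque([i])
    while queue:
        u = queue.popleft()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def is_irreducible(P):
    """All states communicate iff every state can reach every other state."""
    n = len(P)
    return all(reachable(P, i) == set(range(n)) for i in range(n))

P6 = [[0, .25, 0, .25, .25, .25],
      [0, 0, .5, 0, .5, 0],
      [.3, 0, .4, 0, .3, 0],
      [0, .3, 0, .4, 0, .3],
      [0, 0, .5, 0, .5, 0],
      [.5, 0, 0, 0, 0, .5]]
print(is_irreducible(P6))                      # True
print(is_irreducible([[0.2, 0.8], [0, 1.0]]))  # False: 1 is not accessible from 2
```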
Long-run Behavior (Limiting Behavior):
Theorem. If a Markov chain is irreducible, positive recurrent, and aperiodic, then the long-run proportion of time that the process is in state j is

    πj = lim_{n→∞} P^n_ij,   j ≥ 0.

Remark. Positive recurrent, aperiodic states are called ergodic.
Remark. The πj are called the stationary probabilities.


Example. Suppose that if it rains today, then it will rain tomorrow with probability 0.7; and if it does not rain today, then it will rain tomorrow with probability 0.4. In the long run, what fraction of the time does it rain?
We model the problem as a Markov chain.
State space S = {0, 1} where 0 denotes that it rains and 1 denotes that it does not rain.
Transition matrix

    P = [ 0.7  0.3 ]
        [ 0.4  0.6 ]

Then,

    P^20 = [ 0.5714  0.4286 ]
           [ 0.5714  0.4286 ]

Therefore, in the long run the probability that it rains on a given day is 0.5714.


Example. Three of every four trucks on the road are followed by a car, while only one of every five cars is followed by a truck. What fraction of vehicles on the road are trucks?
Let Xn denote the type of the nth vehicle. Then, S = {T, C} where T and C denote truck and car, respectively.

    P =
          T     C
     T   0.25  0.75
     C   0.2   0.8

Then,

    P^20 = [ 4/19  15/19 ]
           [ 4/19  15/19 ]

Therefore, 4/19 of the vehicles are trucks.


Suppose that a Markov chain has the following transition matrix

    P =
            1        2
     1   1 − a      a
     2     b      1 − b

Then, π1 = b/(a + b) and π2 = a/(a + b).
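This closed form is convenient to wrap in a small helper and check against the examples that follow (a sketch; the function name is mine):

```python
def two_state_stationary(a, b):
    """Stationary distribution of P = [[1-a, a], [b, 1-b]], assuming a + b > 0."""
    return b / (a + b), a / (a + b)

# Transit example: 10% of transit users leave (a = 0.1), 30% of drivers join (b = 0.3).
pi1, pi2 = two_state_stationary(0.1, 0.3)
print(round(pi1, 4), round(pi2, 4))  # 0.75 0.25

# Sanity check: pi is invariant under one step, pi P = pi.
assert abs(pi1 * (1 - 0.1) + pi2 * 0.3 - pi1) < 1e-12
```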

Example. A rapid transit system has just started operating. In the first month
of operation, it was found that 25% of commuters are using the system while
75% are travelling by automobile. Suppose that each month 10% of transit users
go back to using their cars, while 30% of automobile users switch to the transit
system. What fraction of people will eventually use the transit system?
The probability matrix (state 1 = transit user, state 2 = automobile user),

    P = [ 0.9  0.1 ]
        [ 0.3  0.7 ]

Then, π1 = 0.3/(0.3 + 0.1) = 0.75 and π2 = 0.1/(0.3 + 0.1) = 0.25, so eventually 75% of people will use the transit system.
Example. Market research suggests that in a five year period 8% of people with cable television will get rid of it, and 26% of those without it will sign up for it. What is the long run fraction of people with cable TV?

    P =
              Cable    No
     Cable    0.92    0.08
     No       0.26    0.74

Then, πCable = 0.26/(0.26 + 0.08) = 26/34 = 0.7647.


Lecture 15:
Long-run Behavior (Limiting Behavior) ...
Example. Consider an (s, S) inventory control policy. Assume that the distribution of
the demand on day n is
P (Dn = 0) = 0.3, P (Dn = 1) = 0.4, P (Dn = 2) = 0.2, P (Dn = 3) = 0.1.
Suppose that sales produce a profit of $12 but it costs $2 a day to keep unsold units
in the store overnight. What are the optimal values of s and S that maximize the
long-run net profit?
The objective is to maximize the long-run net profit, i.e.,

    E[net profit] = E[sales] − E[holding costs].

Let I denote the inventory level at the beginning of the day. Conditioning on the inventory level at the beginning of the day, we have

    E[net profit] = Σ_k E[net profit | I = k] P(I = k).

Note that P(I = k) is the long-run probability of having k units at the beginning of the day.
Since it is impossible to sell 4 units in a day, and it costs us to have unsold inventory, we should never have more than 3 units on hand.
Based on the above discussion the inventory level at the beginning of a day is
either 3, 2, or 1. We consider them separately.
Suppose that the inventory level at the beginning of a day is 3, i.e., I = 3. Then the expected sales revenue of the day is

    E[sales | I = 3] = E[sales | I = 3, Dn = 0]P(Dn = 0) + E[sales | I = 3, Dn = 1]P(Dn = 1)
                     + E[sales | I = 3, Dn = 2]P(Dn = 2) + E[sales | I = 3, Dn = 3]P(Dn = 3)
                     = [0(12)](0.3) + [1(12)](0.4) + [2(12)](0.2) + [3(12)](0.1) = 13.2.

The expected holding cost of the day is

    E[costs | I = 3] = E[costs | I = 3, Dn = 0]P(Dn = 0) + E[costs | I = 3, Dn = 1]P(Dn = 1)
                     + E[costs | I = 3, Dn = 2]P(Dn = 2) + E[costs | I = 3, Dn = 3]P(Dn = 3)
                     = [3(2)](0.3) + [2(2)](0.4) + [1(2)](0.2) + [0(2)](0.1) = 3.8.

The expected net profit of the day is

    E[net profit | I = 3] = E[sales | I = 3] − E[costs | I = 3] = 13.2 − 3.8 = 9.4.
Suppose that the inventory level at the beginning of a day is 2, i.e., I = 2. Then the expected sales revenue of the day is

    E[sales | I = 2] = E[sales | I = 2, Dn = 0]P(Dn = 0) + E[sales | I = 2, Dn = 1]P(Dn = 1)
                     + E[sales | I = 2, Dn = 2]P(Dn = 2) + E[sales | I = 2, Dn = 3]P(Dn = 3)
                     = [0(12)](0.3) + [1(12)](0.4) + [2(12)](0.2) + [2(12)](0.1) = 12.

The expected holding cost of the day is

    E[costs | I = 2] = [2(2)](0.3) + [1(2)](0.4) + [0(2)](0.2) + [0(2)](0.1) = 2.0.

The expected net profit of the day is

    E[net profit | I = 2] = E[sales | I = 2] − E[costs | I = 2] = 12 − 2 = 10.
Suppose that the inventory level at the beginning of a day is 1, i.e., I = 1. Then the expected sales revenue of the day is

    E[sales | I = 1] = E[sales | I = 1, Dn = 0]P(Dn = 0) + E[sales | I = 1, Dn = 1]P(Dn = 1)
                     + E[sales | I = 1, Dn = 2]P(Dn = 2) + E[sales | I = 1, Dn = 3]P(Dn = 3)
                     = [0(12)](0.3) + [1(12)](0.4) + [1(12)](0.2) + [1(12)](0.1) = 8.4.

The expected holding cost of the day is

    E[costs | I = 1] = [1(2)](0.3) + [0(2)](0.4) + [0(2)](0.2) + [0(2)](0.1) = 0.6.

The expected net profit of the day is

    E[net profit | I = 1] = E[sales | I = 1] − E[costs | I = 1] = 8.4 − 0.6 = 7.8.

To obtain E[net profit] = Σ_{k=0}^{3} E[net profit | I = k] P(I = k), we need to calculate P(I = k), which depends on the inventory control policy.
Since it is impossible to sell 4 units in a day, and it costs us to have unsold inventory, we should never have more than 3 units on hand; we therefore compare the profits of the (2, 3), (1, 3), (0, 3), (1, 2), and (0, 2) inventory policies.
Consider the (2, 3) inventory policy. In this case we always start a day with 3 units; therefore (end-of-day states 0, 1, 2, 3),

    P =
          0    1    2    3
     0   0.1  0.2  0.4  0.3
     1   0.1  0.2  0.4  0.3
     2   0.1  0.2  0.4  0.3
     3   0.1  0.2  0.4  0.3

and P^20 is the same matrix, since the rows are already identical.
Therefore, under the (2, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 0.1, π1 = 0.2, π2 = 0.4, and π3 = 0.3, respectively.
Also, under the (2, 3) inventory control policy, the inventory at the beginning of a day is always 3. Therefore,

    E[net profit] = Σ_{k=1}^{3} E[net profit | I = k] P(I = k)
                  = E[net profit | I = 1]P(I = 1) + E[net profit | I = 2]P(I = 2) + E[net profit | I = 3]P(I = 3)
                  = (7.8)(0) + (10)(0) + (9.4)(1) = 9.4.
Consider the (1, 3) inventory policy. Then,

    P =
          0    1    2    3
     0   0.1  0.2  0.4  0.3
     1   0.1  0.2  0.4  0.3
     2   0.3  0.4  0.3   0
     3   0.1  0.2  0.4  0.3

and

    P^20 =
           0       1       2       3
         19/110  30/110  40/110  21/110   (all four rows are identical)

Therefore, under the (1, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 19/110, π1 = 30/110, π2 = 40/110, and π3 = 21/110, respectively.
Under the (1, 3) inventory control policy, the inventory at the beginning of a day is either 2 or 3. The long-run probability that the inventory level at the beginning of a day is 2 is P(I = 2) = π2 = 40/110, and P(I = 3) = π0 + π1 + π3 = 70/110. Therefore,

    E[net profit] = Σ_{k=1}^{3} E[net profit | I = k] P(I = k)
                  = (7.8)(0) + (10)(40/110) + (9.4)(70/110) = 9.61818.
Consider the (0, 3) inventory policy. Then,

    P =
          0    1    2    3
     0   0.1  0.2  0.4  0.3
     1   0.7  0.3   0    0
     2   0.3  0.4  0.3   0
     3   0.1  0.2  0.4  0.3

and

    P^20 =
            0          1          2          3
         343/1070   300/1070   280/1070   147/1070   (all four rows are identical)

Therefore, under the (0, 3) inventory control policy, the long-run probabilities of having 0, 1, 2, and 3 units at the end of the day are π0 = 343/1070, π1 = 300/1070, π2 = 280/1070, and π3 = 147/1070, respectively.
Under the (0, 3) inventory control policy, the inventory at the beginning of a day is either 1, 2, or 3. Therefore, P(I = 1) = π1 = 300/1070, P(I = 2) = π2 = 280/1070, and P(I = 3) = π0 + π3 = 490/1070. Therefore,

    E[net profit] = Σ_{k=1}^{3} E[net profit | I = k] P(I = k)
                  = (7.8)(300/1070) + (10)(280/1070) + (9.4)(490/1070) = 9.108.

Consider the (1, 2) inventory policy. Then (states 0, 1, 2),

    P =
          0    1    2
     0   0.3  0.4  0.3
     1   0.3  0.4  0.3
     2   0.3  0.4  0.3

Therefore, under the (1, 2) inventory control policy, the long-run probabilities of having 0, 1, and 2 units at the end of the day are π0 = 0.3, π1 = 0.4, and π2 = 0.3, respectively. Under this policy the inventory at the beginning of a day is always 2, so

    E[net profit] = Σ_{k=1}^{2} E[net profit | I = k] P(I = k) = (7.8)(0) + (10)(1) = 10.


Consider the (0, 2) inventory policy. Then,

    P =
          0    1    2
     0   0.3  0.4  0.3
     1   0.7  0.3   0
     2   0.3  0.4  0.3

Therefore, under the (0, 2) inventory control policy, the long-run probabilities of having 0, 1, and 2 units at the end of the day are π0 = 49/110, π1 = 40/110, and π2 = 21/110, respectively. Then, P(I = 1) = π1 = 40/110 and P(I = 2) = π0 + π2 = 70/110, so

    E[net profit] = Σ_{k=1}^{2} E[net profit | I = k] P(I = k) = (7.8)(40/110) + (10)(70/110) = 9.2.

Therefore, the optimal values for s and S are s = 1 and S = 2.
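The whole comparison can be automated: build each policy's chain, approximate its stationary distribution by iterating the chain, and weight the conditional profits computed earlier (a sketch under the lost-sales assumption; all function names are mine):

```python
demand = {0: 0.3, 1: 0.4, 2: 0.2, 3: 0.1}
profit_given_start = {1: 7.8, 2: 10.0, 3: 9.4}  # E[net profit | I = k] from above

def transition_matrix(s, S):
    """End-of-day stock chain under an (s, S) policy with lost sales."""
    P = []
    for x in range(S + 1):
        start = S if x <= s else x      # reorder up to S overnight if stock <= s
        row = [0.0] * (S + 1)
        for d, p in demand.items():
            row[max(start - d, 0)] += p
        P.append(row)
    return P

def stationary(P, iters=200):
    """Approximate pi by repeatedly applying P to a uniform start."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def long_run_profit(s, S):
    pi = stationary(transition_matrix(s, S))
    # Beginning-of-day stock is S whenever end-of-day stock was <= s.
    start_probs = {k: 0.0 for k in profit_given_start}
    for x, p in enumerate(pi):
        start_probs[S if x <= s else x] += p
    return sum(profit_given_start[k] * p for k, p in start_probs.items())

results = {(s, S): round(long_run_profit(s, S), 3)
           for s, S in [(2, 3), (1, 3), (0, 3), (1, 2), (0, 2)]}
print(results)  # (1, 2) attains the maximum, 10.0
```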


Remark. Let P be the transition matrix of a Markov chain. Then the stationary distribution of the Markov chain is the solution of π P = π (together with Σ_j πj = 1), where π is the row vector of stationary probabilities.
