
IEMS 315: Stochastic Models and Simulation

Prof. Ohad Perry

Lecture 2: Basics of Probability 2

• In the previous lecture we defined probability using the three axioms.


1. (Positivity) P(E) ≥ 0.
2. (Normalization) P(Ω) = 1.
3. (Additivity) If E1 , E2 , . . . is a sequence of disjoint events, then

P( ⋃_{i=1}^{∞} Ei ) = ∑_{i=1}^{∞} P(Ei).

• Formally, the events E in Axiom 1 and E1 , E2 , . . . in Axiom 3 are contained in a set


F of “measurable subsets” of Ω. That is, not every subset of Ω can be assigned a
probability in general.
• A Probability Space (Ω, F, P) specifies the sample space Ω and the set of all the
events in F that are “measurable” with respect to the probability P. (That is, the
sets to which we can assign a probability.) However, in this course we are simplifying
the theory by assuming that the probabilities of all the events we are interested in
are well-defined, so that we can ignore F. We thus consider only (Ω, P) to be the
probability space.
• We then discussed conditional probability of A given B, defined by
P(A|B) = P(A, B) / P(B),  when P(B) > 0;
the law of total probability:

P(B) = P(B|A1)P(A1) + · · · + P(B|An)P(An);

and Bayes’ rule:


P(Ai|B) = P(Ai, B) / P(B) = P(B|Ai)P(Ai) / P(B) = P(B|Ai)P(Ai) / [P(B|A1)P(A1) + · · · + P(B|An)P(An)],
which relates conditional probabilities of the form P(A|B) with conditional probabili-
ties of the form P(B|A).
The events A1, . . . , An in the law of total probability and Bayes’ rule form a partition
of the sample space Ω. In particular, P(Aj) > 0 for all j = 1, . . . , n.
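
These two formulas are easy to check numerically. Below is a minimal Python sketch (the function names total_probability and posterior, and the numbers in the example, are illustrative choices, not part of the lecture):

def total_probability(priors, likelihoods):
    # P(B) = P(B|A1)P(A1) + ... + P(B|An)P(An) for a partition A1, ..., An.
    return sum(p_a * p_b_given_a for p_a, p_b_given_a in zip(priors, likelihoods))

def posterior(priors, likelihoods):
    # Bayes' rule: P(Ai|B) = P(B|Ai)P(Ai) / P(B).
    p_b = total_probability(priors, likelihoods)
    return [p_a * p_b_given_a / p_b for p_a, p_b_given_a in zip(priors, likelihoods)]

# Hypothetical partition A1, A2, A3 with priors P(Ai) and likelihoods P(B|Ai):
priors = [0.5, 0.3, 0.2]
likelihoods = [0.2, 0.6, 0.9]
print(total_probability(priors, likelihoods))  # 0.46
print(posterior(priors, likelihoods))          # about [0.217, 0.391, 0.391]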

Example 1: A test for a certain disease is assumed to be correct 95% of the time: if a
person has the disease, the test results are positive with probability 0.95, and if the person
does not have the disease, the test results are negative with probability 0.95. A randomly
chosen person has probability 0.001 of having the disease. Given that a person has just tested
positive, what is the probability that he has the disease?

Solution: Let
A be the event that the tested person has the disease;
B the event that the test results are positive.
We are asked to compute P(A|B). However, we are given P(B|A) and P(B|A^c), in addition
to P(A). We therefore use Bayes’ rule:

P(A|B) = P(A, B) / P(B) = P(B|A)P(A) / P(B) = P(B|A)P(A) / [P(B|A)P(A) + P(B|A^c)P(A^c)]
       = (0.95 · 0.001) / (0.95 · 0.001 + 0.05 · 0.999) ≈ 0.0187.
This example shows that conditional probability can be unintuitive: Even though the test
seems to be accurate, a person who has tested positive is very unlikely (less than 2%!) to
have the disease.
To better understand this result, and to help develop intuition, let us think of the problem
in terms of frequencies and consider a simple approximation: As in the problem statement,
assume that 1 in every 1000 people has the disease (that is, there is a 0.001 chance of having
the disease), and that a test is 95% correct in detecting the condition (infected or not) of
any person. If we test 1000 people, then, on average, 1 of them will have the disease, and
it is highly likely that the test will detect that person’s condition, i.e., the test will be
positive. However, out of the approximately 999 non-infected people that are tested,
approximately 50 (more precisely, 0.05 × 999 = 49.95) will test positive on average.
Therefore, on average, approximately 51 people will test positive, while only one of those is
likely to really be infected. In other words, the probability of the test being correct, given a
positive result, is approximately 1/51 = 0.0196, which is very close to the true answer.
To obtain the exact answer, observe that, on average, only 0.95 out of 1000 people are both
infected and detected (instead of the 1 we took for the rough analysis), and that 49.95 of the
non-infected test positive by mistake. The total number of positives is then, on average,
49.95 + 0.95 = 50.9, of which only 0.95 is a correct positive. That is, we have 0.95/50.9, exactly
as in the solution, and not 1/51 as in the rough approximation.
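
The frequency argument above can also be checked by simulation. The following short Monte Carlo sketch (the sample size and variable names are illustrative choices) simulates a large number of tested people and estimates P(A|B) as the fraction of positive tests that belong to infected people:

import random

random.seed(0)
n = 1_000_000           # number of simulated tested people (illustrative choice)
positives = 0
true_positives = 0

for _ in range(n):
    diseased = random.random() < 0.001       # P(A)     = 0.001
    if diseased:
        positive = random.random() < 0.95    # P(B|A)   = 0.95
    else:
        positive = random.random() < 0.05    # P(B|A^c) = 0.05
    if positive:
        positives += 1
        if diseased:
            true_positives += 1

print(true_positives / positives)            # should be close to 0.0187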

Example 2: Suppose you are on a game show, and you are given the choice of three doors.
Behind one of the doors is a car; behind each of the other two doors there is a goat. You
pick a door, say door number 1, and the host, who knows what is behind all the doors, opens
another door, say door number 3, which has a goat behind it. He then asks you if you want
to change your previous choice or not. What should you do? Does it matter which door you
choose?

Solution: First note that the problem is not well defined – non-mathematical issues are
important here. In particular, does the host ask you to change only if he knows that you
chose the right door, or only if you chose the wrong one? Or does he always ask you whether
you want to change your choice? (In fact, one may wonder whether the car is desirable, or a
goat is. We take it for granted that you desire the car.)
It is natural to assume that the host always opens a door with a goat, regardless of your
choice. Given this assumption, it can be shown that it is better to change your choice: If

you stay with your first choice, then you win with probability 1/3. That was the probability
of choosing the right door before the host opened a door for you, and that has not changed.
However, if you do change your choice, then the probability that your second choice has the
car is 2/3. One way to see why this must be true is that, before the host opened a door
with a goat, there was a probability of 2/3 that the car was behind one of doors 2 and 3.
Once the host opens door number 3 and shows you a goat, you know that this 2/3 probability
of the car being behind door 2 or door 3 is now concentrated entirely on door number 2.
Another way to see this is to consider two strategies: in the first you never switch, and
in the second you always switch (assuming you can play the game many times). Clearly,
with the first strategy you will win approximately 1/3 of the time. That means that if you
always switch, you must win approximately 2/3 of the time, provided the number of trials is
large enough.
To read more about this problem, google the “Monty Hall Problem”. There is a nice article
in Wikipedia, including a simple proof. According to Wikipedia, “Even when given expla-
nations, simulations, and formal mathematical proofs, many people still do not accept that
switching is the best strategy.” But hopefully, you do.
(You can play the game online: http://www.math.ucsd.edu/~crypto/Monty/monty.html.
You can try each of the two strategies (switching your initially chosen door, and not switch-
ing) several times and check for yourself whether switching is the better strategy.)
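
The two strategies can also be compared by simulation rather than by playing by hand. Below is a short sketch (the function name play and the number of rounds are illustrative choices):

import random

def play(switch):
    # One round of the game; returns True if you end up with the car.
    car = random.randrange(3)      # door hiding the car
    choice = random.randrange(3)   # your initial pick
    # The host opens a door that is neither your pick nor the car.
    opened = random.choice([d for d in range(3) if d != choice and d != car])
    if switch:
        # Switch to the one remaining unopened door.
        choice = next(d for d in range(3) if d != choice and d != opened)
    return choice == car

random.seed(1)
n = 100_000
print("never switch: ", sum(play(False) for _ in range(n)) / n)   # approx. 1/3
print("always switch:", sum(play(True) for _ in range(n)) / n)    # approx. 2/3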

Independent Events
Two events A and B are said to be independent if

P(A, B) = P(A)P(B). (1)

It follows from the definition of conditional probability that A and B are independent if
P(A|B) = P(A). That is, the probability that A occurs does not depend on whether B
occurs. So why are we using (1) as the definition? This is because P(A|B) is defined only for
events B with P(B) > 0, but we want the concept of independence of events to be well defined
even when P(B) = 0.
However, it is helpful to think of independence in terms of conditional probability, because
it gives a much more intuitive explanation of the meaning of independence: A and B are
independent if the probability of A occurring, given that B has occurred, is the same as the
probability of A occurring, and vice versa. That is, knowing that B occurred gives no useful
information about the probability of A.
Do not confuse independence of events with the events being disjoint. In fact, two disjoint
events A, B with P(A) > 0 and P(B) > 0 are never independent, since

0 = P(A, B) ≠ P(A)P(B) > 0.

The simplest example is when A is an event with 0 < P(A) < 1. Then A and A^c are
disjoint, but are clearly not independent; if we know that A occurred, then we know with
certainty (with probability 1) that A^c did not occur, and vice versa – the two events are
completely dependent!
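
To make the distinction concrete, here is a tiny numerical check on a fair six-sided die (the particular events are made up for illustration): A and its complement are disjoint but completely dependent, while the overlapping pair A, B below is independent.

from fractions import Fraction

# Fair six-sided die: Omega = {1, ..., 6}, each outcome has probability 1/6.
p = {w: Fraction(1, 6) for w in range(1, 7)}

def prob(event):
    return sum(p[w] for w in event)

A = {2, 4, 6}          # "the roll is even"
Ac = {1, 3, 5}         # complement of A
B = {1, 2, 3, 4}       # "the roll is at most 4"

# Disjoint with positive probabilities, hence not independent:
print(prob(A & Ac), prob(A) * prob(Ac))   # 0 vs 1/4
# Overlapping, and in fact independent:
print(prob(A & B), prob(A) * prob(B))     # 1/3 vs 1/3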

Claim: If A and B are independent, then A and B^c are also independent.


Proof.
P(A) = P(A, B) + P(A, B^c) = P(A)P(B) + P(A, B^c),
where the first equality follows from the additivity property of probability, and the second
equality follows from the assumed independence of A and B. It follows that

P(A, B^c) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(B^c),

which, by definition, implies the result.


By symmetry, A^c and B are also independent, and it can easily be shown that A^c and B^c
are independent as well.
Independence of several events: We say that events A1 , A2 , . . . , An are independent if
P( ⋂_{i∈K} Ai ) = ∏_{i∈K} P(Ai)   for every subset K ⊆ {1, 2, . . . , n}.

Example 1.10, pg. 11 in 10th edition: Pairwise independence does not imply
independence. Let a ball be drawn from an urn containing four balls, numbered 1,2,3,4.
Let E = {1, 2}, F = {1, 3}, G = {1, 4}. If all four outcomes are equally likely, then

P(EF) = P(E)P(F) = 1/4,
P(EG) = P(E)P(G) = 1/4,
P(FG) = P(F)P(G) = 1/4.

However,

1/4 = P(EFG) ≠ P(E)P(F)P(G) = (1/2)^3 = 1/8.
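
The example is small enough to verify by direct enumeration, as in the following sketch (the helper name prob is an illustrative choice):

from fractions import Fraction
from itertools import combinations

omega = {1, 2, 3, 4}   # four equally likely balls

def prob(event):
    return Fraction(len(event), len(omega))

E, F, G = {1, 2}, {1, 3}, {1, 4}

# Pairwise independence holds:
for X, Y in combinations([E, F, G], 2):
    print(prob(X & Y), prob(X) * prob(Y))              # 1/4 and 1/4 each time

# ... but the three events are not independent:
print(prob(E & F & G), prob(E) * prob(F) * prob(G))    # 1/4 vs 1/8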

Independence of a sequence of events: We say that A1, A2, . . . is an independent
(infinite) sequence of events if

P( ⋂_{j=1}^{n} A_{k_j} ) = ∏_{j=1}^{n} P(A_{k_j})   for all n ≥ 1 and all distinct indices k1, . . . , kn.

That is, the events {Ak : k ≥ 1} are said to be independent if the events in any finite subset
of this sequence are independent.
