
Conditional Probability
Marginal and Joint Probabilities
If a probability is based on a single variable, it is a marginal
probability. The probability of outcomes for two or more variables
or processes is called a joint probability.

Relapse Data Set: Researchers randomly assigned 72 chronic users of
cocaine into three groups: desipramine (antidepressant), lithium (standard
treatment for cocaine) and placebo. Results of the study are summarized
below.

http://www.oswego.edu/~srp/stats/2_way_tbl_1.htm
Marginal probability
What is the probability that a patient relapsed?

P(relapsed) = 48 / 72 ~ 0.67

P(relapsed) is a marginal probability because it is based on a
single variable, patient relapse.
Joint probability
What is the probability that a patient received the antidepressant
(desipramine) and relapsed?

P(relapsed and desipramine) = 10 / 72 ~ 0.14

P(relapsed and desipramine) is a joint probability because it is the
probability of an outcome based on two variables, treatment type and
patient relapse.
Joint probability
We use table proportions to summarize joint probabilities for the
Relapse Data Set. These proportions are computed by dividing
each count in the data table by the table total (72).

              relapse          no relapse       total
desipramine   10/72 = .14      14/72 = .19      24/72 = .33
lithium       18/72 = .25       6/72 = .08      24/72 = .33
placebo       20/72 = .28       4/72 = .06      24/72 = .33
total         48/72 = .67      24/72 = .33      1.0
Joint probability
By reorganizing the table proportions, the joint probability
distribution for the relapse data set can be seen.
Joint Outcomes Probability
Relapse and desipramine .14
Relapse and lithium .25
Relapse and placebo .28
No relapse and desipramine .19
No relapse and lithium .08
No relapse and placebo .06
Total 1.0

Let’s verify that the table represents a probability distribution:

✓ The 6 outcome combinations are disjoint
✓ All probabilities are non-negative
✓ The probabilities sum to 1
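
The table proportions and the three checks can be reproduced with a short Python sketch. The counts are the ones in the Relapse Data Set table; the dictionary `counts` and the other variable names are illustrative choices, not part of the study materials.

```python
from fractions import Fraction

# Counts from the Relapse Data Set table: (treatment, outcome) -> patients
counts = {
    ("desipramine", "relapse"): 10, ("desipramine", "no relapse"): 14,
    ("lithium", "relapse"): 18, ("lithium", "no relapse"): 6,
    ("placebo", "relapse"): 20, ("placebo", "no relapse"): 4,
}
total = sum(counts.values())  # 72 patients

# Joint probabilities: each cell count divided by the table total
joint = {cell: Fraction(n, total) for cell, n in counts.items()}

# Marginal probability of relapse: add the joint probabilities over treatments
p_relapse = sum(p for (treatment, outcome), p in joint.items()
                if outcome == "relapse")

# Checks for a valid probability distribution
all_non_negative = all(p >= 0 for p in joint.values())
sums_to_one = sum(joint.values()) == 1

print(round(float(joint[("desipramine", "relapse")]), 2))  # 0.14
print(round(float(p_relapse), 2))                          # 0.67
print(all_non_negative, sums_to_one)                       # True True
```

Exact fractions are used so that the "sums to 1" check is not affected by floating-point rounding.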
Conditional probability
What is the probability that a patient relapsed given that they
received the antidepressant (desipramine)?

This is called a conditional probability because we compute
the probability under a condition: a person received the
antidepressant. There are two parts to a conditional probability:
the outcome of interest and the condition.

P(relapse given desipramine) = P(relapse | desipramine)

Here relapse is the outcome of interest and desipramine is the
condition. “|” is read as “given”.
Conditional probability
What is the probability that a patient relapsed given that they
received the antidepressant (desipramine)?

P(relapse | desipramine) = 10/24 ≈ 0.42

“Given desipramine” means we limit our view to only those 24 cases
where the patient took desipramine. The answer is the fraction of
those cases where the patient relapsed.
Conditional probability
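Defn: The conditional probability of the outcome of interest A
given condition B is calculated as:

P(A | B) = P(A and B) / P(B)

What is the probability that a patient relapsed given that they
received the antidepressant (desipramine)?

P(relapse | desipramine) = P(relapse and desipramine) / P(desipramine)
                         = (10/72) / (24/72) = 10/24 ≈ 0.42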
Conditional probability (cont.)
If we know that a patient relapsed, what is the probability that
they received the antidepressant (desipramine)?

This example asks “given relapse, what is the probability that the
patient took desipramine?” It is the inverse of the previous example.

Using the table:

P(desipramine | relapse) = 10/48 ≈ 0.21

Using the formula:

P(desipramine | relapse) = P(desipramine and relapse) / P(relapse)
                         = (10/72) / (48/72) = 10/48 ≈ 0.21
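
As a quick check of both directions, here is a minimal Python sketch of the conditional probability formula applied to the Relapse Data Set counts. The helper name `conditional` and the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Counts from the Relapse Data Set table
n = 72                        # all patients
relapse_and_desipramine = 10  # relapsed and took desipramine
desipramine_total = 24        # took desipramine
relapse_total = 48            # relapsed


def conditional(p_a_and_b, p_b):
    """P(A | B) = P(A and B) / P(B); undefined when P(B) is 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_a_and_b / p_b


p_joint = Fraction(relapse_and_desipramine, n)

# Same joint probability, two different conditions
p_relapse_given_desipramine = conditional(p_joint, Fraction(desipramine_total, n))
p_desipramine_given_relapse = conditional(p_joint, Fraction(relapse_total, n))

print(round(float(p_relapse_given_desipramine), 2))  # 0.42
print(round(float(p_desipramine_given_relapse), 2))  # 0.21
```

The two answers differ because the condition changes the denominator: 24 desipramine patients in one case, 48 relapsed patients in the other.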
Sum of Conditional Probabilities
Let A_1, ..., A_k represent all the disjoint outcomes for a variable or
process. Then if B is an event, possibly for another variable or
process, we have:
P(A_1 | B) + ... + P(A_k | B) = 1

The rule for complements also holds when an event and its
complement are conditioned on the same information:
P(A | B) = 1 − P(A^c | B)
Examples:
P(desipramine | relapse) + P(lithium | relapse) + P(placebo | relapse)
= 10/48 + 18/48 + 20/48 = 1

P(relapse | placebo) = 20/24 = 1 − 4/24 = 1 − P(no relapse | placebo)

Relapse and No Relapse are complements of each other.
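
Both examples can be verified numerically. The sketch below uses the Relapse Data Set counts; the variable names are illustrative.

```python
from fractions import Fraction

# Relapse counts by treatment, from the Relapse Data Set table
relapse_counts = {"desipramine": 10, "lithium": 18, "placebo": 20}
relapse_total = sum(relapse_counts.values())  # 48

# P(treatment | relapse) for each treatment; these should sum to 1
p_given_relapse = {t: Fraction(c, relapse_total)
                   for t, c in relapse_counts.items()}
total_probability = sum(p_given_relapse.values())

# Complement rule inside the placebo group (24 patients: 20 relapse, 4 not)
p_relapse_given_placebo = Fraction(20, 24)
p_no_relapse_given_placebo = Fraction(4, 24)

print(total_probability)                                          # 1
print(p_relapse_given_placebo == 1 - p_no_relapse_given_placebo)  # True
```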
Practice
The family_college data set contains a sample of 792 cases with two
variables, teen and parents. The teen variable is either college or not,
where the college label means the teen went to college immediately
after high school. The parents variable takes the value degree if at
least one parent of the teenager completed a college degree. Using
the contingency table, compute the following probabilities:

1. P(parents degree) = ?
2. P(teen college and parents degree) = ?
3. P(teen college | parents degree) = ?
Practice

1. P(parents degree) = 280/792                        Marginal probability

2. P(teen college and parents degree) = 231/792       Joint probability

3. P(teen college | parents degree)
   = P(teen college and parents degree) / P(parents degree)
   = (231/792) / (280/792) = 231/280                  Conditional probability

Label each probability as either marginal, joint or conditional


General multiplication rule
Recall: Multiplication Rule for Independent Processes

If A and B represent events from two different and independent
processes, then the probability that both A and B occur can be
calculated as the product of their separate probabilities:
P(A and B) = P(A) × P(B)

But what if the events are not independent?

General Multiplication Rule:

If A and B represent two outcomes or events, then:
P(A and B) = P(A | B) × P(B)
● Note that this formula is simply the conditional probability
formula, rearranged.
● It is useful to think of A as the outcome of interest and B as
the condition.
● If A and B are independent, then P(A | B) = P(A), since knowing B
provides no information about A. Thus the rule becomes
P(A and B) = P(A) × P(B), i.e. the multiplication rule for
independent processes.
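
To see the difference between the two rules, the following sketch applies both to the Relapse Data Set, where treatment and relapse are not independent. The variable names are illustrative.

```python
from fractions import Fraction

# Probabilities taken from the Relapse Data Set table
p_desipramine = Fraction(24, 72)                # P(B)
p_relapse = Fraction(48, 72)                    # P(A)
p_relapse_given_desipramine = Fraction(10, 24)  # P(A | B)

# General Multiplication Rule: P(A and B) = P(A | B) * P(B)
joint_general = p_relapse_given_desipramine * p_desipramine

# Independence shortcut P(A) * P(B): only valid if A and B are independent
joint_if_independent = p_relapse * p_desipramine

print(joint_general)         # 5/36, i.e. 10/72, matching the table
print(joint_if_independent)  # 2/9, i.e. 16/72, which does not match
```

The general rule recovers the table value 10/72, while the independence shortcut gives 16/72. The mismatch reflects that P(relapse | desipramine) differs from P(relapse) in this data set.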
Independence and conditional probabilities
Consider the following (hypothetical) distribution of gender and major of
students in an introductory statistics class:

● The probability that a randomly selected student is a social
science major is P(SS) = 60/100 = 0.6
● The probability that a randomly selected student is a social
science major given that they are female is P(SS | F) = 30/50 = 0.6
● Notice that both probabilities are 0.6, i.e. P(SS | F) = P(SS).
The major of students in the class does not depend on their
gender.
Independence and conditional probabilities
Generically, if P(A | B) = P(A) then the events A and B
are said to be independent.

Conceptually: Knowing B doesn’t tell us anything about A.

Mathematically:
The conditional probability formula states that
P(A | B) = P(A and B) / P(B)

And we know, from the multiplication rule for independent processes, that
if events A and B are independent, then P(A and B) = P(A) × P(B)
Thus
P(A | B) = P(A and B) / P(B) = P(A) × P(B) / P(B) = P(A)
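
The two equivalent independence checks can be run on the gender and major example. The counts below are read off the fractions quoted in that example (60 of 100 students are social science majors, 30 of 50 female students are social science majors); the variable names are illustrative.

```python
from fractions import Fraction

# Counts implied by P(SS) = 60/100 and P(SS | F) = 30/50
n_students = 100
n_ss = 60          # social science majors
n_female = 50      # female students
n_female_ss = 30   # female social science majors

p_ss = Fraction(n_ss, n_students)
p_f = Fraction(n_female, n_students)
p_ss_and_f = Fraction(n_female_ss, n_students)

# Conditional probability formula: P(SS | F) = P(SS and F) / P(F)
p_ss_given_f = p_ss_and_f / p_f

# Two equivalent ways to state independence
print(p_ss_given_f == p_ss)      # True: conditioning on F changes nothing
print(p_ss_and_f == p_ss * p_f)  # True: joint equals product of marginals
```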
Tree Diagrams
Tree diagrams are a tool to organize outcomes and probabilities
around the structure of the data. They are most useful when two or
more processes occur in a sequence and each process is conditioned
on its predecessors.

Breast Cancer:
In Canada, about 0.35% of women (over 40) will develop breast cancer in
any given year. A common screening test for cancer is the mammogram,
but the test is not perfect. In about 11% of patients with breast cancer,
the test gives a false negative. Similarly, the test gives a false positive
in 7% of patients without breast cancer.
Tree Diagrams

[Tree diagram: primary branches labeled P(cancer) and P(no cancer)]

The first branch, for Truth, is said to be the primary branch. It shows the
marginal probability that a patient does, in truth, have breast cancer.
From the Breast Cancer data, we know: P(cancer) = 0.35%
And we can calculate P(no cancer) = 1 − P(cancer) = 1 − .0035 = .9965
Tree Diagrams
[Tree diagram: secondary branches labeled P(negative | cancer) and
P(positive | no cancer)]

The remaining branches are considered secondary and are conditioned on
the primary branch. Here the secondary branches are for Mammogram and are
assigned conditional probabilities.
From the Breast Cancer data, we know
P(negative | cancer) = 11% and P(positive | no cancer) = 7%
Tree Diagrams
[Tree diagram: secondary branches labeled P(positive | cancer) and
P(negative | no cancer)]

And we can calculate the remaining conditional probabilities:
P(positive | cancer) = 1 − P(negative | cancer) = 1 − .11 = .89
P(negative | no cancer) = 1 − P(positive | no cancer) = 1 − .07 = .93
Tree Diagrams

We may (and usually do) construct joint probabilities at the end of each
branch in our tree by multiplying the numbers we come across as we
move from left to right. These joint probabilities are computed using the
General Multiplication Rule:
P(positive & cancer) = P(positive | cancer) × P(cancer) = .89 × .0035 ≈ .00312
Tree Diagrams

By constructing the Tree Diagram, we were able to efficiently compute
and organize the marginal, conditional and joint probabilities regarding
mammogram results.
It is going to be especially useful when calculating inverted probabilities.
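
The whole tree can be sketched in a few lines of Python. Only the three percentages from the Breast Cancer example are inputs; everything else is derived by the complement rule and the General Multiplication Rule. The variable names and the dictionary layout are illustrative.

```python
# Inputs quoted in the Breast Cancer example
p_cancer = 0.0035              # primary branch: P(cancer)
p_neg_given_cancer = 0.11      # false negative rate
p_pos_given_no_cancer = 0.07   # false positive rate

# Complements fill in the remaining branches
p_no_cancer = 1 - p_cancer
p_pos_given_cancer = 1 - p_neg_given_cancer
p_neg_given_no_cancer = 1 - p_pos_given_no_cancer

# Joint probability at each leaf: multiply along the path, left to right
joint = {
    ("cancer", "positive"): p_cancer * p_pos_given_cancer,
    ("cancer", "negative"): p_cancer * p_neg_given_cancer,
    ("no cancer", "positive"): p_no_cancer * p_pos_given_no_cancer,
    ("no cancer", "negative"): p_no_cancer * p_neg_given_no_cancer,
}

for (truth, result), p in joint.items():
    print(f"P({result} & {truth}) = {p:.6f}")

# The four leaves cover every possibility, so they should sum to 1
print(f"sum of leaves = {sum(joint.values()):.6f}")
```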
Inverting probabilities
In many instances, we have a conditional probability of the form
P(A | B)

But what we really want to know is the inverted conditional probability
P(B | A)

When a patient goes through breast cancer screening there are two competing
claims: patient has cancer and patient doesn't have cancer. If a mammogram
yields a positive result, what is the probability that the patient actually
has cancer?

From our tree diagram, we know
P(positive | cancer) = .89

But what we want to know is
P(cancer | positive) = ?
Inverting probabilities
When a patient goes through breast cancer screening there are two
competing claims: patient has cancer and patient doesn't have cancer.
If a mammogram yields a positive result, what is the probability that
the patient actually has cancer?

P(cancer | positive) = P(cancer & positive) / P(positive)

P(cancer & positive) = P(positive | cancer) × P(cancer)
Inverting probabilities
There are only two scenarios where a patient gets a positive test
result: the patient gets a positive test result and has cancer, or the
patient gets a positive test result and does not have cancer. Thus the
probability of a positive test result is the sum of those two scenarios:

P(positive) = P(positive and cancer) + P(positive and no cancer)
= P(positive | cancer) × P(cancer) + P(positive | no cancer) × P(no cancer)
= 0.00312 + 0.06976 = 0.07288

Here 0.00312 = P(cancer & positive) and 0.06976 = P(no cancer & positive).


Inverting probabilities
When a patient goes through breast cancer screening there are two
competing claims: patient has cancer and patient doesn't have cancer.
If a mammogram yields a positive result, what is the probability that
the patient actually has cancer?

P(cancer | positive) = P(cancer & positive) / P(positive)
                     = 0.00312 / 0.07288 ≈ 0.0428
Bayes' Theorem
The conditional probability formula we have seen so far is a special
case of Bayes' Theorem, which is applicable even when events
have more than just two outcomes.
Bayes’ Theorem
P(outcome A_1 of variable 1 | outcome B of variable 2)
= P(B | A_1) P(A_1) / [ P(B | A_1) P(A_1) + P(B | A_2) P(A_2) + ... + P(B | A_k) P(A_k) ]

where A_2, ..., A_k represent all other possible outcomes of the first variable

Bayes’ Theorem is just a generalization of what we have done using the tree
diagram:
Variable 1: Truth → Outcome A: cancer
Variable 2: Mammogram → Outcome B: positive

P(cancer | positive)
= P(positive | cancer) × P(cancer) / [ P(positive | cancer) × P(cancer) + P(positive | no cancer) × P(no cancer) ]
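
The theorem translates almost directly into code. The function below is a minimal sketch, not a library routine: the name `bayes_posterior`, its arguments, and the dictionary layout are assumptions made for illustration. It takes the marginal probabilities P(A_i) and the conditional probabilities P(B | A_i) for every outcome of the first variable.

```python
def bayes_posterior(priors, likelihoods, target):
    """Return P(target | B), given P(A_i) and P(B | A_i) for every outcome A_i.

    priors:      dict mapping each outcome A_i to P(A_i); must sum to 1
    likelihoods: dict mapping each outcome A_i to P(B | A_i)
    target:      the outcome A_1 whose inverted probability we want
    """
    if abs(sum(priors.values()) - 1) > 1e-9:
        raise ValueError("priors must sum to 1")
    # Denominator: marginal P(B), summed over all outcomes of variable 1
    p_b = sum(likelihoods[a] * priors[a] for a in priors)
    # Numerator: joint probability P(B and target)
    return likelihoods[target] * priors[target] / p_b


# Breast Cancer example: variable 1 is Truth, outcome B is a positive test
priors = {"cancer": 0.0035, "no cancer": 0.9965}
likelihoods = {"cancer": 0.89, "no cancer": 0.07}  # P(positive | A_i)

p_cancer_given_positive = bayes_posterior(priors, likelihoods, "cancer")
print(round(p_cancer_given_positive, 4))  # 0.0427
```

The unrounded calculation gives about 0.0427. The value 0.0428 on the slides comes from dividing the rounded intermediates 0.00312 and 0.07288.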
Bayes' Theorem
Breast Cancer Example

P(cancer | positive)
= P(positive | cancer) × P(cancer) / [ P(positive | cancer) × P(cancer) + P(positive | no cancer) × P(no cancer) ]

The numerator is the joint probability of getting both A_1 and B, i.e.
P(positive & cancer)

The denominator is the marginal probability of getting B, i.e. P(positive)


Bayes’ Theorem
To apply Bayes’ Theorem correctly, there are two preparatory steps:
(1) First identify the marginal probabilities of each possible outcome of
the first variable: P(A_1), ..., P(A_k)
(2) Then identify the probability of the outcome B, conditioned on each
possible scenario for the first variable: P(B | A_1), ..., P(B | A_k)

We always complete these steps when using a tree diagram.
However, with the diagram these steps didn’t seem so complex:
Step 1 fills in the primary branches and Step 2 fills in the secondary
branches.
Bayes’ Theorem

Drawing a tree diagram makes it easier to understand how two
variables are connected. Use Bayes’ Theorem only when there are
so many scenarios that drawing a tree diagram would be complex.

TIP: Only use Bayes’ Theorem when a Tree Diagram is too difficult.
Homework # 3:
Textbook exercises* 2.16, 2.18, 2.22, 2.26
Due at start of class on Tuesday 9/11

*Note: Textbook Exercises for Chapter 2 start on page 116


Reminder: Lab # 1

• Due Tuesday 9/11/2018 by midnight.


• Instructions and data on Blackboard now.
• Submit single word document titled “Lab1_firstName_lastName”.
Answers to all exercise questions should be in this document.
