Probability: Section 3

Section 3
Probability
In Sec 1.1.2, the words population, random and probable are introduced. Here the aim is to explain how to make probability into a quantitative concept, and how to work with probabilities.

More terminology
A trial is the random selection of an individual from a population in order to find the value of some quantity of interest. For example, someone selected at random is weighed as part of an obesity study.

An outcome is one of the values that can result from a trial. For example, 73.2 kg.

An event is defined by a set of outcomes. For example, those outcomes in the interval 65 < weight ≤ 70 kg.

An instance of the event is said to occur when the trial's value is in the event's defining set. For example, a trial gives a weight of 67.8 kg, and this is an instance of the event 65 < weight ≤ 70 kg.

The probability of a particular outcome is the proportion of individuals in the population who are associated with that outcome. The probability of an event is the proportion of individuals in the population whose value is an instance of that event.
3.1
The essence of probability
Here several easily visualised situations are discussed. They are used as examples of the use of the terms outcome and probability.
3.1.1
Coloured balls
Suppose that a bag contains a selection of red balls, green balls and blue balls, the balls being indistinguishable except for their colour. Imagine that the bag is shaken and someone pulls out a ball without looking in the bag. The ball's colour is noted. The ball is replaced, the bag is shaken to shuffle the balls, and the blind selection is repeated. This is done many times. In this way a sequence of colours is produced; for example red blue red red green blue green green red blue blue red . . . .

Suppose that in 1000 draws there are 491 red balls, 307 green balls and 202 blue balls drawn. It would be agreed that the draw of a red ball was more probable than the draw of a green, and that the draw of a green was more probable than the draw of a blue. The population of interest is the population of these blind draws. The draws can be imagined as continuing for ever, so that the population is infinite. In practice 'very large' has to stand in for 'infinite'. The probability of drawing a red is defined to be the proportion of reds in the population. So the probability can be determined by patient experimentation, or it can be estimated by the proportion of reds in the sample of 1000 draws, namely 491/1000 = 0.491. Similarly the probability of a green ball being drawn is estimated as 0.307, and of a blue ball being drawn as 0.202. Of course these values 0.491, 0.307 and 0.202 are just approximations to the true values of the probabilities.

Research Methodology. Statistics (at November 28, 2008)

Bag with known contents
Usually the contents of the bag are known, or can be found out. Suppose that the bag has 5 red balls, 3 green balls and 2 blue balls, and that colour is the only difference between the balls. Then in the long run the blind draw procedure would be expected to yield red for 5/10 of the draws, green for 3/10 of them and blue for 2/10 of them. That is, such a bag could well give the experimental results 491, 307 and 202 in the thousand draws discussed above. So if each ball is equally likely to be chosen, the probabilities of red, green and blue can be deduced from the proportions in the population of balls, rather than from the population of draws. However, this would not be the correct thing to do if the drawer were to fiddle the draw, for example by marking some of the balls so that they were no longer indistinguishable to the touch. If the drawer were to bias the outcomes, the estimate of probability from the draw history would be needed (and would show that some skulduggery is afoot!).
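A short simulation can stand in for the 'very large' population of draws. The sketch below assumes the bag with known contents (5 red, 3 green, 2 blue). It draws with replacement a large number of times and compares the observed proportions with the proportions of balls in the bag. The function name `simulate_draws` and the fixed seed are illustrative choices only.

```python
import random
from collections import Counter

def simulate_draws(bag, n_draws, seed=0):
    """Hypothetical helper: draw one ball blindly n_draws times, replacing it
    each time, and return the proportion of draws giving each colour."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    balls = [colour for colour, count in bag.items() for _ in range(count)]
    counts = Counter(rng.choice(balls) for _ in range(n_draws))
    return {colour: counts[colour] / n_draws for colour in bag}

bag = {"red": 5, "green": 3, "blue": 2}          # the bag with known contents
estimates = simulate_draws(bag, 100_000)          # proportions in the draws
exact = {colour: count / sum(bag.values()) for colour, count in bag.items()}

for colour in bag:
    print(f"{colour:5s} estimated {estimates[colour]:.3f}  exact {exact[colour]:.1f}")
```

With many draws the estimated proportions settle close to 0.5, 0.3 and 0.2, just as the 1000-draw results 0.491, 0.307 and 0.202 did.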
3.1.2
Voters survey
Voters in a political constituency vote Labour, Conservative, SNP, Liberal, or other. The population considered is all these voters. The outcomes are 'will vote Labour', 'will vote Conservative', 'will vote Liberal', 'will vote SNP' and 'will vote none of these'. If everybody agreed to say how they would vote, it might be found (for example) that their intentions were 24% Labour, 17% Conservative, 15% SNP, 18% Liberal and 26% other. If a TV interview team selects a random individual for a vox pop question about their voting intention, what is the probability that the individual is a Liberal voter? The probability would be 0.18, reflecting the proportion of Liberal voters in the population.

However, it is unusual to know these proportions for the whole electorate, because it takes a lot of effort to ask everyone, and some will not reveal their intentions if asked. The proportions have to be estimated from a random sample. In such a sample of 500 voters, it might be found that the stated intentions were 126 Labour, 84 Conservative, 88 Liberal, 78 SNP and 124 other. If this were the only available information, the estimated probabilities would be 0.252 Labour, 0.168 Conservative, 0.176 Liberal, 0.156 SNP and 0.248 other. Rather than being the proportions in the population, these are the proportions in the sample. Used as probabilities, these values indicate that a person selected at random has a probability of about 0.176 of being a Liberal voter, a probability of about 0.252 of being a Labour voter, and so on.
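The estimates above are just sample proportions. The minimal sketch below uses the sample counts from the text and exact fractions, so that it can also confirm that the estimated probabilities add to one.

```python
from fractions import Fraction

# Stated intentions in the sample of 500 voters described above
sample = {"Labour": 126, "Conservative": 84, "Liberal": 88, "SNP": 78, "Other": 124}
n = sum(sample.values())  # 500 voters in the sample

# Each estimated probability is the proportion of the sample with that outcome
estimates = {party: Fraction(count, n) for party, count in sample.items()}

for party, p in estimates.items():
    print(f"{party:12s} {float(p):.3f}")

# The outcomes exhaust the sample, so the proportions add to one
assert sum(estimates.values()) == 1
```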
3.1.3
Playing cards
Consider a pack of 52 standard playing cards. Consider it as a bivariate population in which each card has a suit (spade, heart, diamond, club) and a denomination (ace (A), king (K), queen (Q), jack (J), 10, 9, 8, 7, 6, 5, 4, 3, 2). The population of interest is the population of random draws of a single card. Just as in the case of the balls in the bag, the probability of drawing (say) a particular ace, ten, five or two needs to be found by finding the proportion of occasions on which that card is drawn. If the pack of cards is fully randomised by a thorough shuffle, and a single card is drawn unseen from anywhere in the pack, then the probabilities determined from the proportions in the pack should match the proportions in the population of draws. Here are three examples.

(a) The probability that the card is a particular king (of one named suit) is 1/52. That king is 1/52 of the pack of cards and should be represented one time in 52 in the population of draws.

(b) The probability that the card is a spade is 13/52 = 0.25. The proportion of spade cards in the pack is 13/52, and these should arise one time in four in the population of draws.

(c) The probability that the card is a jack is 4/52 = 1/13. The proportion of the pack which are jacks is 4/52, and so these should be drawn one time in 13 in the population of draws.

This argument will fail if the person doing the draw is an accomplished card sharp or conjurer. Such people have a set of tricks to persuade others that rigged draws are in fact random. The rigging would become evident in the population of draws, which would then differ from the proportions in the pack.
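For a well-shuffled pack, the probabilities in (a) to (c) are proportions of the pack, so they can be found by counting. The sketch below lists all 52 cards as (denomination, suit) pairs and counts the cards matching each event. The king of spades stands in for 'a particular king' and was chosen only for illustration. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

SUITS = ["spades", "hearts", "diamonds", "clubs"]
DENOMINATIONS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
PACK = list(product(DENOMINATIONS, SUITS))  # 52 (denomination, suit) pairs

def prob(event):
    """Proportion of the pack satisfying `event`, a predicate on one card."""
    return Fraction(sum(1 for card in PACK if event(card)), len(PACK))

p_king_of_spades = prob(lambda c: c == ("K", "spades"))  # example (a)
p_spade = prob(lambda c: c[1] == "spades")                # example (b)
p_jack = prob(lambda c: c[0] == "J")                      # example (c)
print(p_king_of_spades, p_spade, p_jack)                  # 1/52 1/4 1/13
```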
3.1.4
National lottery
The population of interest is the population of draws. For simplicity, only the first ball to come out of the draw machine is considered just now. There is a history of draws that can be examined. If the machines work as intended, then in the long run each of the balls labelled 1 to 49 comes out as the first ball drawn equally often. If this is not the case, then every lottery participant wants to know about it so that she/he can improve their chances of winning. At the moment there is no evidence that the machines are biased. So the probability that (say) ball number 7 comes out first in the draw is 1/49. This is the proportion of number 7s in the population of balls, which ought to be the same as the proportion in the population of draws. The same probability applies to any other of the 49 numbers. The probability of the event 'the first ball is one of my six selected numbers' is 6/49, because this is the proportion of my selected numbers in the population of balls. If the machine is fair, then it will also be the proportion in the population of draws. The real lottery problem is less simple, and will be considered later.
3.1.5
Single die
Consider a six-faced die. The outcomes of a single throw are the scores 1, 2, 3, 4, 5 and 6. The population of interest is the population of throws. The die is made as a regular polyhedron (in this case a cube) to try to ensure that each face is as likely as any other face to finish face upwards after a throw. If this is achieved and the throw is fair, then the proportions of scores 1, 2, 3, 4, 5 and 6 in the population of throws should be equal, and so each has the value 1/6. The proportion of (say) 3s in the throws should be the same as the proportion of 3s in the population of face values.

A complication
If a six-faced die were made but had its faces labelled 1, 2, 1, 3, 3, 4, then you would expect the outcomes 2 and 4 to occur with probability 1/6, but the outcomes 1 and 3 to occur with probability 2/6 (= 1/3). Such non-standard die labellings are used for the polyhedral dice in one of the methods stream exercises.

Tossing a coin
A coin can be regarded as a two-faced die with a head on one face and a tail on the other.
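When every face of a fair die is equally likely, the probability of a score is the proportion of faces that carry that label. The sketch below assumes a fair die. It covers the standard die, the relabelled die 1, 2, 1, 3, 3, 4, and a coin treated as a two-faced die. The helper name `face_probabilities` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def face_probabilities(labels):
    """For a fair die whose faces carry `labels`, each face is equally
    likely, so a score's probability is the proportion of faces with that label."""
    counts = Counter(labels)
    return {score: Fraction(k, len(labels)) for score, k in sorted(counts.items())}

print(face_probabilities([1, 2, 3, 4, 5, 6]))  # standard die: each score 1/6
print(face_probabilities([1, 2, 1, 3, 3, 4]))  # relabelled die: 1 and 3 get 1/3
print(face_probabilities(["H", "T"]))          # coin as a two-faced die: 1/2 each
```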
3.2
Rules for probability
A formal mathematical model which helps a lot in working with probabilities is the sample space. This is introduced, and later used to explain how to combine the probabilities of simpler events to obtain the probability of compound events.
3.2.1
Sample space and Probability space
The space of all possible outcomes of a trial is called the sample space. It can be visualised as shown in the following diagrams.

[Figure: Six sample spaces]
1 Coloured balls (5 red, 3 green, 2 blue). Probs: R 0.5, G 0.3, B 0.2.
2 Sample: 500 voters (Labour, Conservative, SNP, Liberal, Other). Probs: Lab 0.252, Con 0.168, SNP 0.156, Lib 0.176, Other 0.248.
3 Pack of cards (Spades, Hearts, Diamonds, Clubs; each A K Q J 10 9 8 7 6 5 4 3 2). Probs: 1/52 each.
4 National lottery (balls 1 to 49). Probs: 1/49 each.
5a Single die throws (scores 1 to 6). Probs: 1/6 each.
5b Single coin tosses (Head, Tail). Probs: 1/2 each.
In the above examples there are finitely many outcomes, and a probability is associated with each outcome. These probabilities are greater than zero and less than (or equal to) one, because each probability represents the proportion of the population associated with the outcome. The sample space is to include all possible outcomes, and so the proportions of the population associated with the outcomes must add to one. In short, the probabilities must add to one. The sample space together with the defined probabilities forms a probability space.

The probabilities may all be different, but some or all can be the same, depending on the situation being considered. A probability will only be equal to one when there is only one possible outcome; for example, head for the toss of a double-headed coin. A probability of zero means that the outcome does not occur. Such outcomes are usually not included in the sample space. For example, a standard die coming to rest balanced on an edge has probability zero.

The probability space depends on the matter under consideration. For example, if the question is 'what is the probability that a card selected at random from a pack of cards is a club?', the sample space in diagram 3 is too elaborate. The space need only consist of the four suits spades, hearts, clubs and diamonds, each with probability 1/4.

Several of the examples considered above (the pack of cards, the lottery, the die and the coin) fall into the class of equi-probable discrete spaces. The number of possible outcomes is finite, say n, and all outcomes have the same probability, hence 1/n. For instance, tossing two distinguishable coins, say a 1 penny and a 2 pence coin, has the associated sample space {HH, HT, TH, TT}, where H means head, T means tail, and the first letter corresponds to (for instance) the one-penny coin. If the coins are symmetric, all outcomes have the same probability 1/4. This is often written as a table representing the distribution of the random variable:

Outcomes:       HH    HT    TH    TT
Probabilities:  1/4   1/4   1/4   1/4
But if the coins are indistinguishable, the appropriate sample space is {HH, HT, TT} with unordered pairs (HT now meaning one head and one tail), and the probability space is

Outcomes:       HH    HT    TT
Probabilities:  1/4   1/2   1/4
Suggesting an equi-probable model, with probability 1/3 for each of these three outcomes, would not correspond to reality for symmetrical coins. Note that there is nothing wrong with such a probability space in itself; it is just not an appropriate theoretical model for symmetrical coins. Similarly, is there anything wrong with the following probability space for a single coin toss?

Outcomes:       H     T
Probabilities:  0.4   0.6
No! For instance, the coin could be bent. It is a different question whether this probability space corresponds to the toss of your particular coin, and methods to assess this are addressed later in these notes.
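The difference between the two-coin tables can be checked mechanically. The sketch below assumes two symmetric coins. It lists the four equally likely ordered outcomes and then merges HT and TH into one unordered outcome. Their probabilities add because they are distinct outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Ordered outcomes for two distinguishable symmetric coins (1p first, 2p second)
ordered = ["".join(pair) for pair in product("HT", repeat=2)]  # HH, HT, TH, TT
p_ordered = {o: Fraction(1, len(ordered)) for o in ordered}

# Indistinguishable coins: only the unordered pair is seen, so HT and TH merge
p_unordered = Counter()
for outcome, p in p_ordered.items():
    p_unordered["".join(sorted(outcome))] += p  # sorting "TH" gives "HT"

print(p_ordered)
print(dict(p_unordered))  # HH 1/4, HT 1/2, TT 1/4 (not 1/3 each)
```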
3.2.2
Adding the probabilities of outcomes
The probability of an event can be found by adding the probabilities of its distinct defining outcomes. This is because the proportion of individuals with an outcome in the event's defining set is the sum of the proportions of individuals associated with each separate outcome.
[Figure: events in four sample spaces]
Coloured balls. Event: red or green. Prob: 0.5 + 0.3 for one ball extracted.
Pack of cards. Event: queen or heart. Prob: (13 + 3)/52 for a card drawn.
Voters. Event: Labour or SNP. Prob: 0.252 + 0.156 for one voter selected.
National lottery. Event: my numbers (11, 21, 30, 31, 40, 41). Prob: 6/49 for the first ball out.
Because the sample space is to include all possible outcomes, the probabilities of these outcomes must add up to 1: the population proportions, taken together, make up the whole population.

Examples
With the example of (3.1.1), the probability of a ball being red or green is the probability of being red plus the probability of being green. That is 0.5 + 0.3, or 0.8. From the proportion point of view, 8 of the 10 balls match the event, a proportion of 0.8.

With the example of (3.1.2), the sample gives an estimate of the probability of being Labour or SNP as the sum of the estimates 0.252 + 0.156, or 0.408. From the proportion point of view, 126 + 78 of the 500 sampled voters expressed support for Labour or SNP, and this proportion is 204/500 = 0.408.

With the example of (3.1.3), the probability of a queen or a heart is the sum of the probabilities of the 13 hearts plus the probabilities of the queens of spades, diamonds and clubs. This is 13/52 + 3/52 = 16/52 = 4/13. Indeed, 16 cards out of 52 match the description 'queen or heart'.
With the example of (3.1.4), it has already been seen that the probability of the first ball being one of my six numbers is 6/49, and this is 1/49 + 1/49 + 1/49 + 1/49 + 1/49 + 1/49. With the example of (3.1.5), the probability of the event 'score is even' is the sum of the probabilities of the outcomes two, four and six. That is 1/6 + 1/6 + 1/6 = 1/2. This makes sense: by symmetry the probability of the score being odd must be the same, and the two probabilities must add to 1.
[Figure: single die, scores 1 to 6. Event: even die score {2, 4, 6}. Prob: 1/6 + 1/6 + 1/6.]
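The rule 'add the probabilities of the distinct defining outcomes' translates directly into a sum over a set. The minimal sketch below represents a probability space as a mapping from outcome to probability and an event as a set of outcomes. It reproduces three of the examples above. The helper name `event_probability` is hypothetical. Because the event is a set, each outcome is counted once, so the queen of hearts is not counted twice.

```python
from fractions import Fraction as F

def event_probability(space, event):
    """Sum the probabilities of the distinct outcomes that define `event`.
    `space` maps each outcome to its probability; `event` is a set of outcomes."""
    return sum((space[o] for o in event), F(0))

balls = {"red": F(5, 10), "green": F(3, 10), "blue": F(2, 10)}
die = {score: F(1, 6) for score in range(1, 7)}
pack = {(d, s): F(1, 52)
        for d in "A K Q J 10 9 8 7 6 5 4 3 2".split()
        for s in ("spades", "hearts", "diamonds", "clubs")}

red_or_green = event_probability(balls, {"red", "green"})
even_score = event_probability(die, {2, 4, 6})
queen_or_heart = event_probability(
    pack, {c for c in pack if c[0] == "Q" or c[1] == "hearts"})
print(red_or_green, even_score, queen_or_heart)  # 4/5 1/2 4/13
```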
3.2.3
Compound events using 'or'
Consider the playing card event that, for a single card drawn at random, the card is a face card or a spade. The face cards are the king, queen and jack. This event is made up from two simpler events, 'card is a face card' and 'card is a spade'. The probability of a face card is 12/52, because there are three face cards in each of four suits: the proportion of face cards is twelve out of fifty-two. The probability of a spade is 13/52, because there are thirteen cards in the spade suit. The compound event 'card is a face card or a spade' happens if the outcome is a spade, a face card heart, a face card diamond or a face card club. These are 13 + 3 + 3 + 3 = 22 outcomes. The probability of the compound event is therefore 22/52, or 11/26. This is not the sum 12/52 + 13/52 = 25/52. Can you see why?
[Figure: pack of cards. Face card or a spade: 22 cases. Face card or even diamond: 17 cases.]
The reason is that the two events 'face card' and 'spade' are not mutually exclusive. The card can be a face card and a spade: it can be the king, queen or jack of spades. When the two probabilities are added, these three outcomes are counted twice, once as a face card and once as a spade. So in general the probabilities of events cannot simply be added, but the probabilities of outcomes can. The difference is that outcomes are mutually exclusive and events may not be.

As an example of mutually exclusive events, consider 'the card is a face card' and 'the card is an even-numbered diamond'. These give the probabilities

P(face card) = 12/52,   P(even diamond) = 5/52,   P(face card or even diamond) = 12/52 + 5/52 = 17/52.

'Or' in general
If E1 is an event represented by the set of outcomes S1, and E2 is an event represented by the set of outcomes S2, then 'E1 or E2' is the event represented by any outcome that is in S1 or in S2 (or both). If you are familiar with set theory you will see that 'E1 or E2' is represented by the union S1 ∪ S2 of the two sets.
3.2.4
Compound events using 'and'
Consider the event, in relation to a pack of playing cards, of being a face card and being a spade. As said before, this is being the king, queen or jack of spades. Suppose E1 is the event 'being a face card', represented by S1, the set of all the face cards. Suppose E2 is the event 'being a spade', represented by S2, the set of all spades. Then the event 'E1 and E2' is represented by the set of all outcomes that are both in S1 and in S2. These are the three face-card spades already described.

'And' in general
If you are familiar with set theory you will see that 'E1 and E2' is represented by the intersection S1 ∩ S2 of the two sets. If two events are mutually exclusive, then the sets that represent them have no outcomes in common. In set theory this means that the intersection of the two sets is empty (is the empty set). The empty set represents an event with probability zero, and so it is called the impossible event.
3.2.5
General formula for 'or' compounded events
The general rule for two events E1 and E2 is

P(E1 or E2) = P(E1) + P(E2) - P(E1 and E2)     (OR)

Example 1
In the card case E1 = 'card is a face card' and E2 = 'card is a spade', P(E1) = 12/52, P(E2) = 13/52, and P(E1 and E2) = 3/52. Therefore P(E1 or E2) = 12/52 + 13/52 - 3/52 = 22/52 = 11/26. The probability 3/52 of being a face card of spades is counted twice in P(E1) + P(E2); the subtraction removes one of the double counts.

Example 2
Suppose for the throw of a single die that E1 = 'score is even' and E2 = 'score is a multiple of 3'. Then E1 is represented by the set {2, 4, 6} and E2 by the set {3, 6}. 'E1 or E2' is represented by the set {2, 3, 4, 6}, and 'E1 and E2' by {6}. Therefore P(E1) = 1/6 + 1/6 + 1/6 = 1/2, P(E2) = 2/6 = 1/3 and P(E1 and E2) = 1/6, and so P(E1 or E2) = 1/2 + 1/3 - 1/6 = 2/3, as it should be for the probability (4/6) of the event represented by {2, 3, 4, 6}.
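Both examples can be checked by computing P(E1 or E2) in two ways. The first counts the union of the two outcome sets directly. The second applies rule (OR). The sketch below assumes equi-probable spaces, so a probability is the size of the event divided by the size of the space. The helper names `P` and `check_or_rule` are hypothetical.

```python
from fractions import Fraction
from itertools import product

pack = set(product("A K Q J 10 9 8 7 6 5 4 3 2".split(),
                   ("spades", "hearts", "diamonds", "clubs")))
die = set(range(1, 7))

def P(event, space):
    """Equi-probable space: probability = size of event / size of space."""
    return Fraction(len(event), len(space))

def check_or_rule(e1, e2, space):
    direct = P(e1 | e2, space)                                # count the union directly
    formula = P(e1, space) + P(e2, space) - P(e1 & e2, space)  # rule (OR)
    return direct, formula

face = {c for c in pack if c[0] in ("K", "Q", "J")}
spade = {c for c in pack if c[1] == "spades"}
print(check_or_rule(face, spade, pack))        # the two values agree: 11/26
print(check_or_rule({2, 4, 6}, {3, 6}, die))   # the two values agree: 2/3
```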
3.2.6
Particular cases
When E1 and E2 are mutually exclusive, P(E1 and E2) = 0, and so P(E1 or E2) = P(E1) + P(E2). The general rule (OR) always works, but this simple addition of probabilities is valid only for mutually exclusive events.

When one event E1 includes another event E2, the set defining E2 is a subset of the set defining E1. In such a case P(E1 or E2) = P(E1) and P(E1 and E2) = P(E2). For example, let E1 = 'card is a spade' and E2 = 'card is a face card of spades'. Then P(E1 or E2) = P(card is a spade) = P(E1), and P(E1 and E2) = P(card is a face card of spades) = P(E2).
3.2.7
Complementary events
The event 'not E' is represented by all the outcomes that are not instances of E. It is called the event complementary to E, and will sometimes be written E^c to keep notation compact. So the coloured ball event 'not green' is the event 'red or blue'. The probability of green is 0.3, and the probability of 'red or blue' (two mutually exclusive outcomes) is 0.5 + 0.2 = 0.7 = 1 - 0.3. This is a case of the general rule

P(not E) = 1 - P(E)     (NOT)
The theoretical argument for this is that P(E) is the proportion of the population with a value that is an instance of E, and 1 - P(E) is the proportion with a value that is not an instance of E.

NB. You may have noticed that the rules (OR) and (NOT) which probability obeys are exactly the same rules as for mass, area or electric charge. Indeed, all these are examples of one mathematical abstraction called a measure. The only difference is that probability is always between 0 and 1, and the whole sample space weighs 1. Think of the probability of an event as the total mass of its outcomes; this is an easy way to remember the probability rules.
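The 'mass of outcomes' view can be tested exhaustively on a small space. The sketch below gives each die score a weight. The unequal weights k/21 are a hypothetical loaded die, chosen to show that the rules do not depend on equal probabilities. Probability is then defined as the total weight of an event's outcomes, and rules (NOT) and (OR) are asserted for every one of the 64 possible events and every pair of them.

```python
from fractions import Fraction
from itertools import chain, combinations

outcomes = range(1, 7)
# HYPOTHETICAL loaded die: score k has weight k/21 (the weights add to 1)
prob = {k: Fraction(k, 21) for k in outcomes}

def P(event):
    """Probability as 'mass': total weight of the outcomes in the event."""
    return sum((prob[o] for o in event), Fraction(0))

# Every event is a subset of the sample space: 2**6 = 64 of them
events = [frozenset(s) for s in chain.from_iterable(
    combinations(outcomes, r) for r in range(len(outcomes) + 1))]

space = frozenset(outcomes)
for e1 in events:
    assert P(space - e1) == 1 - P(e1)                     # rule (NOT)
    for e2 in events:
        assert P(e1 | e2) == P(e1) + P(e2) - P(e1 & e2)  # rule (OR)
print(len(events), "events checked; P(sample space) =", P(space))
```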
3.3
Cross-tabulated data
Bivariate data arise when each member of the population of interest has two measured characteristics. In situations with more than two such characteristics the data are sometimes treated as bivariate by ignoring the extra characteristics. It is often convenient to display bivariate data in a table, with values of one characteristic kept the same for each row, and the other kept constant for each column.
3.3.1
Finding probabilities
A random sample of 800 students was taken and classified by favourite type of music (rows) and faculty (columns). The results are shown in the frequency table below, rounded to the nearest multiple of ten for ease of discussion. It is important for such data that every student belongs to precisely one faculty and chooses only one type of music as their preferred type. In other words, the column and row classifications are mutually exclusive and exhaustive. (Exhaustive means that every student can be allocated a row and a column.)

                               Faculty
Preferred music    Arts   Bus   Edu   Eng   Sci   Row total
Heavy metal          20    30    40    30    10      130
Rock                 40    20    10    30    20      120
Country              50    10    10    30    30      130
Folk                 40    20    20    40    30      150
Classical            20    30    50    20    50      170
Other                30    10     0    20    40      100
Column total        200   120   130   170   180      800
By examining this sample it is possible to estimate various probabilities for the whole population. Each entry in the main body of the table (excluding the row of column totals and the column of row totals) is the number in the sample who prefer the type of music associated with that row and are in the faculty associated with that column.

Examples
1. What is the probability that a student selected at random is in the science faculty and prefers folk music? Estimated P(student is a science faculty folk enthusiast) = proportion of the sample occupied by such students = 30/800 = 3/80.
2. What is the probability that a student selected at random is a classical music lover and in the education faculty? Estimated P = 50/800 = 1/16.
3. What is the probability that a student selected at random prefers country music? Estimated P = 130/800 = 13/80, because 130 out of the 800 in the sample show a country music preference.
4. Estimate the probability that a randomly selected student is in the arts faculty. P = 200/800 = 1/4.

The row totals and column totals are called marginal totals. It is possible to divide all the table entries by 800 so that they are all probabilities. The marginal totals are then referred to as marginal probabilities:

                                      Faculty
Preferred music    Arts     Bus      Edu      Eng      Sci      Row total
Heavy metal       0.0250   0.0375   0.0500   0.0375   0.0125   0.1625
Rock              0.0500   0.0250   0.0125   0.0375   0.0250   0.1500
Country           0.0625   0.0125   0.0125   0.0375   0.0375   0.1625
Folk              0.0500   0.0250   0.0250   0.0500   0.0375   0.1875
Classical         0.0250   0.0375   0.0625   0.0250   0.0625   0.2125
Other             0.0375   0.0125   0.0000   0.0250   0.0500   0.1250
Column total      0.2500   0.1500   0.1625   0.2125   0.2250   1.0000
Examples 1 to 4 above are answered by the entries 0.0375, 0.0625, 0.1625 and 0.2500 at the appropriate places in this table. The tables also help with 'or' questions.

Example: What is the probability that a randomly selected student is in the education faculty or the engineering faculty? In the frequency table there are 130 + 170 = 300 such students, so the probability is 300/800 = 3/8. Alternatively, in the probability table, add the probabilities of the two mutually exclusive events to get 0.1625 + 0.2125 = 0.375 (= 3/8).

Example: What is the probability that a randomly selected student is in the arts faculty or likes heavy metal music? In the frequency table there are 200 + 130 - 20 = 310 such students (20 subtracted to prevent double counting), so the probability is 310/800 = 31/80. Alternatively, in the probability table, use the general formula (OR) of (3.2.5), because the two events are not mutually exclusive. The calculation is then 0.25 + 0.1625 - 0.025 = 0.3875 (= 31/80).
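The table look-ups above can be sketched in code. The frequency table is stored with one row per music type and one count per faculty, in the column order Arts, Bus, Edu, Eng, Sci. Joint probabilities are single cells divided by 800, and marginal probabilities are row or column totals divided by 800. The helper names `joint`, `p_music` and `p_faculty` are hypothetical.

```python
from fractions import Fraction

FACULTIES = ["Arts", "Bus", "Edu", "Eng", "Sci"]
# Frequency table: one row per preferred music type, one count per faculty
counts = {
    "Heavy metal": [20, 30, 40, 30, 10],
    "Rock":        [40, 20, 10, 30, 20],
    "Country":     [50, 10, 10, 30, 30],
    "Folk":        [40, 20, 20, 40, 30],
    "Classical":   [20, 30, 50, 20, 50],
    "Other":       [30, 10,  0, 20, 40],
}
N = sum(sum(row) for row in counts.values())  # 800 students

def joint(music, faculty):
    """P(music and faculty): one cell of the table divided by N."""
    return Fraction(counts[music][FACULTIES.index(faculty)], N)

def p_music(music):
    """Marginal probability from the row total."""
    return Fraction(sum(counts[music]), N)

def p_faculty(faculty):
    """Marginal probability from the column total."""
    j = FACULTIES.index(faculty)
    return Fraction(sum(row[j] for row in counts.values()), N)

print(joint("Folk", "Sci"), joint("Classical", "Edu"))  # 3/80 1/16
print(p_music("Country"), p_faculty("Arts"))            # 13/80 1/4
# 'or' via rule (OR): arts faculty or heavy metal fan
print(p_faculty("Arts") + p_music("Heavy metal") - joint("Heavy metal", "Arts"))  # 31/80
```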
3.4
Conditional probability
With bivariate data there is another type of question that can be asked. Here are two examples. 1. What is the probability that a randomly selected business student prefers classical music? 2. What is the probability that a rock music fan is an education student?
(1a) The first question is about business students. Ignore the data about other students and examine the musical preferences of the 120 business students only. Those preferring classical music are 30 of these 120, and so

P(classical | business) = 30/120 = 1/4 = 0.25.

The notation is usually read as 'the probability that the student prefers classical music, given that the student is a business student'. That is, | is read as 'given that'.

(2a) The second question is about rock music fans. There are 120 of these, of whom 10 are in the education faculty. Therefore

P(education | rock) = 10/120 = 1/12 = 0.0833 (to 3 significant figures).

This is read as 'the probability that the student is in the education faculty, given that the student is a rock music fan'.

Using the probability table
Both questions can also be answered from the probability table.

(1b) In the first question the calculation is 30/120 = (30/800)/(120/800) = 0.0375/0.15 = 0.25, where 0.0375 = P(classical and business), and 0.15 = P(business) from the marginal business total. Summary: P(classical | business) = P(classical and business)/P(business).

(2b) In the second question the calculation is 10/120 = (10/800)/(120/800) = 0.0125/0.15 = 0.0833, where 0.0125 = P(education and rock), and 0.15 = P(rock) from the marginal rock total. Summary: P(education | rock) = P(education and rock)/P(rock).
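The two summaries share one pattern: divide a joint probability by a marginal probability. The self-contained sketch below repeats the frequency table and computes both conditional probabilities that way. The function names `given_faculty` and `given_music` are hypothetical.

```python
from fractions import Fraction

FACULTIES = ["Arts", "Bus", "Edu", "Eng", "Sci"]
counts = {
    "Heavy metal": [20, 30, 40, 30, 10],
    "Rock":        [40, 20, 10, 30, 20],
    "Country":     [50, 10, 10, 30, 30],
    "Folk":        [40, 20, 20, 40, 30],
    "Classical":   [20, 30, 50, 20, 50],
    "Other":       [30, 10,  0, 20, 40],
}
N = sum(sum(row) for row in counts.values())  # 800 students

def joint(music, faculty):
    return Fraction(counts[music][FACULTIES.index(faculty)], N)

def p_music(music):
    return Fraction(sum(counts[music]), N)

def p_faculty(faculty):
    j = FACULTIES.index(faculty)
    return Fraction(sum(row[j] for row in counts.values()), N)

def given_faculty(music, faculty):
    """P(music | faculty) = P(music and faculty) / P(faculty)."""
    return joint(music, faculty) / p_faculty(faculty)

def given_music(faculty, music):
    """P(faculty | music) = P(faculty and music) / P(music)."""
    return joint(music, faculty) / p_music(music)

print(given_faculty("Classical", "Bus"))  # 1/4
print(given_music("Edu", "Rock"))         # 1/12
```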
3.4.1
Examples with cards
Sometimes it is more trouble to make a cross-tabulation than to deal with the question directly in terms of numbers. Here are some examples for a standard pack of playing cards.

1. P(card is a heart | it is red). Card is red = card is one of 13 hearts or 13 diamonds. So P(card is a heart | it is red) = 13/26 = 1/2, compared with P(card is a heart) = 13/52 = 1/4.
2. P(card is a face card | it is black). Card is black = card is one of 13 spades or 13 clubs. So P(card is a face card | it is black) = 6/26 = 3/13, compared with P(card is a face card) = 12/52 = 3/13. In this case, knowing that the card is black makes no difference.
3. P(card is black | card is a face card). Card is a face card = card is one of 12 face cards. So P(card is black | card is a face card) = 6/12 = 1/2, compared with P(card is black) = 26/52 = 1/2. In this case, knowing that the card is a face card makes no difference.
4. P(card is even | it is numeric). Card is numeric = card is one of 36 (2 to 10 of any suit). So P(card is even | it is numeric) = 20/36 = 5/9, compared with P(card is even) = 20/52 = 5/13.
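The four card examples follow one recipe: restrict attention to the cards satisfying the condition, then find the proportion of those that satisfy the event. The sketch below does that by enumeration. It treats 2 to 10 as the numeric cards, as in example 4. The predicate names and `cond_prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

DENOMS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
PACK = list(product(DENOMS, ["spades", "hearts", "diamonds", "clubs"]))

def is_red(c):
    return c[1] in ("hearts", "diamonds")

def is_black(c):
    return not is_red(c)

def is_heart(c):
    return c[1] == "hearts"

def is_face(c):
    return c[0] in ("K", "Q", "J")

def is_numeric(c):
    return c[0] not in ("A", "K", "Q", "J")  # the cards 2 to 10

def is_even(c):
    return is_numeric(c) and int(c[0]) % 2 == 0

def prob(a):
    return Fraction(sum(1 for c in PACK if a(c)), len(PACK))

def cond_prob(a, b):
    """P(a | b): keep only the cards satisfying b, then find the proportion satisfying a."""
    given = [c for c in PACK if b(c)]
    return Fraction(sum(1 for c in given if a(c)), len(given))

print(cond_prob(is_heart, is_red), prob(is_heart))      # 1/2 1/4
print(cond_prob(is_face, is_black), prob(is_face))      # 3/13 3/13
print(cond_prob(is_black, is_face), prob(is_black))     # 1/2 1/2
print(cond_prob(is_even, is_numeric), prob(is_even))    # 5/9 5/13
```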
3.4.2
Equivalent definition of conditional probability
The two summary statements in (3.4) are both instances of the general rule

P(E1 | E2) = P(E1 and E2)/P(E2),   if P(E2) > 0.     (COND)

This is taken as the formal definition of conditional probability. The explanation of this rule is that

P(E1 | E2) = (number of individuals that are E2 and also E1)/(number of individuals that are E2)
           = (proportion of individuals that are E1 and E2)/(proportion of individuals that are E2)
           = P(E1 and E2)/P(E2).
Examples
1. P(die score is 2 | die score is even). P(die score is 2 and die score is even) = P(die score is 2) = 1/6, and P(die score is even) = 3/6. So P(die score is 2 | die score is even) = (1/6)/(3/6) = 1/3. This is different from P(die score is 2), which is 1/6. In the working we had to recognise that the event 'die score is 2 and die score is even' is the same event as 'die score is 2': being told that the score is even adds nothing to the statement that the score is 2.
2. P(die score is 6 | it is a multiple of 3). P(die score is 6 and it is a multiple of 3) = P(die score is 6) = 1/6, and P(it is a multiple of 3) = 2/6. So P(die score is 6 | it is a multiple of 3) = (1/6)/(2/6) = 1/2. This is different from P(die score is 6) = 1/6.
3. P(card is a heart | card is red). P(card is a heart and card is red) = 13/52, and P(card is red) = 26/52. So P(card is a heart | card is red) = (13/52)/(26/52) = 1/2.
4. P(card is a face card | card is black). P(card is a face card and card is black) = 6/52, and P(card is black) = 26/52. So P(card is a face card | card is black) = (6/52)/(26/52) = 3/13.
3.4.3
Connection with 'and'
From the rule (COND), multiplying through by P (E2 ) gives
P(E1 and E2) = P(E1 | E2) P(E2)     (AND)

Another form of (COND) is P(E2 | E1) = P(E2 and E1)/P(E1), and this gives an alternative form of (AND):

P(E2 and E1) = P(E2 | E1) P(E1)     (AND′)
Now 'E2 and E1' can just as well be written 'E1 and E2': the two compound events are the same.

Examples
1. Suppose that the jack of hearts is missing from a pack of cards, leaving 51 cards of which 12 are hearts. Then P(card is a face card and card is a heart) = P(card is a face card | card is a heart) P(card is a heart) = (2/12)(12/51) = 2/51.
2. In the national lottery draw, let E1 be the event that the first ball to come out is number 3, and let E2 be the event that the second ball to come out is number 22. Then P(E1 and E2) = P(E2 | E1) P(E1) = (1/48)(1/49). This is because P(E1) is the probability of getting one particular ball from a population of 49, and P(E2 | E1) is the probability of getting a particular ball from the remaining 48 (given that the first ball has been removed and the required ball is still in the machine).
3. Selection such as that just discussed is called selection without replacement. If the numbers chosen by the lottery participant were 3, 22, 29, 31, 41, 42, the probability of the corresponding balls coming out in that precise order is (by extension of the previous argument) (1/49)(1/48)(1/47)(1/46)(1/45)(1/44) = 1/10068347520. However, the balls can come out in other orders, and any of these will also mean that the participant is a winner. In fact there are 6! = 720 different orders in which the participant's numbers can emerge. This means that the chance of winning with 3, 22, 29, 31, 41, 42 is 720/10068347520 = 1/13983816.
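The chained use of (AND) in examples 1 and 3 can be checked with exact fractions. The sketch below multiplies the conditional probabilities for one stated order of six balls, then multiplies by the 6! possible orders. As an independent cross-check it compares the result with 1 over the number of unordered selections of 6 balls from 49, given by `math.comb` (Python 3.8 or later assumed). The last lines repeat the missing-jack example.

```python
from fractions import Fraction
from math import comb, factorial, prod

# One stated order: 1/49 * 1/48 * ... * 1/44, each factor conditional on
# the earlier balls having been removed (selection without replacement)
p_one_order = prod(Fraction(1, 49 - k) for k in range(6))
p_win = factorial(6) * p_one_order            # 720 equally likely orders
print(p_one_order, p_win)                     # 1/10068347520 1/13983816
assert p_win == Fraction(1, comb(49, 6))      # agrees with counting unordered selections

# Example 1: pack with the jack of hearts missing (51 cards, 12 hearts)
p_heart = Fraction(12, 51)
p_face_given_heart = Fraction(2, 12)          # only the K and Q of hearts remain
print(p_face_given_heart * p_heart)           # 2/51
```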
3.4.4
Independent events
If P(E2 | E1) = P(E2), then knowing that a value is an instance of E1 makes no difference to the probability of its being an instance of E2. In this case the events E1 and E2 are said to be statistically independent.

Examples
1. P(card is a face card | card is black) = P(card is a face card) (see 3.4.1 item 2 above); and P(card is black | card is a face card) = P(card is black) (see 3.4.1 item 3 above).
Either of these shows that being black and being a face card are statistically independent.
2. If E1 is 'tossed 10p piece is a head' and E2 is 'tossed 20p piece is a tail', then P(10p piece is a head | 20p piece is a tail) = P(10p piece is a head). The tosses of the two coins are genuinely independent: the 10p coin is unaffected by the toss of the 20p coin. Events that are genuinely independent are also statistically independent.

Special case
The general rule for multiplying probabilities is (AND) or (AND′). These forms are always true. However, they both reduce to the same simpler form when E1 and E2 are statistically independent:

P(E1 and E2) = P(E1) P(E2).

Converse
If P(E1 and E2) = P(E1) P(E2), then E1 and E2 are statistically independent. This is so because P(E1 and E2) = P(E1 | E2) P(E2) = P(E1) P(E2). Provided that P(E2) ≠ 0, this implies that P(E1 | E2) = P(E1), which establishes the statistical independence.
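By the converse just proved, statistical independence can be tested by comparing P(E1 and E2) with P(E1) P(E2). The sketch below applies that test to equi-probable spaces. It checks black and face card (independent), heart and red (not independent), and the two coin tosses (independent). The helper names `prob` and `independent` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Equi-probable space: proportion of outcomes satisfying `event`."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def independent(e1, e2, space):
    """Statistical independence test: P(E1 and E2) == P(E1) P(E2)."""
    both = prob(lambda o: e1(o) and e2(o), space)
    return both == prob(e1, space) * prob(e2, space)

pack = list(product("A K Q J 10 9 8 7 6 5 4 3 2".split(),
                    ("spades", "hearts", "diamonds", "clubs")))

def black(c):
    return c[1] in ("spades", "clubs")

def face(c):
    return c[0] in ("K", "Q", "J")

def heart(c):
    return c[1] == "hearts"

def red(c):
    return c[1] in ("hearts", "diamonds")

coins = list(product("HT", repeat=2))  # (10p, 20p) outcomes, equally likely

def head_10p(o):
    return o[0] == "H"

def tail_20p(o):
    return o[1] == "T"

print(independent(black, face, pack))         # True
print(independent(heart, red, pack))          # False: P(heart | red) = 1/2, not 1/4
print(independent(head_10p, tail_20p, coins)) # True
```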
3.4.5
Two dice
When several dice are rolled, the score on one die is quite independent of the scores on the others: the dice do not communicate with one another. As said earlier, true independence produces statistical independence. Suppose there are two dice. Let E1 be the event that the first die is a six, and let E2 be the event that the second die is a six. The probability of a double six is
P(E1 and E2) = P(E1)P(E2) = (1/6)(1/6) = 1/36.
A similar calculation gives P(double one) as also 1/36. Consider the probability of a total score of 4 with two dice. Let $E_1$ be the event that the first die is one, let $E_1'$ be the event that the first die is two, and let $E_1''$ be the event that the first die is three. Let $E_2$, $E_2'$, $E_2''$ be the equivalent events for the second die. Then
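$$\begin{aligned}
P(\text{total score } 4) &= P\{(E_1 \text{ and } E_2'') \text{ or } (E_1' \text{ and } E_2') \text{ or } (E_1'' \text{ and } E_2)\} \\
&= P(E_1 \text{ and } E_2'') + P(E_1' \text{ and } E_2') + P(E_1'' \text{ and } E_2) \\
&= P(E_1)P(E_2'') + P(E_1')P(E_2') + P(E_1'')P(E_2) \\
&= (1/6)(1/6) + (1/6)(1/6) + (1/6)(1/6) = 3/36 = 1/12
\end{aligned}$$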
The probabilities of the separate events linked by "or" have been added because they are mutually exclusive. The probabilities of the pairs linked by "and" have been multiplied because they are independent. This mode of argument allows you to work out the probability of any score obtained with two dice. It extends to more dice, but the more dice there are, the more complicated it gets.
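The same argument can be mechanised by listing every equally likely ordered outcome and tallying the totals, which handles more than two dice without extra effort. A minimal sketch, assuming fair six-sided dice; `score_distribution` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def score_distribution(n_dice=2, faces=6):
    # Independent fair dice: every ordered tuple of faces has probability
    # (1/faces)**n_dice, so P(total) = (number of tuples with that total) / faces**n_dice
    counts = Counter(sum(roll) for roll in product(range(1, faces + 1), repeat=n_dice))
    total = faces ** n_dice
    return {score: Fraction(c, total) for score, c in sorted(counts.items())}

dist = score_distribution(2)
print(dist[4])   # 1/12
print(dist[12])  # 1/36
```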
3.5
Bayes' theorem
This subject is alternatively known as Bayes' rule, Bayes' law and Bayes' formula, although some authors draw fine distinctions between the terms. Consider the diagnosis problem where birds may have a very serious form of influenza. There is a test that can be carried out to find out whether a sick bird in fact has the serious form. Tests of this kind are often very good, but rarely perfect. Some tests will show a positive for the serious type when the bird is not suffering from it; these are called false positives. Conversely, for some birds that do have the disease, the test will show that they are free of it; these are called false negatives. A good test gives a very low proportion of both false positives and false negatives.

The data that can be accumulated are of the form: given that the bird has the disease, the proportion of true positives is $\alpha$ and the proportion of false negatives is $1 - \alpha$. These data can be obtained in circumstances where further tests are available to make quite sure that the birds did have the serious form. Data can also be accumulated from birds that definitely did not have the disease but to which the test was applied. Among these there will be a proportion $\beta$ of true negatives and a proportion $1 - \beta$ of false positives.

In practice, however, the test is used on a bird in order to conclude whether the bird has the disease or not. The test will be either positive or negative. If it is positive, it is not certain that the bird has the disease, because false positives sometimes occur. Similarly, if it is negative, it is not certain that the bird is free of the disease, because there are false negatives. The problem is to quantify the probability that the bird has the disease, given a positive test, and the probability that the bird is free of the disease, given a negative test.
This turning round of the probability question, from the probability of the test result given the disease to the probability of the disease given the test result, is the subject matter of Bayes' theorem.
3.5.1
Tree diagram
It is convenient to draw a tree diagram to illustrate this kind of situation. The notation is HD = has the disease, ND = no disease, TP = test positive, TN = test negative. Suppose that the proportion of wildlife that have the disease is $\pi$ and the proportion that do not is $1 - \pi$.

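Tree diagram: the first branching is HD / ND, the second is TP / TN; each end point carries the product of the probabilities along its branches.
HD ($\pi$), then TP ($\alpha$): $\pi\alpha = P(\text{HD and TP})$
HD ($\pi$), then TN ($1-\alpha$): $\pi(1-\alpha) = P(\text{HD and TN})$
ND ($1-\pi$), then TP ($1-\beta$): $(1-\pi)(1-\beta) = P(\text{ND and TP})$
ND ($1-\pi$), then TN ($\beta$): $(1-\pi)\beta = P(\text{ND and TN})$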
Here $P(\text{HD}) = \pi$ and $P(\text{ND}) = 1 - \pi$; $P(\text{TP} \mid \text{HD}) = \alpha$ and $P(\text{TN} \mid \text{HD}) = 1 - \alpha$; $P(\text{TP} \mid \text{ND}) = 1 - \beta$ and $P(\text{TN} \mid \text{ND}) = \beta$. Now $P(\text{HD and TP}) = P(\text{HD})\,P(\text{TP} \mid \text{HD}) = \pi\alpha$. The probabilities on the two top branches are multiplied to obtain the probability of the first event (HD) and the second event (TP). A similar process gives
$P(\text{HD and TN}) = P(\text{HD})\,P(\text{TN} \mid \text{HD}) = \pi(1-\alpha)$,
$P(\text{ND and TP}) = P(\text{ND})\,P(\text{TP} \mid \text{ND}) = (1-\pi)(1-\beta)$,
$P(\text{ND and TN}) = P(\text{ND})\,P(\text{TN} \mid \text{ND}) = (1-\pi)\beta$.
That is, you multiply the probabilities along the branches to obtain the probability of the first and second events together. The probability that the bird has the disease, given a positive test, is $P(\text{HD} \mid \text{TP}) = P(\text{HD and TP}) / P(\text{TP})$. The numerator is already known ($\pi\alpha$). However, the denominator must be worked out from $P(\text{TP}) = P(\text{TP and HD}) + P(\text{TP and ND})$, because TP occurs in only two mutually exclusive cases: one with HD, and the other with ND. Finally, therefore,
$$P(\text{HD} \mid \text{TP}) = \frac{\pi\alpha}{\pi\alpha + (1-\pi)(1-\beta)}$$
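In a similar manner the other conditional probabilities can be calculated:
$$P(\text{ND} \mid \text{TP}) = \frac{(1-\pi)(1-\beta)}{\pi\alpha + (1-\pi)(1-\beta)}$$
$$P(\text{ND} \mid \text{TN}) = \frac{(1-\pi)\beta}{\pi(1-\alpha) + (1-\pi)\beta}$$
$$P(\text{HD} \mid \text{TN}) = \frac{\pi(1-\alpha)}{\pi(1-\alpha) + (1-\pi)\beta}$$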
The probability that a positive result is a false positive is P(ND|TP), and the probability that a negative result is a false negative is P(HD|TN). The working has all been done with general symbols to bring out the pattern of the operations that are needed. In particular cases it is best to mark up a tree diagram and then extract the sub-probabilities that are needed.
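The four formulas amount to "multiply along each branch, then divide by the total for the observed test result". A minimal sketch of that calculation, writing `pi`, `alpha` and `beta` for the proportion with the disease, the true-positive proportion and the true-negative proportion; `disease_test_posteriors` is a hypothetical function name, and the usage lines plug in the figures from the example in 3.5.2.

```python
def disease_test_posteriors(pi, alpha, beta):
    """Conditional probabilities for the two-stage HD/ND, TP/TN tree.

    pi:    P(HD), proportion of birds with the disease
    alpha: P(TP | HD), proportion of true positives among diseased birds
    beta:  P(TN | ND), proportion of true negatives among disease-free birds
    """
    # Multiply along each branch of the tree
    hd_tp = pi * alpha
    hd_tn = pi * (1 - alpha)
    nd_tp = (1 - pi) * (1 - beta)
    nd_tn = (1 - pi) * beta
    # Each test result arises from two mutually exclusive branches, so add them
    p_tp = hd_tp + nd_tp
    p_tn = hd_tn + nd_tn
    return {
        "HD|TP": hd_tp / p_tp,
        "ND|TP": nd_tp / p_tp,
        "ND|TN": nd_tn / p_tn,
        "HD|TN": hd_tn / p_tn,
    }

r = disease_test_posteriors(0.05, 0.94, 0.93)
print(round(r["HD|TN"], 5))  # 0.00338
print(round(r["HD|TP"], 3))  # 0.414
```

The second figure shows that, with these numbers, fewer than half of the birds that test positive actually have the serious form, because disease-free birds greatly outnumber diseased ones.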
3.5.2
Example
The proportion of birds with the serious form of influenza is 0.05. The test gives a correct positive in a proportion 0.94 of cases that have the serious form (so a bird with the serious form tests negative with probability 1 − 0.94 = 0.06), and a correct negative in 0.93 of cases that do not have the serious form. Calculate the probability that a randomly chosen bird which tests negative actually does have the serious form of the disease. You will recognise that false negatives are more serious than false positives if the objective is to eliminate the disease. Make the tree diagram, but put only the relevant probabilities on it. Some probabilities are not needed if only one question is to be answered, and these have been omitted.
Example tree diagram (only the branches needed for the question are labelled):
HD (0.05), then TN (0.06): 0.05 × 0.06 = 0.003
ND (0.95), then TN (0.93): 0.95 × 0.93 = 0.8835
The probability of having the serious form and testing negative is 0.003; and the probability of not having the serious form and testing negative is 0.8835 . Therefore the probability of having the disease given a negative test is P (HD|TN) = P (HD and TN)/P (TN) = 0.003/(0.003 + 0.8835) = 0.00338, and this is quite small.
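Because a probability is defined here as a proportion in a population, the answer can also be checked by simulation: generate a large population of birds, test each one, and count what proportion of the negative-testing birds have the serious form. A minimal sketch; the function name, the population size and the seed are arbitrary choices made for illustration, and the estimate will differ slightly from the exact value.

```python
import random

def simulate_p_hd_given_tn(n_birds, p_hd, p_tp_given_hd, p_tn_given_nd, seed=0):
    """Estimate P(HD | TN) as a proportion in a simulated population."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    negatives = 0
    diseased_negatives = 0
    for _ in range(n_birds):
        has_disease = rng.random() < p_hd
        if has_disease:
            tests_negative = rng.random() >= p_tp_given_hd  # false negative
        else:
            tests_negative = rng.random() < p_tn_given_nd   # true negative
        if tests_negative:
            negatives += 1
            diseased_negatives += has_disease  # True counts as 1
    return diseased_negatives / negatives

estimate = simulate_p_hd_given_tn(200_000, 0.05, 0.94, 0.93)
exact = (0.05 * 0.06) / (0.05 * 0.06 + 0.95 * 0.93)
print(round(estimate, 4), round(exact, 5))
```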
3.5.3
Another example
Two types of fault occur in manufacturing an electronic device: CF = component failure and Bc = broken circuit. CF occurs in 5% of devices manufactured and Bc in 7%.
Because of redundancy in the design, the device works in 90% of cases of CF and in 85% of cases of Bc. It also works in 95% of cases with neither type of fault. If you have a working device, what is the probability that it has a broken circuit? Sketch a tree diagram in a similar manner to the preceding example, and mark the essential information on it. OK means neither CF nor Bc; since no device is taken to have both faults, P(OK) = 1 − 0.05 − 0.07 = 0.88.
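Tree diagram (W = device works, F = device fails; only the W branches are needed):
CF (0.05), then W (0.90): 0.05 × 0.90 = 0.045
Bc (0.07), then W (0.85): 0.07 × 0.85 = 0.0595
OK (0.88), then W (0.95): 0.88 × 0.95 = 0.8360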
P(Bc|W) = P(Bc and W)/P(W) = 0.0595/(0.045 + 0.0595 + 0.8360) = 0.0595/0.9405 = 0.0633. So 6.33% of working devices have a broken circuit, and this may mean that they are going to give trouble sooner rather than later.
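The same multiply-then-normalise pattern works for any number of mutually exclusive cases, not just two. A minimal sketch, assuming (as the tree does) that no device has both faults; `posterior` is a hypothetical helper, not a library function.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over mutually exclusive, exhaustive cases.

    priors:      {case: P(case)}, summing to 1
    likelihoods: {case: P(observation | case)}
    Returns {case: P(case | observation)}.
    """
    # Multiply along each branch of the tree: P(case and observation)
    joint = {case: priors[case] * likelihoods[case] for case in priors}
    # The observation can arise only via these exclusive branches, so add them
    p_obs = sum(joint.values())
    return {case: p / p_obs for case, p in joint.items()}

# Device example: P(fault type) and P(W | fault type)
priors = {"CF": 0.05, "Bc": 0.07, "OK": 0.88}
works_given = {"CF": 0.90, "Bc": 0.85, "OK": 0.95}
post = posterior(priors, works_given)
print(round(post["Bc"], 4))  # 0.0633
```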

Research Methodology. Statistics (at November 28, 2008)
