Professional Documents
Culture Documents
MATH1231 Algebra
1 / 76
E.g. c = S.
2 / 76
A, B are disjoint if A B = .
3 / 76
Partition
Definition
The sets A1 , A2 , . . . , An is said to partition a set B when
A1 , A2 , . . . , An are pairwise disjoint, i.e. Ai Aj = whenever i 6= j,
and
A1 A2 An = B.
Proposition
Suppose A1 , . . . , An partition S & B S then A1 B, . . . , An B
partition B.
Daniel Chan (UNSW)
4 / 76
A B = B A.
A (B C ) = (A B) C .
(Distributive laws) A (B C ) = (A B) (A C ).
A (B C ) = (A B) (A C ).
(De Morgans Laws) (A B)c = Ac B c ,
(A B)c = Ac B c .
Why?
5 / 76
Example
If A = { : is a lower case letter in the English alphabet}, then |A| = 26.
Obviously, if finite sets A and B are disjoint then
|A B| = |A| + |B|.
If not, we have
6 / 76
Solution
Let T = {3, 6, 9, . . . , 99}, F = {5, 10, 15, . . . , 100}.
7 / 76
Analysing 3 subsets
Example
In a class of 40 students: 2 have bulbasaur, charmander & squirtle, 7 have
at least bulbasaur & charmander, 6 have at least charmander & squirtle, 5
have at least bulbasaur & squirtle, 17 have at least bulbasaur, 19 have at
least charmander, & 14 have at least squirtle. How many have none of
these pokemon?
Solution
8 / 76
Proposition
Let A, B, and C be finite sets. Then
|A B C | = |A| + |B| + |C | |A B| |A C |
|B C | + |A B C |.
In eg of previous slide, let B, C , S be the sets of students who have
bulbasaur, charmandeer & squirtle resp. Then
|B C S| = |B| + |C | + |S| |B C | |C S| |B S| + |B C S|
= 17 + 19 + 14 7 6 5 + 2 = 34
Hence |(B C S)c | = 40 34 = 6.
9 / 76
Probability
In probability theory, we consider performing experiments e.g. rolling a
die, polling voters etc and then observe or measure the outcomes.
Definition
The set of all possible outcomes of a given experiment is called the
sample space for that experiment.
Example
Find the sample space for the experiment of tossing a coin three times.
Solution
The sample space is the set
{HHH, HHT , HTH, HTT , THH, THT , TTH, TTT }.
9.2 Probability
10 / 76
Events
Definition (Event)
An event is a subset of the sample space i.e. a set of possible outcomes.
Eg The set {HHT , HTH, THH} is the event exactly two heads in 3
tosses. {HHH} is event all heads in 3 tosses.
Definition
Two events A and B are said to be mutually exclusive if A B = .
Eg 2 events in previous example are mutually exclusive.
9.2 Probability
11 / 76
Definition of Probability
Definition
Suppose S is a sample space, then a probability on S is a real valued
function P defined on the set of all events (subsets of S) such that
a) 0 P(A) for all A S;
b) P() = 0;
c) P(S) = 1; and
d) if A, B are mutually exclusive events then
P(A B) = P(A) + P(B).
E.g. S = {T , H}. Following define a probabilities on S
Fair coin: P() = 0, P({T }) = 12 , P({H}) = 21 , P({T , H}) = 1
Biased coin: P() = 0, P({T }) = 13 , P({H}) = 32 , P({T , H}) = 1
Daniel Chan (UNSW)
9.2 Probability
12 / 76
Proof of (2).
9.2 Probability
13 / 76
Example
Suppose A and B are two mutually exclusive events in which
P(A) = 0.48, P(B) = 0.36. Write down the values of
a) P(A B), b) P(A B), c) P(Ac ), d) P(A B c ).
Solution
9.2 Probability
14 / 76
Theorem
Let P be a probability on a finite sample space S.
Suppose all outcomes of S are equally likely, i.e. P({a}) is constant
|A|
1
Then P({a}) = |S|
& P(A) =
.
|S|
The probability function in the thm is called the discrete uniform
probability.
9.2 Probability
15 / 76
Solution
If n > 365, by the pigeonhole principle, there will surely be at least two
people born on the same day i.e. probability = 1.
Assume n 6 365. Let Y be the set of the 365 days of the year.
Consider sample space
S = {(b1 , . . . , bn ) : b1 , . . . , bn Y } .
Let A = event at least two of the n people share the same birthday.
Assume that it is equally likely for a person to be born on any date of a
year, outcomes of the sample space are equally likely.
Daniel Chan (UNSW)
9.2 Probability
16 / 76
Solution (Continued)
Kindergarten counting arguments give
|S| = 365n
Therefore,
P(A) = 1 P(Ac ) = 1
365 P
365n
|Ac |
365 364 (365 n + 1)
=1
.
|S|
365n
.
4
0.016
16
0.284
23
0.507
9.2 Probability
32
0.753
40
0.891
56
0.988
17 / 76
1st Child
B
G
B
BB
GB
G
BG
GG
1
Probability of two boys = .
4
Now, suppose we are told that there is at least one boy.
The outcome two girls must be excluded from the sample space.
New sample space = {BB, BG , GB}.
The probability of two boys, given there is at least one boy will be
Daniel Chan (UNSW)
9.2 Probability
1
.
3
18 / 76
P(A B)
P(B)
provided
P(B) 6= 0.
9.2 Probability
19 / 76
Solution
9.2 Probability
20 / 76
Solution
9.2 Probability
21 / 76
Bayes Rule
Bayes Rule computes conditional probabilities P(Ai |B) from P(B|Ai ) as
asked for in the previous example.
Theorem
If A1 , . . . , An partition the sample space S, and B is any event in S then
(Total probability rule)
P(B) =
n
X
P(B Ai ) =
i=1
n
X
P(B|Ai ) P(Ai ),
and
i=1
(Bayes rule)
P(Ak |B) =
P(B|Ak ) P(Ak )
n
X
P(B|Ai ) P(Ai )
i=1
9.2 Probability
22 / 76
9.2 Probability
23 / 76
Solution
9.2 Probability
24 / 76
Statistical Independence
Informally, two events A and B are independent if the occurrence of one
does not affect the probability of the other. i.e.
P(A|B) = P(A)
Definition
Two events A and B are statistically independent (or independent for
short) if P(A B) = P(A) P(B).
E.g. Below are the probabilities for being left-handed (L) or right-handed
(R) and passing (P) or failing (F) MATH1231. Are L, P independent?
P
F
L .09 .01
R .71 .19
9.2 Probability
25 / 76
for any i1 , . . . , ik .
Example
The following system consists of three independent components. It fails
when either A fails or both B and C fail.
The probabilities that components A, B, and C fail in a year are 0.1, 0.3,
and 0.35, respectively. Find the probability that the system fails in a year.
Daniel Chan (UNSW)
9.2 Probability
26 / 76
Solution (Continued)
9.2 Probability
27 / 76
Random Variables
Definition
A random variable is a real-valued function defined on a sample space.
E.g. Suppose your experiment is randomly pick a student from this
lecture theatre (outcome is a student). H = height of student is a
random variable.
E.g. Consider a game where Jack and Jill toss two one-dollar coins
simultaneously. After the toss, Jack gets all the heads & Jill the tails.
Experiment = tossing four coins,
Sample space S = {HHHH, HHHT , . . .}.
Consider random variables
N = the number of heads.
Y = the payoff function Jacks net gain.
Then N(HHTT ) = 2, Y (HHTT ) = 0, Y (HHHT ) = 1.
Daniel Chan (UNSW)
28 / 76
Notation
If X is a random variable defined on the sample space S, we write
P(X = r ) = P({s S : X (s) = r }),
P(X 6 r ) = P({s S : X (s) 6 r }), ... and so on.
In above e.g. we have
P(Y > 0) = P(A) =
5
.
16
29 / 76
11
.
16
Usually drop the subscript X & write F (x) for FX (x) if X understood.
If a 6 b then F (b) F (a) = P(a < X 6 b) > 0.
F is non-decreasing i.e. a 6 b = F (a) 6 F (b).
lim F (x) = P() = 0, and lim F (x) = P(S) = 1.
30 / 76
Definition
X is discrete if its image is countable i.e. we can list its values as
x0 , x1 , . . . , xn or x0 , x1 , x2 , . . ..
The probability distribution of a discrete random variable X is the
function f (x) = P(X = x) on R.
This function is encoded in the sequence p0 , p1 , p2 , . . . , where
pk = P(X = xk ).
E.g. Consider the experiment of tossing 2 coins so
S = {HH, HT , TH, TT }. Let X = no. heads.
xk 0 1 2
pk
Note any probability distribution satisfies: i) pk > 0 for k = 0, 1, 2, ..., ii)
p0 + p1 + = 1.
Daniel Chan (UNSW)
31 / 76
pk = P(X = k) = , for k = 1, . . . , 6.
k
a) Find the value of .
b) Sketch the probability distribution and the probability histogram of
the random variable X .
c) Find and sketch the cumulative distribution function of X .
Solution
a) Since p1 + p2 + p3 + p4 + p5 + p6 = 1, where pk = P(X = k), we have
+
and hence =
49
= 1,
+ + + + =
20
2
3
4
5
6
20
.
49
32 / 76
Solution (Continued)
b) The probability distribution for X :
k
P(X = k)
20
49
10
49
20
147
5
49
4
49
10
147
33 / 76
Solution (Continued)
c) The cumulative distribution function of X : F (x) = P(X 6 x).
20
When x < 1, F (x) = 0; when 1 6 x < 2, F (x) = P(X = 1) = ;
49
30
20 10
+
= , etc.
when 2 6 x < 3, F (x) = P(X = 1) + P(X = 2) =
49 49
49
F (x) =
20
49
30
49
110
147
125
147
137
147
if
if
if
if
if
if
if
x <1
16x <2
26x <3
36x <4
46x <5
56x <6
x >6
34 / 76
Geometric Distribution
Definition
Fix the parameter p (0, 1). The Geometric distribution G (p) is the
function
G (p, k) := (1 p)k1 p = p , k = 1, 2, 3, ... .
k
j=k1=0
35 / 76
Solution
The possible values of X are 1, 2, 3, . . .
P(X = k) = P(not 6 for every toss in the first (k 1) times,
and 6 at the kth toss)
k1
5
1
=
6
6
5k1
=
6k
Hence X G ( 16 ).
Daniel Chan (UNSW)
36 / 76
Solution
Marks
10
Tally
Frequency
Probability
Daniel Chan (UNSW)
37 / 76
What is statistics?
Statistics is the science of handling data (e.g. test marks) with in-built
variability or randomness. Statistics often organises and analyses data for
various purposes such as the following.
Present the data in more meaningful ways e.g. listing the frequencies
or probabilities of marks as we did.
Suppose in previous slide, you got one of the 9s, you might want to
know how you fared compared to everyone else. One way is to
compute the mean (here 6) or some other measure of central
tendency.
You might now ask, are you much better than average, or just a
little? One way is to determine how spread out the marks are. In
other words, wed like some measures of dispersion.
(Hypothesis testing) Suppose next year, the tutor changes his/her
teaching method and the mean test score moves up to 6.5. Does this
indicate effectiveness of the new teaching method, or can this be
accounted for as a random fluctuation.
Daniel Chan (UNSW)
38 / 76
pk xk ,
k=0
39 / 76
S
R
R.
P
Formula: E (Y ) = E (g (X )) =
k=0 g (xk )pk .
X
Var(X ) =
(xk X )2 pk = E (X E (X ))2 .
k=0
40 / 76
Theorem
Var(X ) = E (X 2 ) (E (X ))2 .
Proof.
Writing for E (X ), the definition of variance gives,
X
Var(X ) =
(xk )2 pk
k
xk2 pk 2
xk pk + 2
pk
= E (X 2 ) 22 + 2
= E (X 2 ) (E (X ))2 .
Daniel Chan (UNSW)
41 / 76
Solution
The probability distribution for X :
P(X = k)
1
4
1
2
1
4
42 / 76
E (Y ) = aE (X ) + b,
SD(Y ) = |a|SD(X ).
43 / 76
Definition
The Binomial distribution B(n, p) is the function
n
B(n, p, k) =
p k (1 p)nk where k = 0, 1, . . . , n,
k
and 0 otherwise.
Note B(n, p, k) > 0, and
n
X
k=0
44 / 76
Theorem
If X B(n, p), then E (X ) = np and Var(X ) = np(1 p).
Example
Hillary has a 17 poke balls. Each gives her a 20% chance of capturing any
particular pokemon. She tries to capture as many as she can.
1
What is the probability that she captures more than Donalds one
pokemon.
45 / 76
Solution
46 / 76
1
,
p
Var(X ) =
1p
.
p2
Proof for E (X )
We use the formula
X
d X k
d
1
1
k1
kx
=
x =
.
=
dx
dx 1 x
(1 x)2
k>0
k>0
E (X ) =
47 / 76
Example
A die is repeatedly rolled until a 6 appears.
a) What is the probability that the first 6 appears within 5 rolls?
b) What is the probability that the first 6 appears after the 5th roll?
c) What is the expected number of rolls to get the first 6?
Solution
48 / 76
Example
James T. Kirk did the same experiment. In 5 attempts, he picked the right
card 2 times (40 % of the trials).
The expected number of correct guesses is only 25 % which may lead you
to think that James & Spock are psychic.
We test the Claim: James & Spock are (not) psychic.
Daniel Chan (UNSW)
49 / 76
50 / 76
Tail probabilities
51 / 76
52 / 76
95.2
97.4
97.6
97.3
95.3
98.2
96.0
98.2
98.5
96.8
99.1
99.4.
The previous system had a retention rate of 96 %. Can we claim that the
new system is better?
Solution
We first replace each score with +, 0, , when the reading is bigger
than, the same, less than 96%, respectively.
This gives the following sequence with 12 plus signs
+
+.
53 / 76
Solution (Continued)
We hypothesise that there is no difference between the old system
and the new system, i.e. + & - are equally likely, & calculate the tail
probability of how improbable it is to get such readings under this
hypothesis.
We ignore the zero (why?), so the number of plus signs, X , among
the 14 measurements has a binomial distribution B(14, 0.5). Hence
the tail probability of getting as good as 12 plus signs is
P(X > 12)
12 2 13 14
14
1
14
1
1
1
14
1
=
+
+
13
12
2
2
2
2
14
2
0.00647
54 / 76
Solution (Continued)
Conclusion.
Under the assumption that the new system is not better than the old
one, the (tail) probability of getting 12 or more measurements higher
than 96 % among 14 is approximately 0.65 % (less than 5 %). Hence
there is evidence that the new system is better than the old one.
55 / 76
for
x R.
Definition
A random variable X is continuous iff its cumulative distribution function
FX (x) is continuous.
Strictly speaking, F (x) should also be differentiable except for at most
countably many points.
Daniel Chan (UNSW)
56 / 76
Example
The light rail goes up High St every 5 minutes. The Vice-Chancellor waits
X minutes for the tram every afternoon. Suppose the cumulative
distribution function for X is
0 for x < 0
F (x) = x5 for 0 6 x 6 5
1 for x > 5
Whats the probability he waits more than 3 minutes?
Solution
57 / 76
Definition
The probability density function f (x) of a continuous random variable X
is defined by
d
f (x) = fX (x) =
F (x) , x R
dx
d
if F (x) is differentiable, and lim
F (x) if F (x) is not differentiable at
xa dx
x = a.
58 / 76
Solution
Let the probability density function for X be f (x). Besides x = 0, 5, we
have
0 for x < 0
0
f (x) = F (x) = 15 for 0 < x < 5
0 for x > 5
By definition, f (0) = lim F 0 (x) = 0.
x0
1
Similarly, f (5) = lim F 0 (x) = .
5
x5
0 for x 6 0
Hence f (x) = 51 for 0 < x 6 5
0 for x > 5
Daniel Chan (UNSW)
59 / 76
f (t)dt.
Hence for a 6 b,
Proposition
Z
P(a 6 X 6 b) = P(a < X 6 b) = F (b) F (a) =
f (x)dx
a
and
f (x) dx = 1,
60 / 76
Example
Suppose X is a random variable with probability density function
k x >1
f (x) = x 4
0
otherwise
where k is a constant.
a) Find the value of k.
b) Find P(X 6 3) and P(2 6 X 6 3).
c) Find and sketch the cumulative distribution function for X .
Solution
Z
Z
Z
k
a) Use the fact that
f (x)dx = 1, and
f (x)dx =
dx.
x4
1
k R
k
lim 3
= = 1, and hence k = 3.
R
3x 1
3
Daniel Chan (UNSW)
61 / 76
Solution (Continued)
1
26
3
1 3
dx = 3 = 1
=
b) P(X 6 3) =
f (x)dx =
4
x 1
27
27
1 x
Z 3
1 3 1
1
19
3
P(2 6 X 6 3) =
dx = 3 =
=
4
x 2 8 27
216
2 x
Z
c)
Z
F (x) =
f (t) dt
1 1
=
x3
0
for x > 1
for x 6 1
62 / 76
Definition (Mean)
The expected value (or mean) of a continuous random variable X with
probability density function f (x) is defined to be
Z
= E (X ) =
xf (x)dx .
Theorem
If X is a continuous random variable with density function f (x), and g (x)
is a real function, then the expected value of Y = g (X ) is
Z
g (x)f (x)dx .
E (Y ) = E (g (X )) =
63 / 76
Definition (Variance)
The variance of a continuous random variable X is
Var (X ) = E ((X E (X ))2 ) = E (X 2 ) (E (X ))2 .
p
The standard deviation of X is = SD(X ) = Var (X ).
Example
Find the mean, the variance, and the standard deviation of the random
variable defined in the previous example.
Solution
3
3 R
3
E (X ) =
x f (x)dx =
x 4 dx = lim 2
=
R
x
2x 1
2
1
R
Z
Z
3
3
x 2 f (x)dx =
E (X 2 ) =
dx = lim
=3
R
x2
x 1
2
3
3
3
= , hence SD(X ) =
.
Var(X ) = E (X 2 ) (E (X ))2 = 3
2
4
2
Z
64 / 76
and
X
, then
Var(Z ) = 1.
Proof.
1
= E (X ) = = 0
E (Z ) = E
X
1
1
2
X
Var(Z ) = Var
= 2 Var(X ) = 2 = 1
65 / 76
Normal Distribution
The most important continuous distribution in statistics is the normal
distribution. It turns out that it is the limiting case of the binomial
distribution.
Definition
A continuous random variable X is said to have normal distribution
N(, 2 ) if it has probability density
2
1 x
1
(x) =
e 2
where < x < .
2 2
We write X N(, 2 ) to denote that X has the normal distribution
N(, 2 ).
and
Var(X ) = 2 .
66 / 76
Remarks
The probability density function of a normal distribution has a bell
shape graph symmetrical about the mean.
The distribution N(0, 1) is called the standard normal distribution.
X
If X N(, 2 ) then the standardised random variable Z =
is
t 2
67 / 76
Z z
1 2
x
1
e 2 t dt .
)=
P(X 6 x) = P(X 6 x) = P(Z 6 z =
The value of this integral for various z can be found either via a calculator
or the table given in the Algebra Notes. This table gives the values of this
integral for z in the range 3 to 3. For z less than 3, the value is
essentially zero, while for z greater than 3, the value is essentially 1.
Remember you can picture the value P(Z 6 z) as the area to the left of z
under the probability density curve of Z N(0, 1).
68 / 76
Solution
69 / 76
Solution
70 / 76
Solution (Continued)
71 / 76
b
X
288
.25k .75288k
k
k=a
may involve large difficult sums. Alternately, we might view this sum as an
area under the probability histogram (in blue) below.
72 / 76
Theorem
Let Y be a continuous random variable with Y N(np, np(1 p)) i.e.
the same mean and standard deviation as X . Then for any integer x
P(X 6 x) P(Y 6 x + .5),
P(X < x) P(Y 6 x .5).
As a rule of thumb, the approximations are usually good enough if both
np > 10 and n(1 p) > 10.
Daniel Chan (UNSW)
73 / 76
74 / 76
Solution
75 / 76
Solution (Continued)
76 / 76