
PROBABILITY, STATISTICS, AND RANDOM PROCESSES EE 351K


Tom Penick · tom@tomzap.com · www.teicontrols.com/notes · ProbabilityStatisticsRandomProc.pdf · 5/18/2001



PROBABILITY

Mathematically, the probability of an outcome is equal to the number of possible positive outcomes divided by the total number of possible outcomes (the size of the sample space).

P(positive outcome) = number of possible positive outcomes / total number of possible outcomes

For example, if there are 5 balls in a box and 3 are green, the probability of choosing a green ball is
P(choosing green ball) = 3/5

SINGLE COIN TOSS

There is an equal probability that the outcome will be heads or tails.
P(H) = P(T) = 1/2

DOUBLE COIN TOSS

There are four possible outcomes with equal probability.
P(HH) = P(HT) = P(TH) = P(TT) = 1/4
The probability of getting at least one heads would be the sum of the outcomes providing that result:
P(HH) + P(HT) + P(TH) = 3/4
Alternatively, the probability of getting at least one heads could be thought of as one minus the probability of not getting at least one heads:
1 − P(TT) = 3/4

MULTIPLE COIN TOSS

There are 2^n possible outcomes to a multiple coin toss (when considering the order of the results).
The probability of getting at least one heads:
P(getting at least one heads) = (2^n − 1) · 1/2^n = 1 − 1/2^n
The probability of getting exactly 2 heads:
P(getting exactly two heads) = [n(n − 1)/2] · 1/2^n
where n is the number of possible positions of the first heads, (n − 1) is the number of possible positions of the second heads, the division by 2 reflects that the order of occurrence is not a factor, and 1/2^n is the probability of any single outcome.
The probability of getting exactly 3 heads:
P(getting exactly three heads) = [n(n − 1)(n − 2)/(2·3)] · 1/2^n
where n, (n − 1), and (n − 2) are the numbers of possible positions of the first, second, and third heads, and 2·3 = 3! is the number of ways to order 3 items.
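The exactly-k-heads counts above are just binomial coefficients, so they can be cross-checked numerically. A minimal Python sketch (not from the original notes; the function name p_exactly_k_heads is illustrative):

from math import comb

def p_exactly_k_heads(n, k):
    # Probability of exactly k heads in n fair coin tosses
    return comb(n, k) / 2**n

n = 10
print(p_exactly_k_heads(n, 2) == (n*(n - 1)/2) / 2**n)              # True
print(p_exactly_k_heads(n, 3) == (n*(n - 1)*(n - 2)/(2*3)) / 2**n)  # True
print(1 - 1/2**n)                                                   # at least one heads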



SET PROPERTIES

Given two sets A and B:
The union of two sets, A ∪ B, refers to that which is in set A or set B. In terms of area, it is the sum of the areas minus the common area.
The intersection of two sets, A ∩ B, refers to that which is in set A and set B. In terms of area, it is the common area found in both A and B.
Two sets with no common elements are called disjoint. More than two such sets are called mutually exclusive.
Theorems:  P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
           P(A) + P(Ā) = 1,   where Ā means everything that is not in A

INCLUSION-EXCLUSION PRINCIPLE

Expanding on a theorem presented in the previous box, the probability that at least one event contained within a group of (possibly) intersecting sets of events occurs is the sum of the probabilities of each event, minus the sum of the probabilities of all 2-way intersections, plus the sum of the probabilities of all 3-way intersections, minus the sum of all 4-way intersections, etc.
P(E1 ∪ E2 ∪ … ∪ En) = Σ_(i=1)^n P(Ei) − Σ_(1≤i<j≤n) P(Ei ∩ Ej) + Σ_(1≤i<j<k≤n) P(Ei ∩ Ej ∩ Ek) − …
E = an event

Ω SAMPLE SPACE

The set of all possible outcomes. (Means the same as probability space, I think.) For example, if a coin is tossed twice, the sample space is
Ω = {(H,H), (H,T), (T,H), (T,T)}
Note that the sample space is not a number; it is a collection or set of results. This is frequently a source of confusion; the size of the sample space is a number, but the sample space itself is not a number. See Probability, p2.

Ω INFINITE SAMPLE SPACE

If a coin is tossed until it turns up heads, then the sample space for possible outcomes is
Ω = {1, 2, 3, …}

PERMUTATION

A permutation is a mapping of a finite set onto itself. For example, if we have the set A = {a,b,c}, there are 3! possible permutations. One of them is
( a b c )
( c b a )
In other words, there are n! ways to arrange n objects. However, in the case of a cyclic permutation, there are (n−1)! ways to arrange n objects. An example of a cyclic permutation would be the seating of n people at a round table.

ORDERING/COMBINATIONS 1

n different objects can be ordered in n! ways or permutations. What about the case when some objects are identical? For example, consider the letters in the name
KONSTANTOPOULOS
There are 15 letters in the word. This includes 4 Os, 2 Ns, 2 Ts, and 2 Ss. How many distinct 15-letter arrangements can be formed with these letters given that some letters are identical?
number of words = 15! / (4!·2!·2!·2!)

ORDERING/COMBINATIONS 2

In how many ways can n identical objects be arranged in m containers?
(n + m − 1 choose n) = (n + m − 1)! / [n!·(m − 1)!]
A problem that appeared in the textbook was: how many ways can 6 identical letters be put in 3 mail boxes? The answer is 28.
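Both counts can be verified directly. A minimal Python sketch (only standard-library functions are assumed; variable names are illustrative):

from math import factorial, comb

# Distinct arrangements of KONSTANTOPOULOS (15 letters: 4 Os, 2 Ns, 2 Ts, 2 Ss repeated)
words = factorial(15) // (factorial(4) * factorial(2) * factorial(2) * factorial(2))
print(words)                  # 6810804000

# Identical objects in containers: 6 identical letters in 3 mail boxes
print(comb(6 + 3 - 1, 6))     # 28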



NORMAL RANDOM VARIABLE

A normal random variable has a Gaussian density function centered at the expectation µ. The figure below shows plots of the density functions of two normal random variables, centered at the common expectation of 0. The plot having the sharper peak is for the special case of a standard normal random variable, determined by an expectation of 0 and a deviation of 1. A normal random variable does not necessarily have an expectation of zero.

[Plot of the density functions for normal random variables with expectation µ = 0]

Z STANDARD NORMAL RANDOM VARIABLE p.213

A standard normal random variable has the parameters expectation µ = 0 and deviation σ = 1 (see Normal Density Function p.8).
A normal (i.e. Gaussian density) random variable with parameters µ and σ can be written in terms of the standard normal random variable:
X = σZ + µ
The process of changing a normal random variable to a standard normal random variable is called standardization. If X has a normal distribution with parameters µ and σ, then the standardized version of X is
Z = (X − µ)/σ

INDEPENDENT EVENTS p.139

Two events A and B are called independent if the outcome of one does not affect the outcome of the other. Mathematically, (for a particular probability assignment/distribution) two events are independent if the probability of both events occurring is equal to the product of their probabilities.
P(A ∩ B) = P(A) · P(B)
For example, the outcome of the first roll of a die does not affect the second roll. The independence of two events can be lost if the probabilities are not even, e.g. an unfair coin or die.
Independence can also be expressed in terms of a conditional probability. The probability of A given B is still P(A):
P(A | B) = P(A)
And if this is true, then it is also true that
P(B | A) = P(B)

UNEVEN PROBABILITY

If we assign the probability n to the outcome of heads of an unfair coin, then
P(H) = n
P(T) = 1 − n
The probabilities of a double coin toss will be
TT: (1 − n)²
TH: n(1 − n)
HT: n(1 − n)
HH: n²



BINOMIAL COEFFICIENT p.95

(a choose b) = a! / [(a − b)!·b!]
On my TI-86 calculator, the command to do this is a nCr b. The nCr function is found under MATH/PROB.
(52 choose 10) is read "52 choose 10" and stands for the number of combinations of 10 there are in a pool of 52 units.
Example Problem: what are the chances that there will be 4 aces among 10 cards picked from a deck of 52?
(48 choose 6)(4 choose 4) / (52 choose 10) = [48!/(42!·6!)] × [4!/(0!·4!)] / [52!/(42!·10!)]
What this is saying is, "From 48 non-aces choose 6, then from 4 aces choose 4." The product is the number of possible combinations of 10 that can contain 4 aces. Now divide this amount by all of the possible combinations of 10 cards out of 52. Note that 0! = 1, so we have
= [48!/(42!·6!)] / [52!/(42!·10!)] = (48!·10!) / (6!·52!) = 0.000776

MULTINOMIAL COEFFICIENT

A binomial coefficient becomes multinomial when there is more than one type of object to choose or there is more than one location to choose objects from.
Multiple object types:
(n choose n1 n2 … nk) = n! / (n1!·n2!·…·nk!)
(n choose n1 n2 … nk) is read "n choose n1 of type 1, n2 of type 2 … and nk of type k".
Multiple object locations:
For example, the number of ways we can arrange n objects in k boxes is
(n + k − 1 choose n) = (n + k − 1)! / [n!·(k − 1)!]

DISTRIBUTION FUNCTIONS

DISTRIBUTION FUNCTION p.19

A distribution function is a function that describes probabilities as a function of outcomes. That is, for every possible outcome, the distribution function gives the probability. There are many different types of distribution functions; determining which one applies to a particular situation is often difficult, so in teaching this subject the topic may be avoided entirely, with the advice offered that "you need to work a lot of problems in order to develop a sense of which distribution function to use."
When we take the derivative of a distribution function, the result is the density function (p8).

m(ω) DISCRETE UNIFORM DISTRIBUTION FUNCTION p.19,367

The function assigning probabilities to a finite set of equally likely outcomes. For example, the distribution function of a fair double coin toss is
m(H,H) = m(H,T) = m(T,H) = m(T,T) = 1/4
The distribution function for the roll of a die is
outcome:      1    2    3    4    5    6
probability: 1/6  1/6  1/6  1/6  1/6  1/6
In general, the discrete uniform distribution function is
P(X = x) = 1/(l − k + 1) for x = k, k+1, k+2, …, l;   0 otherwise
µ = (k + l)/2       σ² = (l − k)(l − k + 2)/12
Generating Function:  g(t) = (e^(tk) − e^(t(l+1))) / [(l − k + 1)(1 − e^t)]
The distribution function may be an infinite series. For example, if a coin is tossed until the first time heads turns up, then the distribution function would look like:
Σ_ω m(ω) = 1/2 + 1/4 + 1/8 + …
The sum of a distribution function is the sum of the probabilities and is equal to one. See also Specific Generating Functions p19.
µ = center of the density, average value, expected value
σ² = variance
k = the lowest value in the sample space
l = the highest value in the sample space
g(t) = generating function
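The 4-aces example above can be checked directly with binomial coefficients. A minimal Python sketch (math.comb is a standard-library assumption, not something the notes use):

from math import comb

# P(exactly 4 aces among 10 cards drawn from a 52-card deck)
p = comb(48, 6) * comb(4, 4) / comb(52, 10)
print(round(p, 6))    # 0.000776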



CUMULATIVE DISTRIBUTION FUNCTION p.61

When X is a continuous real-valued random variable, the cumulative distribution function of X is
FX(x) = P(X ≤ x) = ∫_(−∞)^x f(t) dt
In other words, it gives the probability that the outcome is no greater than x. The cumulative distribution function is useful for finding the density function when a random variable that is a function of another random variable is involved. The density function is the derivative of the cumulative distribution function.
X = random variable: the observation of an experimental outcome
x = a variable representing a particular outcome
f(x) = density function
t = a dummy variable of integration

FX(x) CUMULATIVE NORMAL DISTRIBUTION FUNCTION

The integral of the normal density function (p.8). The function has parameters µ (expected value) and σ (standard deviation). FX must be computed using numerical integration; there are tables of values for this function in Appendix A of the textbook.
FX(x) = ∫_(−∞)^x [1/(σ√(2π))] e^(−(u−µ)²/(2σ²)) du
X = random variable: the observation of an experimental outcome
x = a particular outcome
µ = center of the density, average value, expected value
σ = a positive value measuring the spread of the density, standard deviation

F(x,y) JOINT CUMULATIVE DISTRIBUTION FUNCTION p.165

The example below is for two random variables and may be extended for additional variables.
F(x, y) = P(X ≤ x, Y ≤ y)

BERNOULLI TRIALS PROCESS p.96,233,261

A Bernoulli trials process is a sequence of n chance experiments such that 1) each experiment has two possible outcomes and 2) the probability p of success of each experiment is the same and is unaffected by the knowledge of previous experiments. Examples of Bernoulli trials are flipping coins, opinion polls, and win/lose betting. See also Negative Binomial Distribution p7.
µ = np       σ² = np(1 − p)
p = probability of a successful outcome

b(n,p,k) BINOMIAL DISTRIBUTION FUNCTION p.184

A function assigning probabilities to a finite set of trials where there are two possible results per trial, not necessarily of equal probability. The binomial distribution produces a bell-shaped curve. When the parameter n is large and the parameter p is small, the Poisson Distribution may be used instead as a useful approximation. The expectation is E(X) = np.
b(n, p, k) = (n choose k) p^k q^(n−k)
where (n choose k) accounts for the ways the result can be ordered.
g(t) = (q + pe^t)^n       µ = np       σ² = npq
In the case of equal probabilities (p = 0.5), the function reduces to
b(n, 0.5, k) = (n choose k) 0.5^n
In the case of no successful outcomes (k = 0), the function reduces to
b(n, p, 0) = q^n
n = number of trials or selections
p = probability of success
q = probability of failure (1 − p)
k = number of successful outcomes
µ = center of the density, average value, expected value
σ² = variance
g(t) = generating function
For example, if I can guess a person's age with a 70% success rate, what is the probability that out of ten people, I will guess the ages of exactly 8 people correctly?
b(10, .7, 8) = (10 choose 8) (.7)^8 (.3)^(10−8) = 0.233
See also Specific Generating Functions p19.
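The age-guessing example can be reproduced in a few lines of Python (a sketch; the helper name b mirrors the notation above and math.comb is assumed from the standard library):

from math import comb

def b(n, p, k):
    # Binomial distribution function b(n, p, k) = C(n, k) p^k q^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(round(b(10, 0.7, 8), 3))    # 0.233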



MULTINOMIAL DISTRIBUTION

This problem involves more than one type of random variable. For example, a box contains M green balls and N red balls. If we choose k balls, what is the probability that m are green and n are red?
P(of selecting m, n balls) = (M choose m)(N choose k − m) / (M + N choose k)
This example was also used for hypergeometric distribution. See section 5.1.

JOINT DISTRIBUTION FUNCTION p.141

A joint distribution function describes the probabilities of outcomes involving multiple random variables. If the random variables are mutually independent, then the joint distribution function is the product of the individual distribution functions of the random variables.
FX,Y(x, y) = FX(x) FY(y)

GEOMETRIC DISTRIBUTION p.184

The geometric distribution applies to a series of dual-outcome events (such as coin tosses) where p is the probability of success on any given event, q is the probability of failure, j is the event number, and T is a random variable that stands for the event that produces the first success. The geometric distribution function has the memoryless property, p8. See also Specific Generating Functions p19.
P(T = j) = q^(j−1) p
µ = 1/p       σ² = q/p²       g(t) = pe^t / (1 − qe^t)
T = the first successful event in the series
j = the event number 1, 2, 3, etc.
p = the probability that any one event is successful
q = the probability that an event is not successful, 1 − p
µ = center of the density, average value, expected value
σ² = variance
g(t) = generating function

NEGATIVE BINOMIAL DISTRIBUTION p.186

Negative binomial distribution is a more general form of geometric distribution. A new variable k is introduced representing the number of successful outcomes in x attempts. For geometric distribution, k = 1.
u(x, k, p) = P(X = x) = (x − 1 choose k − 1) p^k q^(x−k)
This seems to describe the Bernoulli Trials Process, which is a sequence of x chance experiments such that 1) each experiment has 2 possible outcomes and 2) the probability p of success is the same for each experiment and is not affected by knowledge of previous outcomes.
X = random variable: the observation of an experimental outcome
x = the number of attempts
k = the number of successful outcomes
p = the probability that any one event is successful
q = the probability that an event is not successful, 1 − p

POISSON DISTRIBUTION p.187

An approximation of a discrete probability distribution. The Poisson distribution is used as an approximation to the binomial distribution when the parameters n and p are large and small, respectively. It is also used in situations where it may not be easy to interpret or measure the parameters n and p. See also Specific Generating Functions p19.
P(X = k) ≈ (λ^k / k!) e^(−λ)
µ = λ       σ² = λ       g(t) = e^(λ(e^t − 1))
X = random variable: the observation of an experimental outcome
λ = rate of occurrence, i.e. the number of positive outcomes expected over a given period. This might be the product of the probability and the number of trials.
k = number of positive outcomes
µ = center of the density, average value, expected value
σ² = variance
g(t) = generating function
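A quick numerical illustration of the Poisson approximation to the binomial; a minimal Python sketch with arbitrarily chosen n, p, and k (only standard-library math functions are assumed):

from math import comb, exp, factorial

n, p, k = 1000, 0.003, 5        # large n, small p
lam = n * p                     # λ = np
binomial = comb(n, k) * p**k * (1 - p)**(n - k)
poisson = lam**k / factorial(k) * exp(-lam)
print(round(binomial, 5), round(poisson, 5))    # the two values are close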



EXPONENTIAL DISTRIBUTION p.205

The exponential distribution function is the integral of the exponential density function. It represents the probability that an event will take place before time t. See Exponential Density Function on page 9.
F(t) = 1 − e^(−λt)
Expectation:  µ = 1/λ          Variance:  σ² = 1/λ²
Deviation:    σ = 1/λ          Generating Fnct.:  g(t) = λ/(λ − t)
λ = rate of occurrence, a parameter
t = time, units to be specified

DENSITY FUNCTIONS

f(x) DENSITY FUNCTION p.59

The density function is the derivative of the distribution function F(x) (see p5). The integral of a density function over its entire interval is equal to one. So by integrating a density function over a particular interval we determine the probability of an outcome falling within that interval.
P(a ≤ X ≤ b) = ∫_a^b f(x) dx,       f(x) = F′(x)
The density function has no negative values. A function that does not integrate to one may be normalized by dividing the function by its integral.

fX(x) NORMAL DENSITY FUNCTION p.212

If a large number of mutually independent random variables is considered, the normal density function is a close approximation. The normal density function has parameters µ (expected value) and σ (standard deviation).
fX(x) = [1/(σ√(2π))] e^(−(x−µ)²/(2σ²))

[Plot of the normal density function for µ = 0]

X = random variable: the observation of an experimental outcome
x = a particular outcome
µ = center of the density, average value, expected value
σ = the standard deviation, a positive value measuring the spread of the density

φ(x) STANDARD NORMAL DENSITY FUNCTION p.325

The case of the normal density function with parameters µ = 0 (expected value) and σ = 1 (standard deviation).
φ(x) = [1/√(2π)] e^(−x²/2)

[Plot of the standard normal density function]



f(ω) CONTINUOUS UNIFORM DENSITY p.205

Uniform density means that the probability of an outcome is equally weighted over the interval of consideration. Continuous means that there are an infinite number of possible outcomes (as opposed to a discrete number). Consider random values on the interval [a,b]. The (uniform) density function is
f(ω) = 1/(b − a)
The mean value or expectation of an experiment having uniform density on [a,b] is then
µ = (a + b)/2

f(x,y) JOINT DENSITY FUNCTION p.165

Where X and Y are continuous random variables, the joint density function is
f(x, y) = ∂²F(x, y)/(∂x ∂y),   where f(x) = F′(x)
The joint density function satisfies the following equation:
F(x, y) = ∫_(−∞)^x ∫_(−∞)^y f(t, u) du dt
The joint density function can involve more than two variables and looks like
f(x1, x2, …, xn) = ∂ⁿF(x1, x2, …, xn) / (∂x1 ∂x2 … ∂xn)

EXPONENTIAL DENSITY FUNCTION p.205

The exponential density has the parameter λ. The function is often used to describe an expected lifetime where the parameter λ is the failure rate. A higher value of λ means the failure is likely to be sooner. The exponential density function has the memoryless property, p8. The exponential density function will be on the exam.
Example: the probability that a light bulb burns out after t hours. The total area under the curve is one; the area over the interval [a,b] is the probability of failure during that time.
Exponential density function:  f(t) = λe^(−λt)
[Plot: f(t) starts at λ and decays toward zero; the shaded area over [a,b] is the probability of failure in that interval.]
Probability distribution function:  F(t) = 1 − e^(−λt)
[Plot: F(t) rises from 0 toward 1.]
e.g. for a random variable T,
P(T > x) = e^(−λx)       P(T ≤ x) = 1 − e^(−λx)
Two exponential density functions are plotted below with λ = 2 (high failure rate) and λ = 0.5 (low failure rate). Note that in each case the area under the curve is 1. Both curves extend to infinity.
[Plot: f(t) for λ = 2 and λ = 0.5]
Expectation:  µ = 1/λ
Variance:  σ² = 1/λ²       Deviation:  σ = 1/λ
Generating Function:  g(t) = λ/(λ − t)



RELIABILITY p.205

The reliability is the probability that an event will take place after a given amount of time.
reliability = ∫_T^∞ f(t) dt
For example, from the previous light bulb example, the probability that the bulb will last more than T hours is its reliability.
reliability = ∫_T^∞ λe^(−λt) dt = e^(−λT)
λ = failure rate
t = time [s]

DE MORGAN'S LAWS

Augustus De Morgan, British mathematician 1806-1871.
P(not (A ∪ B)) = P(Ā ∩ B̄)
P(not (A ∩ B)) = P(Ā ∪ B̄)
From this we can get
A − (B ∩ C) = A ∩ (not (B ∩ C)) = A ∩ (B̄ ∪ C̄) = (A ∩ B̄) ∪ (A ∩ C̄) = (A − B) ∪ (A − C)

P(A|B) CONDITIONAL PROBABILITY

P(A|B) reads, "the probability that event A will occur given that event B has occurred." Since we know that B has occurred, the sample space now consists of only those outcomes in which B has occurred.
P(A | B) = P(A ∩ B) / P(B)
For example, let X be the outcome of rolling a die once. Let A be the event {X = 6} and B be the event {X > 4}. P(A) = 1/6. But if the die has been rolled and we are told that B has occurred, then we can only have a 5 or a 6, so
P(A | B) = (1/6) / (1/3) = 1/2

POKER PROBABILITIES

The probability of being dealt a certain poker hand can be described as the product of the probability of getting one specific set of cards satisfying the requirement times the number of possible sets that would satisfy the requirement.
For example, what is the probability of getting 3 of a kind?
The probability of getting one specific hand satisfying this requirement is
1 / (52 choose 5) = 1/2,598,960 = 384.77×10⁻⁹
How many ways can you have 3 of a kind? Consider that you have 3 2s and 2 non-2s that are not a pair. Given that there are 4 suits, there are 4 ways to have 3 2s. There are 48 ways to have the 4th card and 44 ways to have the remaining card, but since the order of these last two cards does not matter, divide by 2. So the number of ways you can have 3 2s is 4×48×44/2 = 4224. Multiply that by the 13 different numerical values in a deck of cards to get the total number of 5-card hands that contain 3 of a kind (54,912). So the probability of getting 3 of a kind is
[1 / (52 choose 5)] × 4 × (48 × 44/2) × 13 = 0.02113
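The 3-of-a-kind count can be confirmed by direct counting with binomial coefficients; a minimal Python sketch (names are illustrative; math.comb assumed):

from math import comb

# Three of a kind: pick the rank, 3 of its 4 suits, then two kickers of different
# ranks (48 × 44 ordered pairs, divided by 2 because the kickers' order is irrelevant)
ways = 13 * comb(4, 3) * (48 * 44 // 2)
p = ways / comb(52, 5)
print(ways, round(p, 5))    # 54912 0.02113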



f(x|E) CONTINUOUS CONDITIONAL DENSITY FUNCTION p.162

The formula for continuous conditional density is
f(x | E) = f(x)/P(E)  if x ∈ E;    0  if x ∉ E
For example, if we know that a spinner has stopped with its pointer in the upper half of a circle, 0 ≤ x ≤ ½, then the conditional density is
f(x | E) = 1/(1/2) = 2  if 0 ≤ x ≤ 1/2;    0  if 1/2 < x < 1
f(x) = the density function for random variables Xi
E = an event with positive probability that gives some evidence about which hypotheses are correct
P(E) = the probability of event E occurring

BAYES' THEOREM

Bayes' theorem is useful if we know P(A|B) and want to find P(B|A).
P(B | A) = P(A | B) P(B) / P(A)
This was not found in our textbook.

BAYES' FORMULA p.145

This is a famous formula but we will rarely use it. If the number of hypotheses is small, a simple tree measure calculation is done; if the number of hypotheses is large, then we use a computer.
P(Hi | E) = P(Hi) P(E | Hi) / Σ_(k=1)^m P(Hk) P(E | Hk)
Bayes probabilities are used for medical diagnosis. Given a set of test results and the probabilities for test outcomes, what are the probabilities that the patient has the disease?
Hi = a set of pairwise disjoint events called hypotheses
E = an event that gives some evidence about which hypotheses are correct
P(Hi) = a set of probabilities called prior probabilities
P(Hi|E) = conditional probabilities called posterior probabilities

BAYES' INVERSE PROBLEM

Bayes proposed to find the conditional probability that the unknown probability p lies between a and b, given m successes in n trials.
P(a ≤ p < b | m successes in n trials) = [∫_a^b x^m (1 − x)^(n−m) dx] / [∫_0^1 x^m (1 − x)^(n−m) dx]
The computation of the integrals is too difficult for exact solution except for small values of m and n.

PROBABILITY TREE

Example: Urn I contains 2 black balls and 3 white balls; urn II contains 1 black ball and 1 white ball. The following tree shows the probabilities involved in selecting an urn at random and selecting a ball from it.
Urn I (1/2):   black (2/5) → ω1, p(ω1) = 1/5;    white (3/5) → ω2, p(ω2) = 3/10
Urn II (1/2):  black (1/2) → ω3, p(ω3) = 1/4;    white (1/2) → ω4, p(ω4) = 1/4
The reverse tree gives the probabilities of the urn chosen given the color of the ball selected:
black (9/20):   urn I (4/9) → ω1, p(ω1) = 1/5;    urn II (5/9) → ω3, p(ω3) = 1/4
white (11/20):  urn I (6/11) → ω2, p(ω2) = 3/10;  urn II (5/11) → ω4, p(ω4) = 1/4
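The reverse-tree (posterior) probabilities follow from Bayes' formula and can be checked in a few lines of Python (a sketch; the dictionary names are illustrative):

# P(urn | black ball) for the urn example above
priors = {"I": 1/2, "II": 1/2}
p_black = {"I": 2/5, "II": 1/2}                                   # P(black | urn)
p_evidence = sum(priors[u] * p_black[u] for u in priors)          # P(black) = 9/20
posterior = {u: priors[u] * p_black[u] / p_evidence for u in priors}
print(posterior)    # {'I': 0.444..., 'II': 0.555...}, i.e. 4/9 and 5/9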



B(α,β,x) BETA DENSITY FUNCTION p.168

A density function having positive parameters α and β. When both parameters are equal to one, the beta density is the uniform density. When they are both greater than one, the function is bell-shaped; when they are both less than one, the function is U-shaped. A beta density function can be used to fit data that does not fit the Gaussian curve of a normal density function (p8).

[Plot: Beta Density Functions]

B(α, β, x) = [1/B(α, β)] x^(α−1) (1 − x)^(β−1)  if 0 ≤ x ≤ 1;    0 otherwise
Beta function:  B(α, β) = ∫_0^1 x^(α−1) (1 − x)^(β−1) dx
Given α and β, the probability of an event being successful is
P(success) = α/(α + β)
If α and β are integers:
B(α, β) = (α − 1)!(β − 1)! / (α + β − 1)!

MEMORYLESS PROPERTY p.206

The memoryless property applies to the exponential density function and the geometric distribution function.
P(T > r + s | T > r) = P(T > s)

EXPECTATION

E(X), µ EXPECTED VALUE OF DISCRETE RANDOM VARIABLES p.225

The expected value, also known as the mean and sometimes identified as µ, is the sum of the product of each possible outcome and its probability. It is the center of mass in a distribution function. If a large number of experiments are undertaken and the results are averaged, the value obtained should be close to the expected value. The formula for the expected value is
µ = E(X) = Σ_(x∈Ω) x m(x)
provided the sum converges. For example, the expected value for the roll of a die is
1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5.
If X counts the successes in n independent trials that each succeed with the same probability p (a Bernoulli trials process), then the expected value is just
µ = E(X) = np
For a uniform density problem on [a,b] (p.9), the expected value is
µ = E(X) = (a + b)/2
X = numerically-valued discrete random variable: the observation of an experimental outcome
m(x) = discrete distribution function
Ω = the sample space



E(X), µ EXPECTED VALUE OF CONTINUOUS RANDOM VARIABLES p.268

The expected value, also known as the mean and sometimes identified as µ, is the center of mass in a distribution function. If a large number of experiments are undertaken and the results are averaged, the value obtained should be close to the expected value. The formula for the expected value is
µ = E(X) = ∫_(−∞)^(+∞) x f(x) dx
provided that ∫_(−∞)^(+∞) |x| f(x) dx is finite. Otherwise the expected value does not exist. Note that the limits of integration may be reduced provided they include the sample space.
For an exponential density f(t) = λe^(−λt) (p.9), the expected value is
µ = 1/λ
X = random variable: the observation of an experimental outcome
f(x) = the density function for random variable X

E(φ(X)) EXPECTATION OF A FUNCTION p.229

If X and Y are two random variables and Y can be written as a function of X, then the expected value of Y can be computed using the distribution of X.
E(φ(X)) = Σ_(x∈Ω) φ(x) m(x)
again, with the provision that the sum converges.
X = numerically-valued discrete random variable with sample space Ω
φ(X) = a real-valued function of the random variable X with domain Ω
Ω = the sample space

PROPERTIES OF EXPECTATION p.268, 394

If X is a real valued random variable with E(X) = µ, then
E(X²) = V(X) + µ²       E(Xⁿ) = ∫_(−∞)^(+∞) xⁿ fX(x) dx
If X and Y are two random variables with finite expected values, then
E(X + Y) = E(X) + E(Y)
If X is a random variable and c is a constant,
E(cX) = cE(X)
If X and Y are independent,
E(X·Y) = E(X) E(Y)
fX(x) = density function for the random variable X

MARKOV INEQUALITY

For a nonnegative random variable X, the probability that an outcome will be greater than or equal to some constant k is less than or equal to the expected value divided by that constant.
P(X ≥ k) ≤ E(X)/k
For example, if the expected height of a person is 5.5 feet, then the Markov inequality states that the probability that a person is more than 11 feet tall is no more than ½. This example demonstrates the looseness of the Markov inequality. A more meaningful inequality is the Chebyshev inequality, which is a special case of Markov's inequality (p. 16).



VARIANCE

V(X), σ² VARIANCE OF DISCRETE RANDOM VARIABLES p.257

Variance is a measure of the deviation of an outcome from the expected value. The variance is found by taking the difference between the expected value and each possible outcome, squaring that difference, multiplying that square by the probability of the outcome, and then summing these for each possible outcome. The expected value is more useful as a prediction when the outcome is not likely to deviate too much from the expected value.
σ² = V(X) = E((X − µ)²) = Σ_x (x − µ)² m(x)
For discrete random variables, the variance can be found by a couple of methods:
Method 1: Σ_x (x − µ)² m(x). Find the expected value µ. Subtract µ from each possible outcome. Square each of these results. Multiply each result by its probability and then sum all of these.
For example, the variance of the roll of a die is
[(1−3.5)² + (2−3.5)² + (3−3.5)² + (4−3.5)² + (5−3.5)² + (6−3.5)²](1/6) = 35/12.
Method 2: E(X²) − µ². Multiply the probability of each outcome by the square of the outcome. Sum the results to get E(X²). Find the expected value µ. Subtract the square of µ from E(X²).
V(X) = E(X²) − µ²
For example, for the roll of a die,
E(X²) = 1(1/6) + 4(1/6) + 9(1/6) + 16(1/6) + 25(1/6) + 36(1/6) = 91/6, then
E(X²) − µ² = 91/6 − (7/2)² = 35/12.
The variance of a Bernoulli Trials process is npq.
X = numerically-valued discrete random variable
µ = the expected value of X, E(X)
m(x) = discrete distribution function

V(X), σ² VARIANCE OF CONTINUOUS RANDOM VARIABLES p.271

Variance is a measure of the deviation of an outcome from the expected value. The expected value is more useful as a prediction when the outcome is not likely to deviate too much from the expected value.
σ² = V(X) = E((X − µ)²) = ∫_(−∞)^(+∞) (x − µ)² f(x) dx
Note that the limits of integration may be adjusted so long as they continue to include the sample space. If the integral fails to converge, the variance does not exist.
The variance of a uniform distribution on [0,1] is 1/12. The variance of an exponential distribution is 1/λ².
X = random variable: the observation of an experimental outcome
µ = the expected value of X, E(X)

V(X), σ² PROPERTIES OF VARIANCE p.259

V(X + Y) = V(X) + V(Y)   (X and Y independent)
V(cX) = c²V(X)       V(X + c) = V(X)
V(X) = E(X²) − µ²

D(X), σ STANDARD DEVIATION p.257

The standard deviation of X is the square root of the variance and is sometimes written σ.
D(X) = √(V(X)) = √(E((X − µ)²))
The standard deviation of a Bernoulli Trials process is σ = √(npq).
X = random variable: the observation of an experimental outcome
V(X) = the variance of X
µ = the expected value of X, E(X)
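Both variance methods for the die can be checked in a few lines of Python (a sketch; names are illustrative):

# Expected value and variance of a fair die
outcomes = [1, 2, 3, 4, 5, 6]
m = 1/6                                                 # probability of each face
mu = sum(x * m for x in outcomes)                       # 3.5
var1 = sum((x - mu)**2 * m for x in outcomes)           # Method 1
var2 = sum(x**2 * m for x in outcomes) - mu**2          # Method 2: E(X²) − µ²
print(mu, var1, var2)    # 3.5  2.9166...  2.9166...  (= 35/12)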



cov(X,Y) COVARIANCE p.280

The book doesn't go into detail about this. Covariance applies to both discrete and continuous random variables.
cov(X, Y) = E((X − µ(X))(Y − µ(Y))) = E(XY) − µ(X)µ(Y)
Property of covariance:
V(X + Y) = V(X) + V(Y) + 2cov(X, Y)
X = random variable: the observation of an experimental outcome
V(X) = the variance of X
µ = the expected value of X, E(X)

ρ(X,Y) CORRELATION p.281

The book doesn't go into detail about this either. Correlation applies to continuous random variables. Another text calls this the correlation coefficient and has a separate function for discrete random variables which it calls correlation.
ρ(X, Y) = cov(X, Y) / √(V(X) V(Y))
X = random variable: the observation of an experimental outcome
V(X) = the variance of X
µ = the expected value of X, E(X)

CONVOLUTION

SUM OF RANDOM VARIABLES p.285, 291

Discrete: Given Z = X + Y, where X and Y are independent discrete random variables with distribution functions m1(x) and m2(y), we can find the distribution function m3(z) of Z using convolution.
m3 = m1 ∗ m2
m3(z) = Σ_k m1(k) · m2(z − k)
Continuous: Given Z = X + Y, where X and Y are independent continuous random variables with density functions f(x) and g(y), we can find the density function h(z) of Z using convolution. Note that we are talking about density functions here, where it was distribution functions where discrete random variables were concerned. Also note that the limits of integration may be adjusted for density functions that do not extend to infinity.
f(x) ∗ g(y) = h(z) = ∫_(−∞)^(+∞) f(z − y) g(y) dy = ∫_(−∞)^(+∞) g(z − x) f(x) dx
For more about the sum of random variables, see Properties of Generating Functions p20.
k = represents all of the integers for which the probabilities m1(k) and m2(z−k) exist. (In cases where the probability doesn't exist, the probability is zero.)



CONVOLUTION EXAMPLE

Suppose the distribution functions m1(x) and m2(y) for the discrete random variables X and Y are
outcome:      0    1    2
probability: 1/8  3/8  1/2        (m1(x) = m2(x))
Given Z = X + Y, we can find the distribution function m3(z) of Z using convolution.
m3(z) = m1(x) ∗ m2(y) = Σ_k m1(k) · m2(z − k)
For each possible value of k we have
P(z = 0) = 1/8·1/8 = 1/64,                          k = 0
P(z = 1) = 1/8·3/8 + 3/8·1/8 = 3/32,                k = 0, 1
P(z = 2) = 1/8·1/2 + 3/8·3/8 + 1/2·1/8 = 17/64,     k = 0, 1, 2
P(z = 3) = 3/8·1/2 + 1/2·3/8 = 3/8,                 k = 1, 2
P(z = 4) = 1/2·1/2 = 1/4,                           k = 2
Therefore
outcome:       0      1      2     3    4
probability: 1/64   3/32  17/64  3/8  1/4

LAW OF LARGE NUMBERS

LAW OF LARGE NUMBERS p.305

Also called the Law of Averages, the law of large numbers is the first fundamental theorem of probability. It is sometimes called the Weak Law of Large Numbers to distinguish it from the Strong Law of Large Numbers. Probability may be viewed 1) intuitively, as the frequency at which an outcome occurs over the long run, and 2) mathematically, as a value of the distribution function for the random variable representing the experiment. The law of large numbers theorem shows that these two interpretations are consistent.
Let X1, X2, … Xn be an independent trials process with finite expected value µ = E(Xj) and finite variance σ² = V(Xj). Let Sn = X1 + X2 + … + Xn. Then for any ε > 0,
P(|Sn/n − µ| ≥ ε) → 0   as n → ∞
P(|Sn/n − µ| < ε) → 1   as n → ∞
Note that Sn/n is the average of the individual outcomes, so Sn/n − µ is the average deviation.
In other words, if we conduct a lot of trials, the average result will be really close to the expected value.
µ = the expected value of X, E(X)
Sn = the sum of the random variables
n = the number of possible outcomes or the number of random variables
ε = any positive real number

CHEBYSHEV INEQUALITY p.305,316

Let X be a discrete random variable with expected value µ = E(X), and let ε > 0 be any positive real number. Then
P(|X − µ| ≥ ε) ≤ V(X)/ε²
In other words, the probability that the outcome differs from the expected value by an amount greater than or equal to the value ε is not greater than the variance divided by the square of ε.
X = random variable: the observation of an experimental outcome
µ = the expected value of X, E(X)
V(X) = the variance of X
ε = any positive real number
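The convolution table above can be reproduced exactly with rational arithmetic; a minimal Python sketch (names illustrative; fractions is in the standard library):

from fractions import Fraction as F

m1 = {0: F(1, 8), 1: F(3, 8), 2: F(1, 2)}
m2 = dict(m1)                                  # Y has the same distribution
m3 = {}
for x, px in m1.items():                       # m3(z) = Σ m1(k) m2(z − k)
    for y, py in m2.items():
        m3[x + y] = m3.get(x + y, F(0)) + px * py
print(m3)    # values 1/64, 3/32, 17/64, 3/8, 1/4 (printed as Fraction objects)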



CENTRAL LIMIT THEOREM

CENTRAL LIMIT THEOREM p.325

The second fundamental theorem of probability is the Central Limit Theorem. This theorem says that if Sn is the sum of n mutually independent random variables, then the distribution function of Sn is well-approximated by a certain type of continuous function known as the normal density function, given by the formula
fX(x) = [1/(σ√(2π))] e^(−(x−µ)²/(2σ²))       (see p8)
σ = the deviation
σ² = the variance
µ = the expected value of X, E(X)

Sn* STANDARDIZED SUM OF Sn p.326

The standardized sum always has the expected value 0 and variance 1. A sum of variables is standardized by subtracting the expected number of successes and dividing by its standard deviation.
Sn* = (Sn − np)/√(npq)    or    Sn* = (Sn − nµ)/√(nσ²)

CENTRAL LIMIT THEOREM FOR BINOMIAL DISTRIBUTIONS p.328

For the binomial distribution b(n,p,j) we have
φ(x) = lim_(n→∞) √(npq) · b(n, p, np + x√(npq))
φ(x) = standard normal density
n = number of trials or selections
p = probability of success
q = probability of failure (1 − p)

CENTRAL LIMIT THEOREM FOR BERNOULLI TRIALS p.330

Where Sn is the number of successes in n Bernoulli trials (Bernoulli trials have 2 possible outcomes). Note that a* and b* are standardized values:
a* = (a − np)/√(npq)       b* = (b − np)/√(npq)
lim_(n→∞) P(a* ≤ Sn* ≤ b*) = ∫_(a*)^(b*) φ(x) dx   (*)
where φ(x) = [1/√(2π)] e^(−x²/2)
(*) This integral has no closed-form antiderivative. The table of values in the next box gives areas under the curve of φ(x) and may be used as a close approximation instead of performing the integration. For example, for the integration from a* = −.2 to b* = .3, find the values of NA(z) for z = .2 and z = .3 in the table and add them together to get .1972. Note that in this case the values were added because they represented areas on each side of the mean (center). In the case where both values were on the same side of the mean (both have the same sign), a subtraction would have to take place to find the desired area. That is because NA(z) is the area bounded by z and the mean. Refer to the figure below.
[Figure: the standard normal curve, with the areas bounded by the mean and a*, b* shaded.]
a = lower bound
b = upper bound
φ(x) = standard normal density function
n = number of trials or selections
p = probability of success
q = probability of failure (1 − p)

TABLE OF VALUES FOR NA(0,z) p.331

The area under the normal density curve from 0 to z.
z   NA(z)   z    NA(z)   z    NA(z)   z    NA(z)
.0  .0000   1.0  .3413   2.0  .4772   3.0  .4987
.1  .0398   1.1  .3643   2.1  .4821   3.1  .4990
.2  .0793   1.2  .3849   2.2  .4861   3.2  .4993
.3  .1179   1.3  .4032   2.3  .4893   3.3  .4995
.4  .1554   1.4  .4192   2.4  .4918   3.4  .4997
.5  .1915   1.5  .4332   2.5  .4938   3.5  .4998
.6  .2257   1.6  .4452   2.6  .4953   3.6  .4998
.7  .2580   1.7  .4554   2.7  .4965   3.7  .4999
.8  .2881   1.8  .4641   2.8  .4974   3.8  .4999
.9  .3159   1.9  .4713   2.9  .4981   3.9  .5000
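The NA(z) values can also be computed directly from the error function; a minimal Python sketch (math.erf is standard-library; the helper name NA mirrors the table's notation):

from math import erf, sqrt

def NA(z):
    # Area under the standard normal density from 0 to z
    return 0.5 * erf(z / sqrt(2))

# Example above: area from a* = -0.2 to b* = 0.3
print(round(NA(0.2) + NA(0.3), 4))    # 0.1972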



CENTRAL LIMIT THEOREM FOR THE SUM OF DISCRETE VARIABLES p.343

Where Sn is the sum of n discrete random variables:
lim_(n→∞) P(a < (Sn − nµ)/√(nσ²) < b) = [1/√(2π)] ∫_a^b e^(−x²/2) dx
Note that (Sn − nµ)/√(nσ²) = Sn*. See Standardized Sum p.17. Note also that a and b will have to be similarly standardized before applying the Table of Values for NA(z) that appears previously.
a = lower bound
b = upper bound
n = number of trials or selections

CENTRAL LIMIT THEOREM – GENERAL FORM p.343

Where Sn is the sum of n discrete random variables, and we assume that the deviation of this sum approaches infinity, sn → ∞:
lim_(n→∞) P(a < (Sn − mn)/sn < b) = [1/√(2π)] ∫_a^b e^(−x²/2) dx
mn = the mean of Sn
sn = the deviation of Sn (square root of the variance)
a = lower bound
b = upper bound
n = number of trials or selections

APPROXIMATION THEOREM p.342

For n large:
P(Sn = j) ≈ φ(xj)/√(nσ²) = [1/√(2πnσ²)] e^(−(j−nµ)²/(2nσ²))
where xj = (j − nµ)/√(nσ²)
φ(x) = standard normal density
n = number of trials or selections
p = probability of success
µ = the expected value of X, E(X)
σ² = the variance

GENERATING FUNCTIONS

g(t) GENERATING FUNCTIONS p.365

A generating function g(t) produces the moments of a random variable X. The first moment of g(t) is the mean; the variance may be determined from the first and second moments of g(t); and knowledge of all of the moments determines the distribution function completely. So knowing the generating function provides more information than knowing the mean and variance only. The moments of g(t) are its derivatives evaluated at t = 0. So g(t) may be called the moment generating function for X.
Discrete:  g(t) = E(e^(tX)) = Σ_j e^(t·xj) p(xj)
Continuous:  g(t) = E(e^(tX)) = ∫_(−∞)^(+∞) e^(tx) fX(x) dx
Uniform Density:  g(t) = E(e^(tX)) = [1/(b − a)] ∫_a^b e^(tx) dx
Note that the limits of integration are the range of the random variable X and are not necessarily infinite. Moments may also be calculated directly; see the next box.
t = just some variable we need in order to have a generating function
j = a counting variable (integer) for the dummy variable x
x = dummy variable, I think
fX(x) = density function for the random variable X



µn MOMENTS p.366, 394

The moments are the derivatives of the generating function evaluated at t = 0. They describe the mean, variance, and distribution functions of a random variable. A moment is determined by differentiating the generating function n times and then setting t = 0. The moments of a generating function give useful information; for example the first moment (n = 1) is the mean of the random variable.
µn = E(Xⁿ) = [dⁿ/dtⁿ g(t)] evaluated at t = 0
Discrete:  µn = Σ_j (xj)ⁿ P(X = xj)
Continuous:  µn = E(Xⁿ) = ∫_(−∞)^(+∞) xⁿ fX(x) dx
Mean:  µ = µ1
Variance:  σ² = µ2 − µ1²
Sanity check:  µ0 = 1
(all evaluated at t = 0)
t = just some variable we need in order to have a generating function
n = a counting variable (integer) for the moments, where n = 1 for the 1st moment, n = 2 for the 2nd moment, etc.
j = a counting variable (integer) for the dummy variable x
x = dummy variable, I think

h(z) ORDINARY GENERATING FUNCTION p.370

Here are the definitions of h(z), but basically to get the ordinary generating function, find g(t) and replace e^t by z, replace e^(2t) by z², etc., and leave everything else alone.
h(z) = g(log z) = Σ_(j=0)^n z^j p(j)
z = just some variable we need in order to have a generating function
j = a counting variable (integer) for the dummy variable z
p(j) = coefficient of z^j in h(z)

g(t) SPECIFIC GENERATING FUNCTIONS p.366

Following are some distribution functions and their generating functions.
Uniform distribution for 1 ≤ j ≤ n:
pX(j) = 1/n       g(t) = e^t(e^(nt) − 1) / [n(e^t − 1)]
µ = (n + 1)/2     σ² = (n² − 1)/12
Binomial distribution for 0 ≤ j ≤ n:
pX(j) = (n choose j) p^j q^(n−j)       g(t) = (pe^t + q)^n
µ = np            σ² = np(1 − p)
Geometric distribution for all j:
pX(j) = q^(j−1) p       g(t) = pe^t / (1 − qe^t)
µ = 1/p           σ² = q/p²
Poisson distribution with mean λ for all j:
pX(j) = e^(−λ) λ^j / j!       g(t) = e^(λ(e^t − 1))
µ = λ             σ² = λ
X = random variable: the observation of an experimental outcome
t = just some variable we need in order to have a generating function
j = a counting variable (integer) for the dummy variable x
x = dummy variable, I think
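As an illustration of recovering moments from a generating function, a short symbolic sketch (assuming the sympy package is available; not part of the original notes) differentiates the geometric g(t) at t = 0:

import sympy as sp

t, p = sp.symbols('t p', positive=True)
q = 1 - p
g = p * sp.exp(t) / (1 - q * sp.exp(t))        # geometric distribution g(t)
mu1 = sp.diff(g, t, 1).subs(t, 0)              # first moment = mean
mu2 = sp.diff(g, t, 2).subs(t, 0)              # second moment
print(sp.simplify(mu1))                        # 1/p
print(sp.simplify(mu2 - mu1**2))               # q/p², printed in terms of p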



PROPERTIES OF GENERATING FUNCTIONS p.371

For Y = X + a:     gY(t) = E(e^(tY)) = E(e^(t(X+a))) = e^(ta) E(e^(tX)) = e^(ta) gX(t)
For Y = bX:        gY(t) = E(e^(tY)) = E(e^(tbX)) = gX(bt)
For Y = bX + a:    gY(t) = E(e^(tY)) = E(e^(t(bX+a))) = e^(ta) E(e^(tbX)) = e^(ta) gX(bt)
For X* = (X − µ)/σ:    gX*(t) = e^(−µt/σ) gX(t/σ)
For Z = X + Y (X and Y independent):
gZ(t) = E(e^(tZ)) = E(e^(t(X+Y))) = E(e^(tX)) E(e^(tY)) = gX(t) gY(t)
therefore gZ(t) = gX(t) gY(t);   also hZ(z) = hX(z) hY(z)
For t = 0:   g(t) = 1

p(j) COEFFICIENTS OF THE ORDINARY GENERATING FUNCTION p.370

This is defined by Taylor's formula:
p(j) = h^(j)(0) / j!
For example, if h(z) = 1/4 + (1/2)z + (1/4)z², then p has values of {1/4, 1/2, 1/4}.
z = just some variable we need in order to have a generating function
j = a counting variable (integer) for the dummy variable z
h(z) = ordinary generating function

MARKOV CHAINS

STATES p.405

A Markov chain is composed of various states with defined paths of movement between states and associated probabilities of movement along these paths. Permissible paths from one state to another are called steps.
For example, let's say in the Land of Oz, there are never 2 nice days in a row. When they have a nice day, the following day will be rain or snow with equal probability. When they have snow or rain, there is a 50% chance that the following day will be the same and an equal chance of the other two possibilities. So the states look like this.
[State diagram: RAIN, NICE, and SNOW. Rain stays rain with probability 1/2 and goes to nice or snow with probability 1/4 each; nice goes to rain or snow with probability 1/2 each; snow stays snow with probability 1/2 and goes to rain or nice with probability 1/4 each.]

TRANSITION MATRIX p.406

A transition matrix or P-matrix is an arrangement of all of the probabilities of moving between states. So pij is the probability of moving from state i to state j in one step.
For the example above, the transition matrix is
           rain  nice  snow
    rain (  1/2   1/4   1/4 )
P = nice (  1/2    0    1/2 )
    snow (  1/4   1/4   1/2 )
So the values in the first row represent the probabilities of the weather following a rainy day, etc. Notice that the rows each sum to 1 but the columns do not. We can use the notation p12 to mean the probability of having a nice day (2) after a rainy day (1). We can read the result from element p12 of the matrix.

MATRIX POWERS p.406

The above P-matrix raised to the second power gives us 2nd day probabilities, raised to a power of 3 gives us 3rd day probabilities, etc.
       ( .438  .188  .375 )
P² =   ( .375  .250  .375 )
       ( .375  .188  .438 )
We use the notation p12^(2) to mean the probability of having a nice day 2 days after a rainy day, e.g. p_rain nice^(2) = .188.
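Matrix powers are easy to check numerically; a minimal numpy sketch (numpy is an assumption, not something the notes use) for the Land of Oz chain:

import numpy as np

P = np.array([[0.50, 0.25, 0.25],     # rain
              [0.50, 0.00, 0.50],     # nice
              [0.25, 0.25, 0.50]])    # snow
P2 = np.linalg.matrix_power(P, 2)
print(P2.round(3))                               # second-day probabilities; P2[0, 1] ≈ 0.188
print(np.linalg.matrix_power(P, 12).round(3))    # rows approach the fixed probability matrix (p22)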



ABSORBING CHAINS   p.415

A Markov chain is absorbing if there are one or more states from which it is not possible to leave and it is possible to get to one of these states from any state in the chain.

[State diagram: states 1, 2, 3, 4 in a row. States 1 and 4 each return to themselves with probability 1. State 2 moves to state 1 with probability .6 and to state 3 with probability .4; state 3 moves to state 2 with probability .6 and to state 4 with probability .4.]

A state that is not absorbing (states 2 and 3 in this example) is called a transient state. The P-matrix for the example above is

             1    2    3    4
        1 [  1    0    0    0  ]
    P = 2 [  .6   0    .4   0  ]
        3 [  0    .6   0    .4 ]
        4 [  0    0    0    1  ]

CANONICAL FORM   p.417

Using the P-matrix in the previous box as an example, reorder the rows and columns so that the transient states are listed first.

             2    3    1    4
        2 [  0    .4   .6   0  ]
    P = 3 [  .6   0    0    .4 ]
        1 [  0    0    1    0  ]
        4 [  0    0    0    1  ]

Note that we have submatrices defined as

        [ Q  R ]
    P = [      ]
        [ 0  I ]

where
Q = a matrix to be used later to find the fundamental matrix N
R = a matrix to be used later to find the probability of absorption matrix B
0 = a matrix of zeros
I = an identity matrix
Note that in this particular example, the 4 matrices are all the same size, but this is not always the case.
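A sketch of the reordering (Python with numpy assumed; the lists of transient and absorbing states are mine, written with 0-based indices):

    import numpy as np

    P = np.array([[1, 0, 0, 0],
                  [.6, 0, .4, 0],
                  [0, .6, 0, .4],
                  [0, 0, 0, 1.]])

    transient = [1, 2]                  # states 2 and 3
    absorbing = [0, 3]                  # states 1 and 4
    order = transient + absorbing
    P_canon = P[np.ix_(order, order)]   # canonical form: transient states first
    Q = P_canon[:2, :2]                 # [[0. 0.4], [0.6 0.]]
    R = P_canon[:2, 2:]                 # [[0.6 0.], [0. 0.4]]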
FUNDAMENTAL MATRIX OF AN ABSORBING CHAIN   p.418

The fundamental matrix of an absorbing chain, or the N-matrix, gives additional information. The value n_ij of the N-matrix is the expected number of times the chain will be in state j given that it begins in state i.

    N = (I − Q)^(−1)

From our example P-matrix we have

        ( [ 1  0 ]   [ 0   .4 ] )^(−1)          2      3
    N = ( [      ] − [        ] )       =  2 [ 1.32   .526 ]
        ( [ 0  1 ]   [ .6   0 ] )          3 [ .789   1.32 ]

Q = a submatrix extracted from the P-matrix canonical form and used to obtain the fundamental matrix
I = an identity matrix

TIME TO ABSORPTION   p.419

The time to absorption, or t-matrix, gives the number of expected steps before the chain is absorbed. In other words, given that the chain begins in state i, we can expect absorption to occur in t_i steps.

    t = Nc

From our example N-matrix of the previous box we have

        [ 1.32   .526 ]   [ 1 ]   [ 1.84 ]  <- expected steps to absorption from state 2
    t = [             ] × [   ] = [      ]
        [ .789   1.32 ]   [ 1 ]   [ 2.11 ]  <- expected steps to absorption from state 3

N = the fundamental matrix
c = a column matrix of ones

PROBABILITY OF ABSORPTION   p.420

The probability of absorption, or B-matrix, gives the probability that the chain is absorbed in state j given that it began in state i.

    B = NR

From our example N-matrix of the previous box we have

        [ 1.32   .526 ]   [ .6   0  ]          1      4
    B = [             ] × [         ]  =  2 [ .789   .211 ]
        [ .789   1.32 ]   [ 0    .4 ]      3 [ .474   .526 ]

N = the fundamental matrix
R = a submatrix of the canonical form
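A sketch tying these three boxes together (Python with numpy assumed; Q and R as extracted from the canonical form above):

    import numpy as np

    Q = np.array([[0, .4], [.6, 0]])
    R = np.array([[.6, 0], [0, .4]])

    N = np.linalg.inv(np.eye(2) - Q)      # fundamental matrix
    t = N @ np.ones((2, 1))               # expected steps to absorption from states 2 and 3
    B = N @ R                             # probability of absorption into states 1 and 4

    print(np.round(N, 3))                 # [[1.316 0.526] [0.789 1.316]]
    print(np.round(t.ravel(), 2))         # [1.84 2.11]
    print(np.round(B, 3))                 # [[0.789 0.211] [0.474 0.526]]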



REGULAR MARKOV CHAIN   p.433

A Markov chain is called a regular chain if some power of the transition matrix has only positive elements. In other words, for some n, it is possible to go from any state to any state in exactly n steps. Every regular chain is also ergodic.

ERGODIC MARKOV CHAIN   p.433

A Markov chain is called an ergodic chain if it is possible to go from every state to every other state (not necessarily in one move). Ergodic chains are sometimes called irreducible.
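A quick numerical regularity check for the Land of Oz chain (a sketch, Python with numpy assumed; the second power already suffices here):

    import numpy as np

    P = np.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
    print(np.all(P > 0))                                # False: P itself has a zero (nice -> nice)
    print(np.all(np.linalg.matrix_power(P, 2) > 0))     # True: some power is strictly positive, so the chain is regular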
W FIXED PROBABILITY MATRIX   p.434

As the transition matrix of a regular Markov chain is raised to a higher power, the result tends toward a matrix of common rows called the fixed probability matrix. Sometimes you can use a calculator and raise P to a power of about 12 to see what it goes to as n gets large. Other times this doesn't work (all rows of W are not equal) and you have to use the method of Solving For w in the next box.

    W = lim (n→∞) P^n

If we define w as one of the common rows of W, then

    wP = w   and   Pc = c

w is called the fixed probability vector. The elements of w will all be positive and will sum to one. The fact that all rows of W are the same means that the probability of arriving at a particular state after many steps is the same regardless of the starting point. Equivalently,

    w(P − I) = 0

P = the transition matrix
c = a column matrix of ones
w = the fixed probability vector
I = an identity matrix

SOLVING FOR w   p.436

Since wP = w and w1 + w2 + w3 = 1 (assuming a 3×3 P-matrix), we have 4 equations and 3 unknowns:

    w1 + w2 + w3 = 1
    p11 w1 + p21 w2 + p31 w3 = w1
    p12 w1 + p22 w2 + p32 w3 = w2
    p13 w1 + p23 w2 + p33 w3 = w3

P = the transition matrix
p_ij = the element from row i and column j of the transition matrix
w = the fixed probability vector
w_j = the element from column j of any row of the limiting matrix W
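A sketch of both routes for the Land of Oz chain (Python with numpy assumed): raise P to a moderate power and read off a row, or solve w(P − I) = 0 together with the normalization w1 + w2 + w3 = 1 as an overdetermined linear system.

    import numpy as np

    P = np.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])

    # route 1: high power of P; every row approaches w
    print(np.round(np.linalg.matrix_power(P, 12)[0], 3))   # [0.4 0.2 0.4]

    # route 2: 4 equations, 3 unknowns, solved by least squares
    A = np.vstack([(P - np.eye(3)).T, np.ones(3)])
    b = np.array([0, 0, 0, 1.])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    print(np.round(w, 3))                                  # [0.4 0.2 0.4]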
Z FUNDAMENTAL MATRIX OF AN ERGODIC CHAIN   p.456

As with absorbing chains, the fundamental matrix of an ergodic chain leads to useful information, but it is found in a different way:

    Z = (I − P + W)^(−1)

P = the transition matrix
I = an identity matrix
W = the fixed probability matrix

MEAN FIRST PASSAGE MATRIX OF AN ERGODIC CHAIN   p.459

The mean first passage matrix gives the expected number of steps from an initial state to a destination state. The mean first passage matrix is denoted by the letter M and is found one element at a time using the following formula:

    m_ij = (z_jj − z_ij) / w_j

m_ij = an element of the mean first passage matrix
z_ij = an element of the fundamental matrix Z
w_j = an element of the fixed probability vector
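A sketch for the Land of Oz chain (Python with numpy assumed; W is built by stacking the fixed vector w = (.4, .2, .4) found above):

    import numpy as np

    P = np.array([[.5, .25, .25], [.5, 0, .5], [.25, .25, .5]])
    w = np.array([.4, .2, .4])
    W = np.tile(w, (3, 1))                       # every row of W equals w

    Z = np.linalg.inv(np.eye(3) - P + W)         # fundamental matrix of the ergodic chain
    M = (np.diag(Z)[None, :] - Z) / w[None, :]   # m_ij = (z_jj - z_ij) / w_j; the diagonal comes out 0
    print(np.round(M, 2))                        # e.g. M[0, 1] = 4.0: expected 4 steps from rain to the next nice day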



SOME SAMPLE/CLASSIC PROBLEMS

MEDICAL PROBABILITIES

A drug is thought to be effective with probability x each time it is used. A beta density function can be fit to this probability (see Beta Density Function p.12), and the estimate can be updated as data from more recent trials arrives. Given α and β, and subsequent knowledge that there have been i successes in n new events, the probability of an event being successful is

    P(success) = (α + i) / (α + β + n)
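A minimal sketch of this update rule (plain Python; the α = β = 1 starting values are my own illustration of a flat prior, not from the text):

    def p_success(alpha, beta, i, n):
        """Posterior probability that the next event is a success, after i successes in n new events."""
        return (alpha + i) / (alpha + beta + n)

    print(p_success(1, 1, i=7, n=10))    # 0.666..., i.e. 8/12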
A CARD PROBLEM

A Gin hand of 10 cards is dealt. What is the probability that 4 cards belong to one suit, and there are 3 cards in each of two other suits?

    (4 choose 1)(3 choose 2)(13 choose 4)(13 choose 3)(13 choose 3) / (52 choose 10) = 0.044

The equation reads, "from 4 suits choose 1 suit, from the remaining 3 suits choose 2 suits, from one suit of 13 cards choose 4 cards, from another suit choose 3 cards, and from another suit choose 3 cards. Divide the product of these by the number of 10-card hands possible from a deck of 52 cards."
THE ENVELOPE PROBLEM possible from a deck of 52 cards.”
This is also called the hat check problem. n letters are
randomly inserted into n addressed envelopes. What
is the probability that no letter will be put into the
correct envelope?
The probability that the first letter is put into the
correct envelope is 1/n. Given that the first has been
placed in the proper envelope, the probability that the
second one is put into the correct envelope is 1/(n-1)
and so on. So the probability that all are put into the
correct envelopes is the product of the individual
probabilities or
    P(E1 ∩ E2 ∩ ⋯ ∩ En) = 1/n!
But the question was what is the probability that NO
letter will be put into the correct envelope. To make a
long story short, this turns out to be
    P(no letter in correct envelope) = 1/2! − 1/3! + ⋯ + (−1)^n (1/n!)
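A sketch that evaluates this alternating sum (plain Python; note how quickly it approaches 1/e ≈ 0.368 as n grows):

    from math import factorial

    def p_no_match(n):
        """Probability that none of n letters lands in its own envelope."""
        return sum((-1) ** k / factorial(k) for k in range(2, n + 1))

    print(round(p_no_match(4), 4))       # 0.375
    print(round(p_no_match(10), 4))      # 0.3679, already close to 1/e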

THE BIRTHDAY PROBLEM


Given r people, what is the probability that there are at
least two with the same birthday? It is easier to find
the probability that no two will have the same birthday
and subtract that from one.
Considering the first person, he could have any of 365
birthdays. Then the second person could only have
one of 364 birthdays since one had been taken by the
first person. The third person could have one of the
363 unused birthdays, etc. The sample space
consists of all of the possible combinations of
birthdays that the group could have.

    P(some share a birthday) = 1 − [365 · 364 · ⋯ · (365 − r + 1)] / 365^r
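A sketch of this computation (plain Python; r = 23 is the classic break-even group size):

    def p_shared_birthday(r):
        """Probability that at least two of r people share a birthday."""
        p_all_distinct = 1.0
        for k in range(r):
            p_all_distinct *= (365 - k) / 365
        return 1 - p_all_distinct

    print(round(p_shared_birthday(23), 3))   # 0.507
    print(round(p_shared_birthday(50), 3))   # 0.97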



GENERAL MATHEMATICAL

EULER'S EQUATION

    e^(jφ) = cos φ + j sin φ

TRIGONOMETRIC IDENTITIES

    e^(+jθ) + e^(−jθ) = 2 cos θ
    e^(+jθ) − e^(−jθ) = j2 sin θ
    e^(±jθ) = cos θ ± j sin θ

LOGARITHMS

    ln x = b  ↔  e^b = x          ln x^y = y ln x
    ln e^x = x                    e^(a ln b) = b^a
    log_a x = y  ↔  a^y = x

CALCULUS - INTEGRATION

    ∫ dx = x + C                  ∫ x^n dx = x^(n+1)/(n+1) + C

    ∫ e^u dx = (1/u′) e^u + C   (u linear in x)

    ∫ x e^x dx = (x − 1) e^x + C
    ∫ x e^(ax) dx = (e^(ax)/a^2)(ax − 1) + C

    ∫(0 to ∞) x^n e^(−ax²) dx = [((n−1)/2)!] / [2 a^((n+1)/2)]                      for odd n
                              = [1·3·5⋯(n−1)] / [2^((n/2)+1) a^(n/2)] · √(π/a)      for even n

    ∫ (1/x) dx = ln x + C         ∫ a^x dx = a^x / ln a + C

    ∫ sin²u du = (1/2)u − (1/4) sin 2u + C
    ∫ cos²u du = (1/2)u + (1/4) sin 2u + C

    Integration by parts:  ∫ u dv = uv − ∫ v du
CALCULUS – L'HÔPITAL'S RULE

If the limit of f(x)/g(x) as x approaches c produces the indeterminate form 0/0, ∞/∞, or −∞/∞, then the derivative of both numerator and denominator may be taken:

    lim (x→c) f(x)/g(x) = lim (x→c) f′(x)/g′(x)

provided the limit on the right exists or is infinite. The derivative may be taken repeatedly provided the numerator and denominator get the same treatment. To convert a limit to a form on which L'Hôpital's Rule can be used, try algebraic manipulation, or try setting y equal to the limit and then taking the natural log of both sides. The ln can be placed to the right of lim. This is manipulated into fractional form so L'Hôpital's Rule can be used, thus getting rid of the ln. When this limit is found, it is actually the value of ln y, where y is the value we are looking for. Other indeterminate forms (which might be convertible) are 1^∞, ∞^0, 0^0, 0·∞, and ∞ − ∞. Note that 0^∞ = 0 (it is not an indeterminate form).

CALCULUS - DERIVATIVES

    d/dx (u/v) = (v·u′ − u·v′) / v²
    d/dx a^x = a^x ln a           d/dx a^u = u′ · a^u ln a
    d/dx e^u = u′ · e^u
    d/dx ln x = 1/x               d/dx ln u = u′/u

FACTORIAL

n! is the number of ways a collection of n objects can be ordered. See also Stirling Approximation.

STIRLING APPROXIMATION

Useful in calculating large factorials.

    n! ≈ n^n e^(−n)    or, more accurately,    n! ≈ n^n e^(−n) √(2πn)
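A quick check of the two forms at n = 10 (plain Python):

    from math import e, factorial, pi, sqrt

    n = 10
    rough = n**n * e**-n                  # n^n e^-n
    better = rough * sqrt(2 * pi * n)     # n^n e^-n sqrt(2 pi n)
    print(factorial(n), round(rough), round(better))
    # 3628800 453999 3598696  -- the sqrt(2*pi*n) factor matters; the second form is within about 1%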
INFINITE SUM

Useful in the subject of probability.

    e^x = Σ (k=0 to ∞) x^k / k!

e^(−x) ANOTHER e THING

    As n → ∞,  (1 − x/n)^n → e^(−x)
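A small numerical check of both facts (plain Python; the truncation at k = 20 and the choice n = 10^6 are mine):

    from math import exp, factorial

    x = 2.0
    print(sum(x**k / factorial(k) for k in range(21)), exp(x))   # both about 7.389056
    print((1 - x / 1_000_000) ** 1_000_000, exp(-x))             # both about 0.135335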



SERIES

    n(n − 1)/2 = 1 + 2 + 3 + ⋯ + (n − 1)

    √(1 + x) ≈ 1 + x/2 ,   x ≪ 1

    1/√(1 + x) ≈ 1 − x/2 + 3x²/8 − 5x³/16 + 35x⁴/128 − ⋯ ,   −1/2 < x < 1/2

    1/(1 − x²) ≈ 1 + x² + x⁴ + x⁶ + ⋯ ,   −1/2 < x < 1/2

    1/(1 − x)² ≈ 1 + 2x + 3x² + 4x³ + ⋯ ,   −1/2 < x < 1/2

    1/(1 + x) ≈ 1 − x + x² − x³ + ⋯ ,   −1/2 < x < 1/2

    1/(1 − x) ≈ 1 + x + x² + x³ + ⋯ ,   −1 < x < 1

    x/(1 − x) = x + x² + x³ + x⁴ + ⋯ ,   −1 < x < 1

    e^x = 1 + x + x²/2! + x³/3! + x⁴/4! + ⋯

    n(n + 1)/2 = 1 + 2 + 3 + ⋯ + n

    n(n + 1)(2n + 1)/6 = 1² + 2² + 3² + ⋯ + n²

BINOMIAL THEOREM

Also called binomial expansion. When m is a positive integer, this is a finite series of m + 1 terms. When m is not a positive integer, the series converges for −1 < x < 1.

    (1 + x)^m = 1 + mx + [m(m − 1)/2!] x² + ⋯ + [m(m − 1)(m − 2)⋯(m − n + 1)/n!] x^n + ⋯

QUADRATIC EQUATION

Given the equation ax² + bx + c = 0,

    x = [ −b ± √(b² − 4ac) ] / (2a)

LINEARIZING AN EQUATION

Small nonlinear terms are removed. Nonlinear terms include:
• variables raised to a power
• variables multiplied by other variables
∆ values are considered variables, e.g. ∆t.

SPHERE

    Area = πd² = 4πr²          Volume = (1/6)πd³ = (4/3)πr³

GRAPHING TERMINOLOGY

With x being the horizontal axis and y the vertical, we have a graph of y versus x, or y as a function of x. The x-axis represents the independent variable and the y-axis represents the dependent variable, so that when a graph is used to illustrate data, the data of regular interval (often this is time) is plotted on the x-axis and the corresponding data, which depends on those values, is plotted on the y-axis.

GLOSSARY

derangement  A permutation of elements in which the positions of all elements have changed with respect to a reference permutation.

distribution function  A distribution function assigns probabilities to each possible outcome. The sum of the probabilities is 1. The probability density function may be obtained by taking the derivative of the distribution function.

independent trials  A special class of random variables. A sequence of random variables X1, X2, … Xn that are mutually independent and that have the same distribution is called a sequence of independent trials or an independent trials process.

median  A value of a random variable for which all greater values make the distribution function greater than one half and all lesser values make it less than one half. Or, a value in an ordered set of values below and above which there is an equal number of values.

random variable  A variable representing the outcome of a particular experiment. For example, the random variable X1 might represent the outcome of two coin tosses. Its value could be HT or HH, etc.

stochastic  Random; involving a random variable; involving chance or probability.

uniform distribution  The probabilities of all outcomes are equal. If the sample space contains n discrete outcomes numbered 1 through n, then the uniform distribution function is m(ω) = 1/n.

