Mário A. T. Figueiredo
Instituto Superior Técnico (IST) & Instituto de Telecomunicações (IT)
Lisboa, Portugal
Probability theory
The study of probability has roots in games of chance (dice, cards, ...)
Great names of science: Cardano, Fermat, Pascal, Laplace,
Kolmogorov, Bernoulli, Poisson, Cauchy, Boltzmann, de Finetti, ...
Natural tool to model uncertainty, information, knowledge, belief, ...
...thus also learning, decision making, inference, ...
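What is probability?
Classical definition: $P(A) = \frac{N_A}{N}$, with $N$ equally likely outcomes, $N_A$ of which result in $A$ (Laplace, 1814).
Frequentist definition: $P(A) = \lim_{N \to \infty} \frac{N_A}{N}$, the relative frequency of $A$ over $N$ trials.
Subjective probability: $P(A)$ is a degree of belief (de Finetti, 1930s).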
An event is a subset of the sample space $\mathcal{X}$.
Examples:
odd number in the roulette: $B = \{1, 3, ..., 35\} \subset \{1, 2, ..., 36\}$.
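$P(\mathcal{X}) = 1$
If $A_1, A_2, \ldots \subseteq \mathcal{X}$ are disjoint events, then $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$
$P(\emptyset) = 0$
$C \subseteq D \Rightarrow P(C) \le P(D)$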
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ (conditional prob. of A given B)
$P(\mathcal{X}|B) = 1$
If $A_1, A_2, \ldots \subseteq \mathcal{X}$ are disjoint, then $P\left(\bigcup_i A_i \,\Big|\, B\right) = \sum_i P(A_i|B)$
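Bayes Theorem
Law of total probability: if $A_1, ..., A_n$ are a partition of $\mathcal{X}$, then
$P(B) = \sum_i P(B|A_i) P(A_i) = \sum_i P(B \cap A_i)$
Bayes' theorem: for the same partition,
$P(A_i|B) = \frac{P(B \cap A_i)}{P(B)} = \frac{P(B|A_i) P(A_i)}{\sum_j P(B|A_j) P(A_j)}$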
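A minimal numerical sketch of the law of total probability and Bayes' theorem for a two-set partition. The function name `bayes_posterior` and all the numbers are hypothetical, chosen only for illustration; exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return ([P(A_i | B) for each i], P(B)) for a partition A_1..A_n."""
    # Joint terms P(B and A_i) = P(B | A_i) P(A_i)
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Law of total probability: P(B) is the sum of the joint terms
    evidence = sum(joint)
    # Bayes' theorem: divide each joint term by P(B)
    return [j / evidence for j in joint], evidence

# Hypothetical numbers: P(A_1) = 1/100, P(A_2) = 99/100,
# P(B | A_1) = 9/10, P(B | A_2) = 1/20.
priors = [Fraction(1, 100), Fraction(99, 100)]
likelihoods = [Fraction(9, 10), Fraction(1, 20)]
posteriors, p_b = bayes_posterior(priors, likelihoods)
print(posteriors, p_b)
```

Even with a high likelihood under $A_1$, the small prior keeps $P(A_1|B)$ well below one half in this example.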
Random Variables
A (real) random variable (RV) is a function: $X : \mathcal{X} \to \mathbb{R}$
LxMLS 2014: Probability Theory
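Uniform: $f_X(x) = \frac{1}{b-a}$ if $x \in [a, b]$; $0$ if $x \notin [a, b]$ (previous slide).
Gaussian: $f_X(x) = \mathcal{N}(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Exponential: $f_X(x) = \mathrm{Exp}(x; \lambda) = \lambda e^{-\lambda x}$ if $x \ge 0$; $0$ if $x < 0$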
July 22, 2014
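Expectation: $E(X) = \sum_i x_i f_X(x_i)$ if $X \in \{x_1, ..., x_K\} \subset \mathbb{R}$; $E(X) = \int_{-\infty}^{\infty} x f_X(x)\, dx$ if $X$ continuous.
Example: Binomial, $f_X(x) = \binom{n}{x} p^x (1-p)^{n-x}$. $E(X) = n p$.
Example: Gaussian, $f_X(x) = \mathcal{N}(x; \mu, \sigma^2)$. $E(X) = \mu$.
Linearity of expectation:
$E(X + Y) = E(X) + E(Y)$; $E(\alpha X) = \alpha E(X)$, $\alpha \in \mathbb{R}$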
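The discrete expectation formula can be checked directly on the binomial example. This is a sketch using only the standard library; the helper names `binomial_pmf` and `expectation` and the values $n = 10$, $p = 0.3$, $\alpha = 2.5$ are illustrative assumptions.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial pmf: C(n, x) p^x (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def expectation(values, pmf, g=lambda x: x):
    """E[g(X)] for a discrete RV: sum of g(x_i) f_X(x_i)."""
    return sum(g(x) * pmf(x) for x in values)

n, p = 10, 0.3  # illustrative parameters
pmf = lambda x: binomial_pmf(x, n, p)
mean = expectation(range(n + 1), pmf)

# Scaling part of linearity: E(alpha X) = alpha E(X)
alpha = 2.5
scaled_mean = expectation(range(n + 1), pmf, g=lambda x: alpha * x)
print(mean, scaled_mean)
```

Up to floating-point rounding, `mean` agrees with $n p$ and `scaled_mean` with $\alpha\, n p$.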
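Expectation of a function of an RV: $E(g(X)) = \sum_i g(x_i) f_X(x_i)$ if $X$ discrete, $g(x_i) \in \mathbb{R}$; $E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx$ if $X$ continuous.
Variance: $\mathrm{var}(X) = E\left[(X - E(X))^2\right] = E(X^2) - E(X)^2$
Indicator of a set $A$: $1_A(x) = 1$ if $x \in A$; $0$ if $x \notin A$. Then $E(1_A(X)) = \int_A f_X(x)\, dx = P(X \in A)$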
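Joint pmf of two discrete RVs: $f_{X,Y}(x, y) = P(X = x \wedge Y = y)$.
Marginalization: $f_Y(y) = \sum_x f_{X,Y}(x, y)$, if $X$ is discrete; $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dx$, if $X$ is continuous.
Independence: $X \perp Y \Leftrightarrow f_{X,Y}(x, y) = f_X(x) f_Y(y)$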
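Conditional pmf (discrete RVs): $f_{X|Y}(x|y) = P(X = x | Y = y) = \frac{P(X = x \wedge Y = y)}{P(Y = y)} = \frac{f_{X,Y}(x, y)}{f_Y(y)}$.
Conditional pdf (continuous RVs): $f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$
Bayes' theorem: $f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) f_X(x)}{f_Y(y)}$ (pdf or pmf).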
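Marginals: $f_X(0) = \frac{1}{5} + \frac{2}{5} = \frac{3}{5}$, $f_X(1) = \frac{1}{10} + \frac{3}{10} = \frac{4}{10}$,
$f_Y(0) = \frac{1}{5} + \frac{1}{10} = \frac{3}{10}$, $f_Y(1) = \frac{2}{5} + \frac{3}{10} = \frac{7}{10}$.
Conditional probabilities: e.g., $f_{X|Y}(0|0) = \frac{f_{X,Y}(0, 0)}{f_Y(0)} = \frac{1/5}{3/10} = \frac{2}{3}$.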
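The marginal sums can be reproduced programmatically. The joint table below is an assumption: it is inferred from the terms of the marginal sums (the table itself is not shown in the text), and the helper name `marginal` is hypothetical.

```python
from fractions import Fraction as F

# Joint pmf f_{X,Y}(x, y), keyed by (x, y); values inferred from the marginal sums
joint = {(0, 0): F(1, 5), (0, 1): F(2, 5), (1, 0): F(1, 10), (1, 1): F(3, 10)}

def marginal(joint, axis):
    """Sum the joint pmf over the other variable (axis 0 keeps X, axis 1 keeps Y)."""
    out = {}
    for key, prob in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + prob
    return out

f_x = marginal(joint, 0)
f_y = marginal(joint, 1)

# Conditional pmf f_{X|Y}(x | y) = f_{X,Y}(x, y) / f_Y(y)
f_x_given_y = {(x, y): p / f_y[y] for (x, y), p in joint.items()}
print(f_x, f_y, f_x_given_y)
```

For each fixed $y$, the conditional values $f_{X|Y}(\cdot|y)$ sum to one, as any pmf must.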
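Multinomial: $f_X(x_1, ..., x_K) = \binom{n}{x_1\, x_2 \cdots x_K} p_1^{x_1} p_2^{x_2} \cdots p_K^{x_K}$ if $\sum_i x_i = n$; $0$ if $\sum_i x_i \neq n$
$\binom{n}{x_1\, x_2 \cdots x_K} = \frac{n!}{x_1!\, x_2! \cdots x_K!}$
Parameters: $p_1, ..., p_K \ge 0$, such that $\sum_i p_i = 1$.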
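A short sketch of the multinomial pmf using only the standard library. The function name `multinomial_pmf` and the example values are illustrative assumptions; for $K = 2$ the result should coincide with the binomial pmf.

```python
from math import comb, factorial, isclose, prod

def multinomial_pmf(counts, n, probs):
    """Multinomial pmf; zero unless the counts sum to n."""
    if sum(counts) != n:
        return 0.0
    # Multinomial coefficient n! / (x_1! x_2! ... x_K!), kept as an exact integer
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    return coeff * prod(p**x for p, x in zip(probs, counts))

# K = 2 special case: compare with the binomial pmf for n = 10, p = 0.3
value = multinomial_pmf((3, 7), 10, (0.3, 0.7))
binomial_value = comb(10, 3) * 0.3**3 * 0.7**7
print(value, binomial_value)
```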
Multivariate Gaussian: $f_X(x) = \mathcal{N}(x; \mu, C) = \frac{1}{\sqrt{\det(2\pi C)}} \exp\left(-\frac{1}{2} (x - \mu)^T C^{-1} (x - \mu)\right)$
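$X \perp Y \Leftrightarrow f_{X,Y}(x, y) = f_X(x) f_Y(y) \Rightarrow \mathrm{cov}(X, Y) = 0$; the converse implication does not hold in general.
Covariance matrix of multivariate RV, $X \in \mathbb{R}^n$:
$\mathrm{cov}(X) = E\left[(X - E(X))(X - E(X))^T\right] = E(X X^T) - E(X) E(X)^T$
Covariance of Gaussian RV, $f_X(x) = \mathcal{N}(x; \mu, C) \Rightarrow \mathrm{cov}(X) = C$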
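That zero covariance does not imply independence can be seen on a small example. The construction below ($X$ uniform on $\{-1, 0, 1\}$ and $Y = X^2$) is a standard counterexample chosen here for illustration, not one taken from the slides.

```python
from fractions import Fraction as F

# Joint pmf of (X, Y) with X uniform on {-1, 0, 1} and Y = X**2
joint = {(-1, 1): F(1, 3), (0, 0): F(1, 3), (1, 1): F(1, 3)}

def expect(g):
    """E[g(X, Y)] under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

# cov(X, Y) = E(XY) - E(X) E(Y)
cov_xy = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Independence would require f_{X,Y}(0, 0) = f_X(0) f_Y(0)
f_x0 = sum(p for (x, y), p in joint.items() if x == 0)
f_y0 = sum(p for (x, y), p in joint.items() if y == 0)
factorizes = joint[(0, 0)] == f_x0 * f_y0
print(cov_xy, factorizes)
```

Here the covariance is exactly zero, yet the joint pmf does not factorize, so $X$ and $Y$ are dependent.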
Statistical Inference
Scenario: observed RV $Y$, depends on unknown variable(s) $X$.
Goal: given an observation $Y = y$, infer $X$.
Two main philosophies:
Frequentist: $X = x$ is fixed (not an RV), but unknown;
Bayesian: $X$ is a random variable with pdf/pmf $f_X(x)$ (the prior); this prior expresses/formalizes knowledge about $X$.
In both philosophies, a central object is $f_{Y|X}(y|x)$, which has several names: likelihood function, observation model, ...
This is not statistical/machine learning! $f_{Y|X}(y|x)$ is assumed known.
In the Bayesian philosophy, all the knowledge about $X$ is carried by
$f_{X|Y}(x|y) = \frac{f_{Y|X}(y|x) f_X(x)}{f_Y(y)} = \frac{f_{Y,X}(y, x)}{f_Y(y)}$
...the posterior (or a posteriori) pdf/pmf.
Statistical Inference
The posterior pdf/pmf $f_{X|Y}(x|y)$ has all the information/knowledge about $X$, given $Y = y$ (conditionality principle).
How to make an optimal guess $\hat{x}$ about $X$, given this information?
Need to define optimal: the loss function $L(\hat{x}, x) \in \mathbb{R}_+$ measures the loss/cost incurred by guessing $\hat{x}$ if the truth is $x$.
The optimal Bayesian decision minimizes the expected loss:
$\hat{x}_{Bayes} = \arg\min_{\hat{x}} E[L(\hat{x}, X) | Y = y]$
where
$E[L(\hat{x}, X) | Y = y] = \int L(\hat{x}, x) f_{X|Y}(x|y)\, dx$, continuous (estimation);
$E[L(\hat{x}, X) | Y = y] = \sum_x L(\hat{x}, x) f_{X|Y}(x|y)$, discrete (classification).
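With the 0/1 loss, $L(\hat{x}, x) = 0$ if $\hat{x} = x$ and $1$ otherwise, for $X \in \{1, ..., K\}$:
$\hat{x}_{Bayes} = \arg\min_{\hat{x}} \sum_{x=1}^{K} L(\hat{x}, x) f_{X|Y}(x|y)$
$= \arg\min_{\hat{x}} \left(1 - f_{X|Y}(\hat{x}|y)\right)$
$= \arg\max_{\hat{x}} f_{X|Y}(\hat{x}|y) \equiv \hat{x}_{MAP}$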
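The equivalence between minimizing the expected 0/1 loss and maximizing the posterior can be checked on a toy posterior. The posterior values and the function names below are hypothetical, used only to illustrate the derivation.

```python
def zero_one_loss(guess, truth):
    """0/1 loss: no cost for a correct guess, unit cost otherwise."""
    return 0.0 if guess == truth else 1.0

def expected_loss(guess, posterior, loss):
    """Posterior expected loss: sum over x of L(guess, x) f_{X|Y}(x | y)."""
    return sum(loss(guess, x) * p for x, p in posterior.items())

# Hypothetical posterior pmf over K = 3 classes
posterior = {1: 0.2, 2: 0.5, 3: 0.3}

# Bayes decision: minimize the expected 0/1 loss over all candidate guesses
bayes_guess = min(posterior, key=lambda g: expected_loss(g, posterior, zero_one_loss))
# MAP decision: maximize the posterior directly
map_guess = max(posterior, key=posterior.get)
print(bayes_guess, map_guess)
```

Both rules select the same class, since the expected 0/1 loss of a guess is one minus its posterior probability.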
$\hat{x}_{MAP} = \arg\max_x \frac{f_{Y|X}(y|x) f_X(x)}{f_Y(y)} = \arg\max_x f_{Y|X}(y|x) f_X(x)$ (the denominator $f_Y(y)$ does not depend on $x$)
Log-likelihood function:
$\log f_{Y|X}(y_1, ..., y_n | x) = n \log(1 - x) + \log\frac{x}{1 - x} \sum_{i=1}^{n} y_i$
Maximum likelihood: $\hat{x}_{ML} = \arg\max_x f_{Y|X}(y|x) = \frac{1}{n} \sum_{i=1}^{n} y_i$
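Likelihood: $f_{Y|X}(y_1, ..., y_n | x) = \prod_{i=1}^{n} x^{y_i} (1 - x)^{1 - y_i} = x^{\sum_i y_i} (1 - x)^{n - \sum_i y_i}$
Prior: Beta, $f_X(x) \propto x^{\alpha - 1} (1 - x)^{\beta - 1}$
Posterior: $f_{X|Y}(x|y) \propto x^{\alpha - 1 + \sum_i y_i} (1 - x)^{\beta - 1 + n - \sum_i y_i}$
MAP: $\hat{x}_{MAP} = \frac{\alpha + \sum_i y_i - 1}{\alpha + \beta + n - 2}$
Example: $\alpha = 4$, $\beta = 4$, $n = 10$, $y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1)$: $\hat{x}_{MAP} = 0.625$ (recall $\hat{x}_{ML} = 0.7$)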
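The example values can be reproduced with a few lines of Python. This is a sketch: the function names are hypothetical, and the grid search is only a sanity check that the closed-form MAP estimate maximizes the unnormalized log-posterior.

```python
from math import log

def ml_estimate(y):
    """Maximum likelihood estimate for Bernoulli data: the sample mean."""
    return sum(y) / len(y)

def map_estimate(y, alpha, beta):
    """Mode of the Beta posterior: (alpha + sum(y) - 1) / (alpha + beta + n - 2)."""
    return (alpha + sum(y) - 1) / (alpha + beta + len(y) - 2)

def log_posterior(x, y, alpha, beta):
    """Unnormalized log-posterior for the Beta prior and Bernoulli likelihood."""
    s, n = sum(y), len(y)
    return (alpha - 1 + s) * log(x) + (beta - 1 + n - s) * log(1 - x)

y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1)
alpha, beta = 4, 4
x_ml = ml_estimate(y)
x_map = map_estimate(y, alpha, beta)

# Sanity check: maximize the log-posterior on a grid over the open interval (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
x_grid = max(grid, key=lambda x: log_posterior(x, y, alpha, beta))
print(x_ml, x_map, x_grid)
```

The prior, centered at $1/2$, pulls the MAP estimate away from the sample mean toward $1/2$.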
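With the squared-error loss, $L(\hat{x}, x) = (\hat{x} - x)^2$:
$\hat{x}_{Bayes} = \arg\min_{\hat{x}} E[(\hat{x} - X)^2 | Y = y]$
$= \arg\min_{\hat{x}} \hat{x}^2 - 2 \hat{x} E[X | Y = y]$
$= E[X | Y = y] \equiv \hat{x}_{MMSE}$
MMSE = minimum mean squared error criterion.
Does not apply to classification problems.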
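Likelihood: $f_{Y|X}(y_1, ..., y_n | x) = \prod_{i=1}^{n} x^{y_i} (1 - x)^{1 - y_i} = x^{\sum_i y_i} (1 - x)^{n - \sum_i y_i}$
Posterior (Beta prior, $f_X(x) \propto x^{\alpha - 1} (1 - x)^{\beta - 1}$): $f_{X|Y}(x|y) \propto x^{\alpha - 1 + \sum_i y_i} (1 - x)^{\beta - 1 + n - \sum_i y_i}$
MMSE: $\hat{x}_{MMSE} = \frac{\alpha + \sum_i y_i}{\alpha + \beta + n}$
Example: $\alpha = 4$, $\beta = 4$, $n = 10$, $y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1)$: $\hat{x}_{MMSE} \simeq 0.611$ (recall that $\hat{x}_{MAP} = 0.625$, $\hat{x}_{ML} = 0.7$)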
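As a check of the MMSE example, the posterior mean can be computed both from the closed form and by numerical integration of the unnormalized posterior. The function names and the midpoint-rule integration are illustrative choices, not part of the original material.

```python
def mmse_estimate(y, alpha, beta):
    """Mean of the Beta posterior: (alpha + sum(y)) / (alpha + beta + n)."""
    return (alpha + sum(y)) / (alpha + beta + len(y))

def posterior_mean_numeric(y, alpha, beta, steps=20000):
    """Posterior mean E[X | Y = y] by midpoint-rule integration on (0, 1)."""
    s, n = sum(y), len(y)
    num = den = 0.0
    for k in range(steps):
        x = (k + 0.5) / steps
        # Unnormalized posterior x^(alpha-1+s) (1-x)^(beta-1+n-s)
        w = x ** (alpha - 1 + s) * (1 - x) ** (beta - 1 + n - s)
        num += x * w
        den += w
    return num / den

y = (1, 1, 1, 0, 1, 0, 0, 1, 1, 1)
alpha, beta = 4, 4
x_mmse = mmse_estimate(y, alpha, beta)
x_numeric = posterior_mean_numeric(y, alpha, beta)
print(round(x_mmse, 3), round(x_numeric, 3))
```

The normalizing constant of the posterior cancels in the ratio, so it never needs to be computed.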
$\hat{x}_{MAP} = \frac{73}{106} \simeq 0.689$, $\hat{x}_{MMSE} = \frac{74}{108} \simeq 0.685$
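Important Inequalities
Markov's inequality: if $X \ge 0$ is an RV with expectation $E(X)$, then
$P(X > t) \le \frac{E(X)}{t}$
Simple proof:
$t\, P(X > t) = \int_t^{\infty} t f_X(x)\, dx \le \int_t^{\infty} x f_X(x)\, dx = E(X) - \int_0^t x f_X(x)\, dx \le E(X)$, since $\int_0^t x f_X(x)\, dx \ge 0$.
Chebyshev's inequality: if $\mu = E(X)$ and $\sigma^2 = \mathrm{var}(X)$, then $P(|X - \mu| \ge s) \le \frac{\sigma^2}{s^2}$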
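Markov's inequality can be verified exactly on a small non-negative RV. The fair six-sided die below is an illustrative assumption; exact fractions avoid any rounding issue in the comparison.

```python
from fractions import Fraction as F

# pmf of a fair six-sided die (a non-negative RV)
pmf = {x: F(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())

def tail(t):
    """P(X > t) under the pmf."""
    return sum(p for x, p in pmf.items() if x > t)

# For each threshold t, pair the tail probability with the Markov bound E(X) / t
checks = {t: (tail(t), mean / t) for t in range(1, 7)}
for t, (lhs, bound) in checks.items():
    print(t, lhs, bound, lhs <= bound)
```

The bound is loose for small $t$ (it exceeds one) and only becomes informative once $t$ is larger than $E(X)$.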
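Entropy of a discrete RV $X \in \{1, ..., K\}$: $H(X) = -\sum_{x=1}^{K} f_X(x) \log f_X(x)$
Positivity: $H(X) \ge 0$; $H(X) = 0 \Leftrightarrow f_X(i) = 1$, for exactly one $i \in \{1, ..., K\}$.
Upper bound: $H(X) \le \log K$; $H(X) = \log K \Leftrightarrow f_X(x) = 1/K$, for all $x \in \{1, ..., K\}$
Measure of uncertainty/randomness of $X$
Continuous RV $X$, differential entropy: $h(X) = -\int f_X(x) \log f_X(x)\, dx$
If $f_X(x) = \mathcal{N}(x; \mu, \sigma^2)$, then $h(X) = \frac{1}{2} \log(2\pi e \sigma^2)$.
If $\mathrm{var}(X) = \sigma^2$, then $h(X) \le \frac{1}{2} \log(2\pi e \sigma^2)$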
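The two extreme cases of the discrete entropy can be checked numerically. This sketch uses the natural logarithm and the usual convention $0 \log 0 = 0$; the function name `entropy` and the test pmfs are illustrative.

```python
from math import log

def entropy(pmf):
    """H(X) = -sum of f(x) log f(x), in nats, with the convention 0 log 0 = 0."""
    return -sum(p * log(p) for p in pmf if p > 0)

K = 4
h_uniform = entropy([1 / K] * K)              # attains the upper bound log K
h_certain = entropy([1.0, 0.0, 0.0, 0.0])     # all mass on one value: zero entropy
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])      # strictly between the two extremes
print(h_uniform, h_certain, h_skewed)
```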
Kullback-Leibler divergence
Kullback-Leibler divergence (KLD) between two pmf:
$D(f_X \| g_X) = \sum_{x=1}^{K} f_X(x) \log \frac{f_X(x)}{g_X(x)}$
KLD between two pdf:
$D(f_X \| g_X) = \int f_X(x) \log \frac{f_X(x)}{g_X(x)}\, dx$
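A minimal sketch of the discrete KLD, assuming $g_X(x) > 0$ wherever $f_X(x) > 0$ so that every term is finite. The function name `kl_divergence` and the two example pmfs are hypothetical.

```python
from math import log

def kl_divergence(f, g):
    """D(f || g) = sum of f(x) log(f(x) / g(x)), in nats; terms with f(x) = 0 are skipped."""
    return sum(p * log(p / q) for p, q in zip(f, g) if p > 0)

f = [0.5, 0.5]
g = [0.9, 0.1]
d_fg = kl_divergence(f, g)
d_gf = kl_divergence(g, f)
d_ff = kl_divergence(f, f)
print(d_fg, d_gf, d_ff)
```

On this example the two orderings give different values, which illustrates that the KLD is not symmetric, while the divergence of a pmf from itself is zero.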