Entropy

Frank Keller
School of Informatics
University of Edinburgh
keller@inf.ed.ac.uk

March 6, 2006

1. Entropy and Information
2. Joint Entropy
3. Conditional Entropy
Definition: Entropy

If X is a discrete random variable with distribution f(x), then the entropy
of X is:

    H(X) = -\sum_{x \in X} f(x) \log_2 f(x)

Entropy is measured in bits, i.e., the logarithm is taken to base 2.

Example: throwing a fair eight-sided die, with f(x) = 1/8 for x = 1, ..., 8:

    H(X) = -\sum_{x=1}^{8} \frac{1}{8} \log \frac{1}{8}
         = -\log \frac{1}{8} = \log 8 = 3 bits
This means the outcome of a roll can be transmitted using a code of 3 bits
per outcome:

    x       1    2    3    4    5    6    7    8
    code    001  010  011  100  101  110  111  000
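To make the definition concrete, here is a minimal Python sketch (not part
of the original slides; the function name entropy is my own) that computes
H(X) for an arbitrary distribution and confirms the 3-bit result for the die:

    from math import log2

    def entropy(dist):
        # H(X) = -sum_x f(x) * log2 f(x), in bits; terms with
        # f(x) = 0 are skipped, following the convention 0 log 0 = 0.
        return -sum(p * log2(p) for p in dist if p > 0)

    # Fair eight-sided die: f(x) = 1/8 for each of the eight outcomes.
    print(entropy([1/8] * 8))  # -> 3.0 (bits)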
Example: simplified Polynesian has the six letters p, t, k, a, i, u, with
the following letter frequencies:

    x       p    t    k    a    i    u
    f(x)    1/8  1/4  1/8  1/4  1/8  1/8

The per-letter entropy is:

    H(X) = -\sum_{x \in \{p,t,k,a,i,u\}} f(x) \log f(x)
         = -(4 \cdot \frac{1}{8} \log \frac{1}{8} + 2 \cdot \frac{1}{4} \log \frac{1}{4})
         = 2.5 bits

We can design a code that takes on average 2.5 bits to transmit a letter,
using shorter codewords for the more frequent letters:

    x       p    t   k    a   i    u
    code    100  00  101  01  110  111
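As a quick check (again a sketch, not from the slides, reusing the entropy
function defined above), the distribution indeed has 2.5 bits of entropy,
and the code achieves exactly that average length:

    letters = {'p': 1/8, 't': 1/4, 'k': 1/8, 'a': 1/4, 'i': 1/8, 'u': 1/8}
    code    = {'p': '100', 't': '00', 'k': '101',
               'a': '01',  'i': '110', 'u': '111'}

    print(entropy(letters.values()))  # -> 2.5 (bits)

    # Average code length: sum_x f(x) * |code(x)|.
    print(sum(f * len(code[x]) for x, f in letters.items()))  # -> 2.5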
Theorem: Entropy

If X is a binary random variable with the distribution f(0) = p and
f(1) = 1 - p, then:

    H(X) = -p \log p - (1 - p) \log (1 - p)

Properties of Entropy

- H(X) = 0 if p = 0 or p = 1: the outcome is certain, so it carries no
  information;
- H(X) reaches its maximum of 1 bit for p = 1/2.
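A small sketch (mine, not the slides') that evaluates this formula at a few
values of p illustrates both properties; the plot below shows the full curve:

    from math import log2

    def binary_entropy(p):
        # H(X) = -p log2 p - (1 - p) log2 (1 - p); terms with
        # probability 0 or 1 contribute nothing and are skipped.
        return -sum(q * log2(q) for q in (p, 1 - p) if 0 < q < 1)

    for p in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(p, binary_entropy(p))
    # H(X) rises from 0 at p = 0 to 1 bit at p = 1/2,
    # then falls symmetrically back to 0 at p = 1.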
[Figure: the binary entropy H(X) plotted as a function of p, rising from 0
at p = 0 to its maximum of 1 bit at p = 1/2 and falling back to 0 at p = 1.]

Joint Entropy

Definition: Joint Entropy

The joint entropy of a pair of discrete random variables X and Y with joint
distribution f(x, y) is:

    H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} f(x, y) \log f(x, y)
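Intuitively, joint entropy treats each pair (x, y) as a single outcome. As a
toy illustration (a sketch with a made-up distribution, not from the
slides), two independent fair coin flips have four equiprobable outcomes and
hence 2 bits of joint entropy:

    from math import log2

    def joint_entropy(joint):
        # joint maps pairs (x, y) to probabilities f(x, y);
        # H(X, Y) = -sum_{x,y} f(x, y) log2 f(x, y), skipping zero cells.
        return -sum(p * log2(p) for p in joint.values() if p > 0)

    # Two independent fair coins: f(x, y) = 1/4 for every pair.
    coins = {(x, y): 1/4 for x in 'HT' for y in 'HT'}
    print(joint_entropy(coins))  # -> 2.0 (bits)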
Conditional Entropy

Definition: Conditional Entropy

The conditional entropy of Y given X quantifies the uncertainty that remains
about Y once the value of X is known:

    H(Y|X) = \sum_{x \in X} f(x) H(Y|X = x)
           = -\sum_{x \in X} \sum_{y \in Y} f(x, y) \log f(y|x)

Example: simplified Polynesian

Now assume that you have the joint probability of a vowel and a consonant
occurring together in the same syllable:

    f(x, y)    a      i      u      f(x)
    p          1/16   1/16   0      1/8
    t          3/8    3/16   3/16   3/4
    k          1/16   0      1/16   1/8
    f(y)       1/2    1/4    1/4
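The marginals f(x) and f(y) in the table are just row and column sums of the
joint cells; a quick sketch (mine) confirms them:

    # Joint distribution f(x, y) from the table above.
    joint = {
        ('p', 'a'): 1/16, ('p', 'i'): 1/16, ('p', 'u'): 0,
        ('t', 'a'): 3/8,  ('t', 'i'): 3/16, ('t', 'u'): 3/16,
        ('k', 'a'): 1/16, ('k', 'i'): 0,    ('k', 'u'): 1/16,
    }

    # f(x) = sum_y f(x, y) and f(y) = sum_x f(x, y).
    f_x = {c: sum(p for (cc, v), p in joint.items() if cc == c) for c in 'ptk'}
    f_y = {v: sum(p for (c, vv), p in joint.items() if vv == v) for v in 'aiu'}
    print(f_x)  # -> {'p': 0.125, 't': 0.75, 'k': 0.125}
    print(f_y)  # -> {'a': 0.5, 'i': 0.25, 'u': 0.25}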
From the table we can read off conditional probabilities, for example:

    f(a|p) = \frac{f(a, p)}{f(p)} = \frac{1/16}{1/8} = \frac{1}{2}

    f(a|t) = \frac{f(a, t)}{f(t)} = \frac{3/8}{3/4} = \frac{1}{2}
For probability distributions we defined:

    f(y|x) = \frac{f(x, y)}{g(x)}

where g(x) is the marginal distribution of X. This lets us compute the
conditional entropy of the vowels given the consonants:
    H(V|C) = -\sum_{x \in C} \sum_{y \in V} f(x, y) \log f(y|x)

           = -\Big( \frac{1}{16} \log \frac{1/16}{1/8}
                  + \frac{1}{16} \log \frac{1/16}{1/8} + 0
                  + \frac{3}{8} \log \frac{3/8}{3/4}
                  + \frac{3}{16} \log \frac{3/16}{3/4}
                  + \frac{3}{16} \log \frac{3/16}{3/4}
                  + \frac{1}{16} \log \frac{1/16}{1/8} + 0
                  + \frac{1}{16} \log \frac{1/16}{1/8} \Big)

           = 1.375 bits
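The same number falls out of a direct computation; this sketch (mine,
reusing the joint table defined above) implements the double-sum formula:

    from math import log2

    # f(x) = sum_y f(x, y), then H(V|C) = -sum_{x,y} f(x, y) log2 f(y|x),
    # where f(y|x) = f(x, y) / f(x); zero cells contribute nothing.
    f_c = {c: sum(p for (cc, v), p in joint.items() if cc == c) for c in 'ptk'}
    print(-sum(p * log2(p / f_c[c])
               for (c, v), p in joint.items() if p > 0))  # -> 1.375 (bits)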
Summary

- Entropy H(X) = -\sum_{x \in X} f(x) \log f(x) measures the average
  uncertainty of a random variable, in bits.
- Joint entropy extends this measure to pairs of random variables.
- Conditional entropy H(Y|X) measures the uncertainty that remains about Y
  once X is known.