Lecture 10
Hidden Markov Models (HMMs)
Based on slides by Ydo Wexler & Dan Geiger (Technion) and by Nir Friedman (HU)
Outline
Markov Chains (Markov Models)
Hidden Markov Models (HMMs)
Algorithmic Questions
Biological Relevance
Markov Property
Markov Property: the state of the system at time t+1 depends only on the state of the system at time t:

P[X_{t+1} = x_{t+1} | X_t = x_t, X_{t-1} = x_{t-1}, …, X_1 = x_1, X_0 = x_0] = P[X_{t+1} = x_{t+1} | X_t = x_t]

[Diagram: a chain of states X_{t=1} → X_{t=2} → X_{t=3} → X_{t=4} → X_{t=5}]
Markov Chains
Stationarity Assumption
Transition probabilities are independent of t when the process is stationary. So, for the rain example:

                 rain tomorrow    no rain tomorrow
rain today       p_rr = 0.4       p_rn = 0.6
no rain today    p_nr = 0.2       p_nn = 0.8

P = | 0.4  0.6 |
    | 0.2  0.8 |
Note that rows sum to 1
Such a matrix is called a Stochastic Matrix
If both the rows and the columns of a matrix sum to 1, we have a Doubly Stochastic Matrix.
Example: Coke vs. Pepsi
[Diagram: two-state chain. A Coke buyer buys Coke again next time with probability 0.9; a Pepsi buyer buys Pepsi again with probability 0.8]

P = | 0.9  0.1 |
    | 0.2  0.8 |

With initial distribution Q = (0.6, 0.4) over (Coke = 0, Pepsi = 1), the probability of buying Coke at step 3 is

P(X_3 = 0) = Σ_i Q_i p_{i0}^(3) = Q_0 p_{00}^(3) + Q_1 p_{10}^(3) = 0.6 · 0.781 + 0.4 · 0.438 = 0.6438
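This three-step computation can be checked with a minimal sketch in plain Python; states 0 = Coke, 1 = Pepsi and the initial distribution (0.6, 0.4) are read off the slide's computation:

```python
# Three-step transition probabilities for the Coke/Pepsi chain.
# States: 0 = Coke, 1 = Pepsi.

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = [[0.9, 0.1],
     [0.2, 0.8]]

P3 = mat_mul(mat_mul(P, P), P)   # P^3
Q = [0.6, 0.4]                   # initial distribution over states

# P(X_3 = 0) = sum_i Q_i * p_{i0}^(3)
p_x3_coke = sum(Q[i] * P3[i][0] for i in range(2))
print(round(P3[0][0], 3), round(P3[1][0], 3))  # 0.781 0.438
print(round(p_x3_coke, 4))                     # 0.6438
```

Raising the stochastic matrix to the k-th power gives exactly the k-step transition probabilities, which is why the slide can read p_{00}^(3) and p_{10}^(3) off P^3.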
http://en.wikipedia.org/wiki/Markov_chain
Weather Example (transition matrix A)
Problem: given that the weather on day 1 (t=1) is sunny (state 3), what is the probability of an observation sequence O of weather states?
The answer is the product of the transition probabilities along the observed sequence.
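Since the slide's actual matrix A and observation sequence were not recovered, here is a hedged sketch of the computation with hypothetical stand-in values:

```python
# Probability of an observation sequence under a first-order Markov chain:
# P(O | model) = P(q1) * prod_t a_{q_t, q_{t+1}}.
# The matrix and sequence below are hypothetical placeholders, not the
# slide's actual values.

A = [[0.4, 0.3, 0.3],   # state 1: rain
     [0.2, 0.6, 0.2],   # state 2: cloudy
     [0.1, 0.1, 0.8]]   # state 3: sunny

def sequence_prob(states, A, p_start=1.0):
    """P(O) for a 1-based state sequence, given P(q1) = p_start."""
    p = p_start
    for s, t in zip(states, states[1:]):
        p *= A[s - 1][t - 1]
    return p

# Day 1 is sunny (state 3), so P(q1) = 1:
O = [3, 3, 1, 1, 3]     # hypothetical observation sequence
print(sequence_prob(O, A))
```

Conditioning on the known first state replaces the initial distribution with probability 1, which is why only the transition factors remain in the product.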
Types of Models
Ergodic model: strongly connected, i.e. there is a directed path with positive probabilities from each state i to each state j (but not necessarily a complete directed graph).

[Diagram: a chain of states Start (10$), 1, 2, …, N-1, with transitions labeled 1-p]
Our model:
[Diagram: HMM for coin tossing. Hidden states Fair and loaded: each stays with probability 0.9 and switches with probability 0.1. The fair coin emits head/tail with probability 1/2 each; the loaded coin emits head with probability 3/4 and tail with probability 1/4. Generic HMM notation: transition probabilities a_22, a_33, a_44, a_23, a_34, … and emission probabilities b_12, b_13, b_14, …]

[Diagram: hidden chain H_1 → H_2 → … → H_i → … → H_{L-1} → H_L modeling the observed phenomenon; each H_i emits an observation X_i (the observed data)]
Coin-Tossing Example
[Diagram: Start leads with probability 1/2 each to the Fair or loaded coin. Each coin stays with probability 0.9 and switches with probability 0.1. Fair: head/tail with probability 1/2 each; loaded: head 3/4, tail 1/4. The hidden sequence H_1, …, H_L records Fair/Loaded over L tosses; the observed sequence X_1, …, X_L records Head/Tail.]
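The probability of an observed toss sequence under this coin model can be computed with the forward algorithm; a minimal sketch, with the transition, emission, and uniform Start probabilities taken from the figure:

```python
# Forward algorithm for the fair/loaded coin HMM.
# Hidden states: Fair (F) and Loaded (L); stay probability 0.9, switch 0.1.
# Fair emits head/tail with probability 1/2 each; loaded emits head with 3/4.

states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.9, "L": 0.1},
         "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5},
        "L": {"H": 0.75, "T": 0.25}}

def forward(obs):
    """Return P(obs), summing over all hidden paths by dynamic programming."""
    f = {k: start[k] * emit[k][obs[0]] for k in states}
    for x in obs[1:]:
        f = {l: sum(f[k] * trans[k][l] for k in states) * emit[l][x]
             for l in states}
    return sum(f.values())

print(forward(list("HHHH")))
```

Note that P(HHHH) lies between (1/2)^4 (all fair) and (3/4)^4 (all loaded), as it must for a mixture over hidden paths.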
HMMs Question I (Evaluation)
Given a model and an observation sequence O = (O_1, …, O_L), compute the likelihood P(O_1, …, O_L | model).
[Diagram: two Markov chains over the DNA alphabet, one for regular DNA and one for the C-G island, with "change" transitions between them (probabilities p and q) and per-nucleotide transition probabilities such as q/4, (1-q)/6, (1-q)/3, p/6, p/3, (1-P)/4]
CpG Islands
We construct a Markov chain for CpG-rich regions and another for CpG-poor regions.
Using maximum-likelihood estimates from 60K nucleotides, we get two models.
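To use the two learned chains for discrimination, a standard approach (not spelled out in the recovered slide text) scores a window by the log-likelihood ratio of the two models; the transition values below are hypothetical placeholders, not the slide's ML estimates:

```python
# Log-odds score of a window under the two chains:
# S(x) = sum_i log( a+_{x_i x_{i+1}} / a-_{x_i x_{i+1}} ),
# positive scores favoring the CpG-island model.
import math

plus  = {("C", "G"): 0.27, ("G", "C"): 0.34}   # CpG-rich model (hypothetical values)
minus = {("C", "G"): 0.08, ("G", "C"): 0.25}   # CpG-poor model (hypothetical values)

def log_odds(seq):
    """Sum log-ratios over the dinucleotides for which both models have entries."""
    s = 0.0
    for pair in zip(seq, seq[1:]):
        if pair in plus:
            s += math.log(plus[pair] / minus[pair])
    return s

print(log_odds("CGCG"))
```

A full scorer would carry a log-ratio for all 16 dinucleotides; the two entries here are only enough to show the shape of the computation.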
Empirical Evaluation
Alternative Approach
[Diagram: an HMM whose hidden states H_1, …, H_L mark whether each position lies in a C-G island, with "change" transitions between island and non-island states; each observation X_i is a nucleotide A/C/G/T]
P(Q | M) = a_{Q_1 Q_2} · a_{Q_2 Q_3} · … · a_{Q_{T-1} Q_T}
P(O | Q) = b_{Q_1 O_1} · b_{Q_2 O_2} · … · b_{Q_T O_T}
Learning
Given a sequence x_1, …, x_n together with its hidden states h_1, …, h_n,
we simply count:
N_kl - number of times h_i = k and h_{i+1} = l
N_ka - number of times h_i = k and x_i = a
and estimate by normalizing, e.g. for the emission probabilities:
B_ka = N_ka / Σ_{a'} N_ka'
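The counting estimates can be sketched as follows; the transition matrix A is normalized the same way as B, and the toy sequences are illustrative:

```python
# Supervised ML estimation by counting: given both observations x_1..x_n
# and hidden states h_1..h_n, count transitions N_kl and emissions N_ka,
# then normalize each row.
from collections import Counter

def ml_estimate(h, x):
    """Return (A, B) with A[k][l] = N_kl / sum_l' N_kl' and B[k][a] = N_ka / sum_a' N_ka'."""
    n_trans = Counter(zip(h, h[1:]))   # N_kl
    n_emit = Counter(zip(h, x))        # N_ka
    A, B = {}, {}
    for (k, l), c in n_trans.items():
        A.setdefault(k, {})[l] = c / sum(v for (k2, _), v in n_trans.items() if k2 == k)
    for (k, a), c in n_emit.items():
        B.setdefault(k, {})[a] = c / sum(v for (k2, _), v in n_emit.items() if k2 == k)
    return A, B

h = list("FFFLLF")   # hidden Fair/Loaded labels (toy data)
x = list("HTHHHT")   # observed Head/Tail tosses (toy data)
A, B = ml_estimate(h, x)
print(A["F"], B["L"])
```

With labeled data the maximum-likelihood estimate really is just these normalized counts; the difficulty in the next slide is that h_1, …, h_n are usually not observed.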
Learning
Given only the sequence x_1, …, x_n.
Problem: the counts are inaccessible, since we do not observe h_i. Instead, we compute posterior probabilities:

P(H_i = k, H_{i+1} = l | x_1, …, x_n)
  = P(H_i = k, H_{i+1} = l, x_1, …, x_n) / P(x_1, …, x_n)
  = P(H_i = k, x_1, …, x_i) · P(H_{i+1} = l | H_i = k) · P(x_{i+1} | H_{i+1} = l) · P(x_{i+2}, …, x_n | H_{i+1} = l) / P(x_1, …, x_n)
  = f_k(i) · A_kl · B_{l,x_{i+1}} · b_l(i+1) / P(x_1, …, x_n)

where f_k(i) is the forward message and b_l(i+1) the backward message.
Expected Counts
E[N_kl] = Σ_i P(H_i = k, H_{i+1} = l | x_1, …, x_n)

Similarly,

E[N_ka] = Σ_{i : x_i = a} P(H_i = k | x_1, …, x_n)
E-step:
Compute expected counts E[N_kl], E[N_ka]
M-step:
Re-estimate:
A'_kl = E[N_kl] / Σ_{l'} E[N_kl']
B'_ka = E[N_ka] / Σ_{a'} E[N_ka']
Reiterate until convergence.
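One E-step plus M-step can be sketched for the fair/loaded coin model; the uniform start distribution and the example sequence are illustrative assumptions:

```python
# One Baum-Welch iteration following the slide's formulas:
# E-step: E[N_kl] = sum_i f_k(i) * A_kl * B_{l,x_{i+1}} * b_l(i+1) / P(x),
# M-step: A'_kl = E[N_kl] / sum_l' E[N_kl'].

states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}                       # assumed uniform start
A = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
B = {"F": {"H": 0.5, "T": 0.5}, "L": {"H": 0.75, "T": 0.25}}

def em_step(x):
    n = len(x)
    # forward messages f_k(i) = P(H_i = k, x_1..x_i)
    f = [{k: start[k] * B[k][x[0]] for k in states}]
    for i in range(1, n):
        f.append({l: sum(f[-1][k] * A[k][l] for k in states) * B[l][x[i]]
                  for l in states})
    # backward messages b_l(i) = P(x_{i+1}..x_n | H_i = l)
    b = [{k: 1.0 for k in states}]
    for i in range(n - 2, -1, -1):
        b.insert(0, {k: sum(A[k][l] * B[l][x[i + 1]] * b[0][l] for l in states)
                     for k in states})
    px = sum(f[-1][k] for k in states)             # P(x_1..x_n)
    # expected transition counts E[N_kl]
    EN = {k: {l: sum(f[i][k] * A[k][l] * B[l][x[i + 1]] * b[i + 1][l]
                     for i in range(n - 1)) / px
              for l in states} for k in states}
    # M-step: renormalize expected counts into a new transition matrix
    return {k: {l: EN[k][l] / sum(EN[k].values()) for l in states} for k in states}

A_new = em_step(list("HHTHHHHT"))
print(A_new)
```

Each iteration of this loop provably does not decrease the likelihood of the data, which is the basic EM property the next slide refers to.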
EM - basic properties
Complexity of E-step
Compute forward and backward messages: time & space complexity O(nL)
Accumulate expected counts: time complexity O(nL^2), space complexity O(L^2)
EM - problems
Local maxima:
Learning can get stuck in local maxima
Sensitive to initialization
Requires some method for escaping such maxima
Choosing L:
We often do not know how many hidden values we should have or can learn
Communication Example