CS621: Artificial Intelligence
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lecture 35–HMM; Forward and Backward Probabilities
19th Oct, 2010
HMM Definition
Set of states: S where |S| = N
Start state: S_0   /* P(S_0) = 1 */
Output Alphabet: O where |O| = M
Transition Probabilities: A = {a_ij}   /* state i to state j */
Emission Probabilities: B = {b_j(o_k)}   /* prob. of emitting or absorbing o_k from state j */
Initial State Probabilities: Π = {p_1, p_2, p_3, … p_N}
Each p_i = P(o_0 = ε, S_i | S_0)
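The definition above can be written out as plain Python data. This is a minimal sketch; the state names, matrices A and B, and the vector Π below are made-up illustrative values, not taken from the lecture:

```python
# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative only.
states = ["S1", "S2"]            # S, with |S| = N = 2
alphabet = ["a", "b"]            # O, with |O| = M = 2

# A[i][j] = a_ij = P(next state S_j | current state S_i)
A = {"S1": {"S1": 0.6, "S2": 0.4},
     "S2": {"S1": 0.3, "S2": 0.7}}

# B[j][o] = b_j(o) = probability of emitting o from state S_j
B = {"S1": {"a": 0.8, "b": 0.2},
     "S2": {"a": 0.1, "b": 0.9}}

# pi[i] = p_i = P(o_0 = epsilon, S_i | S_0)
pi = {"S1": 0.5, "S2": 0.5}

# Every row of A and B, and pi itself, must be a probability distribution.
for dist in list(A.values()) + list(B.values()) + [pi]:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```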
Three basic problems (contd.)
Problem 1: Likelihood of a sequence
Forward Procedure
Backward Procedure
Problem 2: Best state sequence
Viterbi Algorithm
Problem 3: Re-estimation
Baum-Welch (Forward-Backward Algorithm)
Forward and Backward Probability Calculation
Forward probability F(k,i)
Define F(k,i) = probability of being in state S_i having seen o_0 o_1 o_2 … o_k:
F(k,i) = P(o_0 o_1 o_2 … o_k, S_i)

With m as the length of the observed sequence:
P(observed sequence) = P(o_0 o_1 o_2 … o_m)
                     = Σ_{p=0..N} P(o_0 o_1 o_2 … o_m, S_p)
                     = Σ_{p=0..N} F(m, p)
Forward probability (contd.)

F(k, q)
= P(o_0 o_1 o_2 … o_k, S_q)
= P(o_0 o_1 o_2 … o_{k-1}, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p) . P(o_k, S_q | o_0 o_1 o_2 … o_{k-1}, S_p)
= Σ_{p=0..N} F(k-1, p) . P(o_k, S_q | S_p)     /* Markov assumption */
= Σ_{p=0..N} F(k-1, p) . P(S_p --o_k--> S_q)
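The recurrence above translates directly into code. Below is a sketch on a made-up 2-state model (the numbers are not from the lecture), with P(o_k, S_q | S_p) factored as a_pq * b_q(o_k); the resulting likelihood is cross-checked against brute-force enumeration over all state sequences:

```python
import math
from itertools import product

states = [0, 1]
pi = [0.6, 0.4]                       # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]          # A[p][q] = P(S_q | S_p)
B = [[0.9, 0.1], [0.2, 0.8]]          # B[q][o] = b_q(o) over alphabet {0, 1}
obs = [0, 1, 1, 0]

def forward(obs):
    # Base case: reach S_q from the start state and emit o_0 there.
    F = [pi[q] * B[q][obs[0]] for q in states]
    # F(k, q) = sum_p F(k-1, p) * a_pq * b_q(o_k)
    for o in obs[1:]:
        F = [sum(F[p] * A[p][q] for p in states) * B[q][o] for q in states]
    return F

# P(observed sequence) = sum_q F(m, q)
likelihood = sum(forward(obs))

# Cross-check: enumerate every state sequence explicitly.
brute = sum(
    pi[s[0]] * B[s[0]][obs[0]]
    * math.prod(A[s[k - 1]][s[k]] * B[s[k]][obs[k]] for k in range(1, len(obs)))
    for s in product(states, repeat=len(obs)))
assert abs(likelihood - brute) < 1e-12
```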
[Trellis diagram: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m along the top; states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final below; the arc from S_p to S_q emits o_k.]

Backward probability B(k,i)
Define B(k,i) = probability of seeing o_k o_{k+1} o_{k+2} … o_m given that the state was S_i:
B(k,i) = P(o_k o_{k+1} o_{k+2} … o_m | S_i)

With m as the length of the observed sequence:
P(observed sequence) = P(o_0 o_1 o_2 … o_m)
                     = P(o_0 o_1 o_2 … o_m | S_0)
                     = B(0, 0)
Backward probability (contd.)

B(k, p)
= P(o_k o_{k+1} o_{k+2} … o_m | S_p)
= P(o_{k+1} o_{k+2} … o_m, o_k | S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m, o_k, S_q | S_p)
= Σ_{q=0..N} P(o_k, S_q | S_p) . P(o_{k+1} o_{k+2} … o_m | o_k, S_q, S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m | S_q) . P(o_k, S_q | S_p)     /* Markov assumption */
= Σ_{q=0..N} B(k+1, q) . P(S_p --o_k--> S_q)
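The backward recurrence can be sketched the same way, on the same made-up 2-state model used for the forward sketch. The check at the end verifies the identity from the slides: folding the backward probabilities into the start distribution gives the same P(observed sequence) as the forward pass:

```python
states = [0, 1]
pi = [0.6, 0.4]                       # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]          # A[p][q] = P(S_q | S_p)
B = [[0.9, 0.1], [0.2, 0.8]]          # B[q][o] = b_q(o) over alphabet {0, 1}
obs = [0, 1, 1, 0]

def backward(obs):
    beta = [1.0, 1.0]                 # base case: past the last symbol
    # B(k, p) = sum_q a_pq * b_q(o_k) * B(k+1, q), folding in o_m ... o_1
    for o in reversed(obs[1:]):
        beta = [sum(A[p][q] * B[q][o] * beta[q] for q in states)
                for p in states]
    return beta

def forward_total(obs):
    F = [pi[q] * B[q][obs[0]] for q in states]
    for o in obs[1:]:
        F = [sum(F[p] * A[p][q] for p in states) * B[q][o] for q in states]
    return sum(F)

beta1 = backward(obs)
# P(obs) = sum_q pi_q * b_q(o_0) * B(1, q): same number as the forward pass.
total = sum(pi[q] * B[q][obs[0]] * beta1[q] for q in states)
assert abs(total - forward_total(obs)) < 1e-12
```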
[Trellis diagram: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m along the top; states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final below; the arc from S_p to S_q emits o_k.]

Continuing with the Urn example
Colored Ball choosing

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30
Example (contd.)

Given:

Transition Probability A:
      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

Observation/output Probability B:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3

Observation: RRGGBRGR
What is the corresponding state sequence?
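The urn HMM above can be written out directly from the two tables. This is a sketch; A and B are the lecture's values, but the start distribution over urns is not given on these slides, so `start` below is a placeholder assumption:

```python
urns = ["U1", "U2", "U3"]
colors = ["R", "G", "B"]

# Transition probabilities from the table: A[i][j] = P(U_j | U_i)
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}

# Emission probabilities from the table (they match the ball counts,
# e.g. U1: 30/100 Red, 50/100 Green, 20/100 Blue).
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

# Placeholder start distribution -- NOT specified in the slides.
start = {"U1": 1/3, "U2": 1/3, "U3": 1/3}

obs = "RRGGBRGR"

# Each row of A and B is a probability distribution.
for row in list(A.values()) + list(B.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```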
Diagrammatic representation (1/2)

[Diagram: the three states U1, U2, U3 with transition arcs weighted by A (e.g. U1 → U2: 0.4, U2 → U1: 0.6, U3 → U3: 0.3) and emission labels on each state: U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3.]
Diagrammatic representation (2/2)

[Diagram: the same three states, with each arc now labeled by the combined probability a_ij * b_j(o) for each o ∈ {R, G, B}; e.g. the U2 → U1 arc (a = 0.6) carries R: 0.6*0.3 = 0.18, G: 0.6*0.5 = 0.30, B: 0.6*0.2 = 0.12.]
Observations and states

        O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
OBS:    R   R   G   G   B   R   G   R
State:  S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8

S_i = U1/U2/U3, a particular state
S: state sequence
O: observation sequence
S* = "best" possible state (urn) sequence
Goal: Maximize P(S*|O) by choosing the "best" S
Grouping terms

We introduce the states S_0 and S_9 as initial and final states respectively. After S_8 the next state is S_9 with probability 1, i.e., P(S_9|S_8) = 1. O_0 is the ε-transition.

        O_0 O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
Obs:    ε   R   R   G   G   B   R   G   R
State:  S_0 S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9

P(S).P(O|S)
= [P(O_0|S_0).P(S_1|S_0)] . [P(O_1|S_1).P(S_2|S_1)] .
  [P(O_2|S_2).P(S_3|S_2)] . [P(O_3|S_3).P(S_4|S_3)] .
  [P(O_4|S_4).P(S_5|S_4)] . [P(O_5|S_5).P(S_6|S_5)] .
  [P(O_6|S_6).P(S_7|S_6)] . [P(O_7|S_7).P(S_8|S_7)] .
  [P(O_8|S_8).P(S_9|S_8)]
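The grouped product can be sketched as code: for one fixed state sequence, P(S).P(O|S) factors into per-step terms P(O_k|S_k) * P(S_{k+1}|S_k), with P(S_9|S_8) = 1 at the end. The A and B values are the urn tables from earlier; the chosen state sequence and the start distribution below are illustrative placeholders, not values from the lecture:

```python
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

obs = "RRGGBRGR"
# An arbitrary candidate state sequence S_1 ... S_8 (placeholder choice).
states = ["U3", "U1", "U2", "U2", "U2", "U3", "U1", "U3"]
# Placeholder start distribution, standing in for P(S_1 | S_0).
start = {"U1": 1/3, "U2": 1/3, "U3": 1/3}

# [P(O_1|S_1).P(S_2|S_1)] ... [P(O_8|S_8).P(S_9|S_8)], with P(S_9|S_8) = 1
score = start[states[0]]
for k in range(len(obs)):
    emit = B[states[k]][obs[k]]                                   # P(O_k | S_k)
    nxt = A[states[k]][states[k + 1]] if k + 1 < len(states) else 1.0
    score *= emit * nxt

print(score)  # joint probability of this particular (S, O) pair
```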
Introducing useful notation

        O_0 O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
Obs:    ε   R   R   G   G   B   R   G   R
State:  S_0 S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9

[Trellis diagram: the states S_0 S_1 S_2 … S_9 in sequence, each arc S_k → S_{k+1} labeled with the symbol it emits: ε, R, R, G, G, B, R, G, R.]

Each bracketed factor is abbreviated as a single labeled-transition probability:
P(O_k|S_k).P(S_{k+1}|S_k) = P(S_k --O_k--> S_{k+1})
Viterbi Algorithm for the Urn problem (first two symbols)

[Trellis diagram: from the start state S_0, ε-transitions reach U1, U2, U3 with probabilities 0.5, 0.3, 0.2. On the first R, each of U1, U2, U3 branches to U1, U2, U3 again. The slide marks intermediate scores 0.03, 0.08, 0.15, arc scores 0.06, 0.02, 0.02, 0.18, 0.24, 0.18, and accumulated path scores 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036.]

*: winner sequences
Probabilistic FSM

[Diagram: two states S_1 and S_2 with arcs labeled (symbol : probability):
 S_1 → S_1: (a1:0.1), (a2:0.2)
 S_1 → S_2: (a1:0.3), (a2:0.4)
 S_2 → S_1: (a1:0.2), (a2:0.3)
 S_2 → S_2: (a1:0.3), (a2:0.2)]

The question here is:
"what is the most likely state sequence given the output sequence seen?"
Developing the tree

Start (ε): P(S1) = 1.0, P(S2) = 0.0

On a1 (probabilities to S1, S2: 0.1, 0.3 from S1; 0.2, 0.3 from S2):
  to S1: 1.0*0.1 = 0.1          to S2: 1.0*0.3 = 0.3
  (paths through S2 contribute 0.0)

On a2 (probabilities to S1, S2: 0.2, 0.4 from S1; 0.3, 0.2 from S2):
  to S1: 0.1*0.2 = 0.02  or  0.3*0.3 = 0.09
  to S2: 0.1*0.4 = 0.04  or  0.3*0.2 = 0.06

Choose the winning sequence per state per iteration: S1 keeps 0.09, S2 keeps 0.06.
Tree structure contd…

On a1 (winners so far: S1 = 0.09, S2 = 0.06):
  to S1: 0.09*0.1 = 0.009   or  0.06*0.2 = 0.012
  to S2: 0.09*0.3 = 0.027   or  0.06*0.3 = 0.018

On a2 (winners: S1 = 0.012, S2 = 0.027):
  to S1: 0.012*0.2 = 0.0024  or  0.027*0.3 = 0.0081
  to S2: 0.012*0.4 = 0.0048  or  0.027*0.2 = 0.0054

The problem being addressed by this tree is
S* = argmax_S P(S | a1 a2 a1 a2, µ)
where a1-a2-a1-a2 is the output sequence and µ the model or the machine.
Path found (working backward):
S1 ← S2 ← S1 ← S2 ← S1
i.e., reading forward: S1 --a1--> S2 --a2--> S1 --a1--> S2 --a2--> S1
Problem statement: Find the best possible sequence

S* = argmax_S P(S | O, µ)

where
  S → state sequence
  O → output sequence
  µ → model or machine

Model or machine = {S_0, S, A, T}
  S_0: start symbol    S: state collection    A: alphabet set    T: transitions

T is defined as P(S_i --a_k--> S_j) ∀ i, j, k

Tabular representation of the tree
Latest symbol observed →  ε     a1                                a2            a1              a2
S1 (ending state)         1.0   (1.0*0.1, 0.0*0.2) = (0.1, 0.0)   (0.02, 0.09)  (0.009, 0.012)  (0.0024, 0.0081)
S2 (ending state)         0.0   (1.0*0.3, 0.0*0.3) = (0.3, 0.0)   (0.04, 0.06)  (0.027, 0.018)  (0.0048, 0.0054)

Note: each cell records the pair (score coming from S1, score coming from S2); the larger value is the winning probability of a sequence ending in that state. The final winner is 0.0081, ending in state S1; since it is the 2nd component of its tuple, its predecessor is S2. Going backward cell by cell recovers the winner sequence.
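The tabular computation above can be sketched as code. T encodes the probabilistic FSM's labeled transitions as read off the tree; at each step only the winning sequence per state is kept, exactly as in the table:

```python
# T[p][sym][q] = P(S_p --sym--> S_q), read off the tree development.
T = {"S1": {"a1": {"S1": 0.1, "S2": 0.3}, "a2": {"S1": 0.2, "S2": 0.4}},
     "S2": {"a1": {"S1": 0.2, "S2": 0.3}, "a2": {"S1": 0.3, "S2": 0.2}}}

def viterbi(seq):
    # After the epsilon move: probability 1.0 of being in S1.
    best = {"S1": (1.0, ["S1"]), "S2": (0.0, ["S2"])}
    for sym in seq:
        new = {}
        for q in T:
            # Winning predecessor for state q on this symbol.
            p = max(T, key=lambda s: best[s][0] * T[s][sym][q])
            new[q] = (best[p][0] * T[p][sym][q], best[p][1] + [q])
        best = new
    return best

best = viterbi(["a1", "a2", "a1", "a2"])
prob, path = max(best.values(), key=lambda v: v[0])
print(round(prob, 4), path)
# -> 0.0081 ['S1', 'S2', 'S1', 'S2', 'S1'], matching the table and the
#    "Path found" slide.
```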