CS621: Artificial Intelligence
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
Lecture 35–HMM; Forward and Backward Probabilities
19th Oct, 2010
HMM Definition
Set of states: S where |S| = N
Start state: S_0   /* P(S_0) = 1 */
Output Alphabet: O where |O| = M
Transition Probabilities: A = {a_ij}   /* state i to state j */
Emission Probabilities: B = {b_j(o_k)}   /* prob. of emitting or absorbing o_k from state j */
Initial State Probabilities: Π = {p_1, p_2, p_3, … p_N}
Each p_i = P(o_0 = ε, S_i | S_0)
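The definition above can be written out as plain Python data. This is a minimal sketch; the state names, matrices A and B, and the vector Π below are made-up illustrative values, not taken from the lecture:

```python
# Hypothetical 2-state, 2-symbol HMM; all numbers are illustrative only.
states = ["S1", "S2"]            # S, with |S| = N = 2
alphabet = ["a", "b"]            # O, with |O| = M = 2

# A[i][j] = a_ij = P(next state S_j | current state S_i)
A = {"S1": {"S1": 0.6, "S2": 0.4},
     "S2": {"S1": 0.3, "S2": 0.7}}

# B[j][o] = b_j(o) = probability of emitting o from state S_j
B = {"S1": {"a": 0.8, "b": 0.2},
     "S2": {"a": 0.1, "b": 0.9}}

# pi[i] = p_i = P(o_0 = epsilon, S_i | S_0)
pi = {"S1": 0.5, "S2": 0.5}

# Every row of A and B, and pi itself, must be a probability distribution.
for dist in list(A.values()) + list(B.values()) + [pi]:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```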
Three basic problems (contd.)
Problem 1: Likelihood of a sequence
Forward Procedure
Backward Procedure
Problem 2: Best state sequence
Viterbi Algorithm
Problem 3: Re-estimation
Baum-Welch (Forward-Backward Algorithm)
Forward and Backward Probability Calculation
Forward probability F(k,i)
Define F(k,i) = probability of being in state S_i having seen o_0 o_1 o_2 … o_k:
F(k,i) = P(o_0 o_1 o_2 … o_k, S_i)

With m as the length of the observed sequence:
P(observed sequence) = P(o_0 o_1 o_2 … o_m)
                     = Σ_{p=0..N} P(o_0 o_1 o_2 … o_m, S_p)
                     = Σ_{p=0..N} F(m, p)
Forward probability (contd.)

F(k, q)
= P(o_0 o_1 o_2 … o_k, S_q)
= P(o_0 o_1 o_2 … o_{k-1}, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p) . P(o_k, S_q | o_0 o_1 o_2 … o_{k-1}, S_p)
= Σ_{p=0..N} F(k-1, p) . P(o_k, S_q | S_p)     /* Markov assumption */
= Σ_{p=0..N} F(k-1, p) . P(S_p --o_k--> S_q)
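The recurrence above translates directly into code. Below is a sketch on a made-up 2-state model (the numbers are not from the lecture), with P(o_k, S_q | S_p) factored as a_pq * b_q(o_k); the resulting likelihood is cross-checked against brute-force enumeration over all state sequences:

```python
import math
from itertools import product

states = [0, 1]
pi = [0.6, 0.4]                       # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]          # A[p][q] = P(S_q | S_p)
B = [[0.9, 0.1], [0.2, 0.8]]          # B[q][o] = b_q(o) over alphabet {0, 1}
obs = [0, 1, 1, 0]

def forward(obs):
    # Base case: reach S_q from the start state and emit o_0 there.
    F = [pi[q] * B[q][obs[0]] for q in states]
    # F(k, q) = sum_p F(k-1, p) * a_pq * b_q(o_k)
    for o in obs[1:]:
        F = [sum(F[p] * A[p][q] for p in states) * B[q][o] for q in states]
    return F

# P(observed sequence) = sum_q F(m, q)
likelihood = sum(forward(obs))

# Cross-check: enumerate every state sequence explicitly.
brute = sum(
    pi[s[0]] * B[s[0]][obs[0]]
    * math.prod(A[s[k - 1]][s[k]] * B[s[k]][obs[k]] for k in range(1, len(obs)))
    for s in product(states, repeat=len(obs)))
assert abs(likelihood - brute) < 1e-12
```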
[Trellis diagram: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m along the top; states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final below; the arc from S_p to S_q emits o_k.]

Backward probability B(k,i)
Define B(k,i) = probability of seeing o_k o_{k+1} o_{k+2} … o_m given that the state was S_i:
B(k,i) = P(o_k o_{k+1} o_{k+2} … o_m | S_i)

With m as the length of the observed sequence:
P(observed sequence) = P(o_0 o_1 o_2 … o_m)
                     = P(o_0 o_1 o_2 … o_m | S_0)
                     = B(0, 0)
Backward probability (contd.)

B(k, p)
= P(o_k o_{k+1} o_{k+2} … o_m | S_p)
= P(o_{k+1} o_{k+2} … o_m, o_k | S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m, o_k, S_q | S_p)
= Σ_{q=0..N} P(o_k, S_q | S_p) . P(o_{k+1} o_{k+2} … o_m | o_k, S_q, S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m | S_q) . P(o_k, S_q | S_p)     /* Markov assumption */
= Σ_{q=0..N} B(k+1, q) . P(S_p --o_k--> S_q)
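The backward recurrence can be sketched the same way, on the same made-up 2-state model used for the forward sketch. The check at the end verifies the identity from the slides: folding the backward probabilities into the start distribution gives the same P(observed sequence) as the forward pass:

```python
states = [0, 1]
pi = [0.6, 0.4]                       # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]          # A[p][q] = P(S_q | S_p)
B = [[0.9, 0.1], [0.2, 0.8]]          # B[q][o] = b_q(o) over alphabet {0, 1}
obs = [0, 1, 1, 0]

def backward(obs):
    beta = [1.0, 1.0]                 # base case: past the last symbol
    # B(k, p) = sum_q a_pq * b_q(o_k) * B(k+1, q), folding in o_m ... o_1
    for o in reversed(obs[1:]):
        beta = [sum(A[p][q] * B[q][o] * beta[q] for q in states)
                for p in states]
    return beta

def forward_total(obs):
    F = [pi[q] * B[q][obs[0]] for q in states]
    for o in obs[1:]:
        F = [sum(F[p] * A[p][q] for p in states) * B[q][o] for q in states]
    return sum(F)

beta1 = backward(obs)
# P(obs) = sum_q pi_q * b_q(o_0) * B(1, q): same number as the forward pass.
total = sum(pi[q] * B[q][obs[0]] * beta1[q] for q in states)
assert abs(total - forward_total(obs)) < 1e-12
```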
[Trellis diagram: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m along the top; states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final below; the arc from S_p to S_q emits o_k.]

Continuing with the Urn example
Colored Ball choosing

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30
Example (contd.)

Given:

Transition Probability A:
      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

Observation/output Probability B:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3

Observation: RRGGBRGR
What is the corresponding state sequence?
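The urn HMM above can be written out directly from the two tables. This is a sketch; A and B are the lecture's values, but the start distribution over urns is not given on these slides, so `start` below is a placeholder assumption:

```python
urns = ["U1", "U2", "U3"]
colors = ["R", "G", "B"]

# Transition probabilities from the table: A[i][j] = P(U_j | U_i)
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}

# Emission probabilities from the table (they match the ball counts,
# e.g. U1: 30/100 Red, 50/100 Green, 20/100 Blue).
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

# Placeholder start distribution -- NOT specified in the slides.
start = {"U1": 1/3, "U2": 1/3, "U3": 1/3}

obs = "RRGGBRGR"

# Each row of A and B is a probability distribution.
for row in list(A.values()) + list(B.values()):
    assert abs(sum(row.values()) - 1.0) < 1e-9
```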
Diagrammatic representation (1/2)

[Diagram: the three states U1, U2, U3 with transition arcs weighted by A (e.g. U1 → U2: 0.4, U2 → U1: 0.6, U3 → U3: 0.3) and emission labels on each state: U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3.]
Diagrammatic representation (2/2)

[Diagram: the same three states, with each arc now labeled by the combined probability a_ij * b_j(o) for each o ∈ {R, G, B}; e.g. the U2 → U1 arc (a = 0.6) carries R: 0.6*0.3 = 0.18, G: 0.6*0.5 = 0.30, B: 0.6*0.2 = 0.12.]
Observations and states

        O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
OBS:    R   R   G   G   B   R   G   R
State:  S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8

S_i = U1/U2/U3, a particular state
S: state sequence
O: observation sequence
S* = "best" possible state (urn) sequence
Goal: Maximize P(S*|O) by choosing the "best" S
Grouping terms

We introduce the states S_0 and S_9 as initial and final states respectively. After S_8 the next state is S_9 with probability 1, i.e., P(S_9|S_8) = 1. O_0 is the ε-transition.

        O_0 O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
Obs:    ε   R   R   G   G   B   R   G   R
State:  S_0 S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9

P(S).P(O|S)
= [P(O_0|S_0).P(S_1|S_0)] . [P(O_1|S_1).P(S_2|S_1)] .
  [P(O_2|S_2).P(S_3|S_2)] . [P(O_3|S_3).P(S_4|S_3)] .
  [P(O_4|S_4).P(S_5|S_4)] . [P(O_5|S_5).P(S_6|S_5)] .
  [P(O_6|S_6).P(S_7|S_6)] . [P(O_7|S_7).P(S_8|S_7)] .
  [P(O_8|S_8).P(S_9|S_8)]
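The grouped product can be sketched as code: for one fixed state sequence, P(S).P(O|S) factors into per-step terms P(O_k|S_k) * P(S_{k+1}|S_k), with P(S_9|S_8) = 1 at the end. The A and B values are the urn tables from earlier; the chosen state sequence and the start distribution below are illustrative placeholders, not values from the lecture:

```python
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

obs = "RRGGBRGR"
# An arbitrary candidate state sequence S_1 ... S_8 (placeholder choice).
states = ["U3", "U1", "U2", "U2", "U2", "U3", "U1", "U3"]
# Placeholder start distribution, standing in for P(S_1 | S_0).
start = {"U1": 1/3, "U2": 1/3, "U3": 1/3}

# [P(O_1|S_1).P(S_2|S_1)] ... [P(O_8|S_8).P(S_9|S_8)], with P(S_9|S_8) = 1
score = start[states[0]]
for k in range(len(obs)):
    emit = B[states[k]][obs[k]]                                   # P(O_k | S_k)
    nxt = A[states[k]][states[k + 1]] if k + 1 < len(states) else 1.0
    score *= emit * nxt

print(score)  # joint probability of this particular (S, O) pair
```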
Introducing useful notation

        O_0 O_1 O_2 O_3 O_4 O_5 O_6 O_7 O_8
Obs:    ε   R   R   G   G   B   R   G   R
State:  S_0 S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9

[Trellis diagram: the states S_0 S_1 S_2 … S_9 in sequence, each arc S_k → S_{k+1} labeled with the symbol it emits: ε, R, R, G, G, B, R, G, R.]

Each bracketed factor is abbreviated as a single labeled-transition probability:
P(O_k|S_k).P(S_{k+1}|S_k) = P(S_k --O_k--> S_{k+1})
Viterbi Algorithm for the Urn problem (first two symbols)

[Trellis diagram: from the start state S_0, ε-transitions reach U1, U2, U3 with probabilities 0.5, 0.3, 0.2. On the first R, each of U1, U2, U3 branches to U1, U2, U3 again. The slide marks intermediate scores 0.03, 0.08, 0.15, arc scores 0.06, 0.02, 0.02, 0.18, 0.24, 0.18, and accumulated path scores 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036.]

*: winner sequences
Probabilistic FSM

[Diagram: two states S_1 and S_2 with arcs labeled (symbol : probability):
 S_1 → S_1: (a1:0.1), (a2:0.2)
 S_1 → S_2: (a1:0.3), (a2:0.4)
 S_2 → S_1: (a1:0.2), (a2:0.3)
 S_2 → S_2: (a1:0.3), (a2:0.2)]

The question here is:
"what is the most likely state sequence given the output sequence seen?"
Developing the tree

Start (ε): P(S1) = 1.0, P(S2) = 0.0

On a1 (probabilities to S1, S2: 0.1, 0.3 from S1; 0.2, 0.3 from S2):
  to S1: 1.0*0.1 = 0.1          to S2: 1.0*0.3 = 0.3
  (paths through S2 contribute 0.0)

On a2 (probabilities to S1, S2: 0.2, 0.4 from S1; 0.3, 0.2 from S2):
  to S1: 0.1*0.2 = 0.02  or  0.3*0.3 = 0.09
  to S2: 0.1*0.4 = 0.04  or  0.3*0.2 = 0.06

Choose the winning sequence per state per iteration: S1 keeps 0.09, S2 keeps 0.06.
Tree structure contd…

On a1 (winners so far: S1 = 0.09, S2 = 0.06):
  to S1: 0.09*0.1 = 0.009   or  0.06*0.2 = 0.012
  to S2: 0.09*0.3 = 0.027   or  0.06*0.3 = 0.018

On a2 (winners: S1 = 0.012, S2 = 0.027):
  to S1: 0.012*0.2 = 0.0024  or  0.027*0.3 = 0.0081
  to S2: 0.012*0.4 = 0.0048  or  0.027*0.2 = 0.0054

The problem being addressed by this tree is
S* = argmax_S P(S | a1 a2 a1 a2, µ)
where a1-a2-a1-a2 is the output sequence and µ the model or the machine.
Path found (working backward):
S1 ← S2 ← S1 ← S2 ← S1
i.e., reading forward: S1 --a1--> S2 --a2--> S1 --a1--> S2 --a2--> S1
Problem statement: Find the best possible sequence

S* = argmax_S P(S | O, µ)

where
  S → state sequence
  O → output sequence
  µ → model or machine

Model or machine = {S_0, S, A, T}
  S_0: start symbol    S: state collection    A: alphabet set    T: transitions

T is defined as P(S_i --a_k--> S_j) ∀ i, j, k

Tabular representation of the tree
Latest symbol observed →  ε     a1                                a2            a1              a2
S1 (ending state)         1.0   (1.0*0.1, 0.0*0.2) = (0.1, 0.0)   (0.02, 0.09)  (0.009, 0.012)  (0.0024, 0.0081)
S2 (ending state)         0.0   (1.0*0.3, 0.0*0.3) = (0.3, 0.0)   (0.04, 0.06)  (0.027, 0.018)  (0.0048, 0.0054)

Note: each cell records the pair (score coming from S1, score coming from S2); the larger value is the winning probability of a sequence ending in that state. The final winner is 0.0081, ending in state S1; since it is the 2nd component of its tuple, its predecessor is S2. Going backward cell by cell recovers the winner sequence.
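The tabular computation above can be sketched as code. T encodes the probabilistic FSM's labeled transitions as read off the tree; at each step only the winning sequence per state is kept, exactly as in the table:

```python
# T[p][sym][q] = P(S_p --sym--> S_q), read off the tree development.
T = {"S1": {"a1": {"S1": 0.1, "S2": 0.3}, "a2": {"S1": 0.2, "S2": 0.4}},
     "S2": {"a1": {"S1": 0.2, "S2": 0.3}, "a2": {"S1": 0.3, "S2": 0.2}}}

def viterbi(seq):
    # After the epsilon move: probability 1.0 of being in S1.
    best = {"S1": (1.0, ["S1"]), "S2": (0.0, ["S2"])}
    for sym in seq:
        new = {}
        for q in T:
            # Winning predecessor for state q on this symbol.
            p = max(T, key=lambda s: best[s][0] * T[s][sym][q])
            new[q] = (best[p][0] * T[p][sym][q], best[p][1] + [q])
        best = new
    return best

best = viterbi(["a1", "a2", "a1", "a2"])
prob, path = max(best.values(), key=lambda v: v[0])
print(round(prob, 4), path)
# -> 0.0081 ['S1', 'S2', 'S1', 'S2', 'S1'], matching the table and the
#    "Path found" slide.
```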