(1)

CS344: Introduction to Artificial Intelligence

(associated lab: CS386)

Pushpak Bhattacharyya

CSE Dept., IIT Bombay

Lecture 9: Viterbi; forward and backward probabilities

25th Jan, 2011

(2)

HMM Definition

Set of states: S where |S| = N

Start state S_0   /* P(S_0) = 1 */

Output Alphabet: O where |O| = M

Transition Probabilities: A = {a_ij}   /* state i to state j */

Emission Probabilities: B = {b_j(o_k)}   /* prob. of emitting or absorbing o_k from state j */

Initial State Probabilities: Π = {p_1, p_2, p_3, … p_N}

Each p_i = P(o_0 = ε, S_i | S_0)
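For concreteness, the definition above can be collected into a small container. The following is a minimal Python sketch; the class name HMM and its field names are illustrative, not part of the lecture:

```python
# Minimal container for the HMM parameters defined above (illustrative sketch).
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class HMM:
    states: List[str]                   # S, |S| = N (the start state S0 is kept separate)
    alphabet: List[str]                 # O, |O| = M
    A: Dict[Tuple[str, str], float]     # A[(i, j)] = a_ij, transition prob. state i -> state j
    B: Dict[Tuple[str, str], float]     # B[(j, o_k)] = b_j(o_k), emission prob. of o_k from state j
    pi: Dict[str, float]                # pi[i] = P(o_0 = eps, S_i | S_0)
```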

(3)

Markov Processes

Properties

Limited Horizon: given the previous k states, the state at time t is independent of all earlier states.

P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-k})

Order k Markov process

Time invariance: (shown for k=1)

P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = … = P(X_n = i | X_{n-1} = j)

(4)

Three basic problems (contd.)

Problem 1: Likelihood of a sequence

Forward Procedure

Backward Procedure

Problem 2: Best state sequence

Viterbi Algorithm

Problem 3: Re-estimation

Baum-Welch (Forward-Backward Algorithm)

(5)

Probabilistic Inference

O: Observation Sequence

S: State Sequence

Given O, find S* where S* = arg max_S P(S | O); this is called Probabilistic Inference

Infer "Hidden" from "Observed"

How is this inference different from logical inference based on propositional or predicate calculus?

(6)

Essentials of Hidden Markov Model

1. Markov + Naive Bayes

2. Uses both transition and observation probability

3. Effectively makes the Hidden Markov Model a Finite State Machine (FSM) with probabilities

P(S_k → S_{k+1}, O_k) = P(O_k | S_k) . P(S_{k+1} | S_k)

(7)

Probability of Observation Sequence

P(O) = Σ_S P(O, S) = Σ_S P(S) . P(O | S)

Without any restriction, search space size = |S|^|O|
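To make the size of that sum concrete, here is a naive sketch that literally enumerates every state sequence, using the HMM container sketched earlier. It is for illustration only, since it explores |S|^|O| sequences, which the forward procedure developed later avoids. (The sketch emits each symbol from the state just entered; that bookkeeping convention is an assumption, not something fixed by the slide.)

```python
from itertools import product

def brute_force_likelihood(hmm, obs, start="S0"):
    """P(O) = sum over all state sequences S of P(S) * P(O|S).
    Enumerates |states| ** len(obs) sequences -- exponential, illustration only."""
    total = 0.0
    for seq in product(hmm.states, repeat=len(obs)):
        p, prev = 1.0, start
        for state, symbol in zip(seq, obs):
            # transition into `state`, then emit `symbol` from it
            p *= hmm.A.get((prev, state), 0.0) * hmm.B.get((state, symbol), 0.0)
            prev = state
        total += p
    return total
```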

(8)

Continuing with the Urn example

Colored ball choosing from Urn 1, Urn 2, Urn 3:

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

(9)

Example (contd.)

Transition Probability:

        U1    U2    U3
U1      0.1   0.4   0.5
U2      0.6   0.2   0.2
U3      0.3   0.4   0.3

Observation/output Probability:

        R     G     B
U1      0.3   0.5   0.2
U2      0.1   0.4   0.5
U3      0.6   0.1   0.3

Given the observation: RRGGBRGR

What is the corresponding state sequence?
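The two tables written out as a concrete instance of the HMM container sketched earlier (a sketch; variable names are illustrative). The start probabilities 0.5, 0.3 and 0.2 out of S0 are the ones that appear on the Viterbi slide for the urn problem further below:

```python
# The urn HMM written out concretely.
states = ["U1", "U2", "U3"]
alphabet = ["R", "G", "B"]

A = {("U1", "U1"): 0.1, ("U1", "U2"): 0.4, ("U1", "U3"): 0.5,
     ("U2", "U1"): 0.6, ("U2", "U2"): 0.2, ("U2", "U3"): 0.2,
     ("U3", "U1"): 0.3, ("U3", "U2"): 0.4, ("U3", "U3"): 0.3,
     # transitions out of the start state S0 (the pi of the definition slide,
     # duplicated into A so the decoding sketches can treat S0 like any predecessor)
     ("S0", "U1"): 0.5, ("S0", "U2"): 0.3, ("S0", "U3"): 0.2}

B = {("U1", "R"): 0.3, ("U1", "G"): 0.5, ("U1", "B"): 0.2,
     ("U2", "R"): 0.1, ("U2", "G"): 0.4, ("U2", "B"): 0.5,
     ("U3", "R"): 0.6, ("U3", "G"): 0.1, ("U3", "B"): 0.3}

urn_hmm = HMM(states=states, alphabet=alphabet, A=A, B=B,
              pi={"U1": 0.5, "U2": 0.3, "U3": 0.2})

observation = "RRGGBRGR"
```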

(10)

Diagrammatic representation (1/2)

[State-transition diagram: states U1, U2, U3 with the transition probabilities from the table above on the arcs (0.1, 0.4, 0.5, 0.6, 0.2, 0.2, 0.3, 0.4, 0.3) and the emission probabilities attached to the states (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3).]

(11)

Diagrammatic representation (2/2)

[Same diagram with each arc labelled by the combined probability of making that transition and emitting each colour (transition probability × emission probability), giving values such as R 0.15, G 0.25, B 0.10 and R 0.18, G 0.03, B 0.09.]

(12)

Probabilistic FSM

[Two-state probabilistic FSM: states S1 and S2, with arcs labelled by output symbol and probability: (a1:0.3), (a2:0.4), (a1:0.1), (a1:0.3), (a1:0.2), (a2:0.3), (a2:0.2), (a2:0.2).]

The question here is:

"what is the most likely state sequence given the output sequence seen"

(13)

Developing the tree

[Viterbi tree: from Start, P(S1) = 1.0 and P(S2) = 0.0. Reading a1, each state is expanded with the arc probabilities 0.1, 0.3, 0.2, 0.3, giving scores 1×0.1 = 0.1 and 0.3 for paths from S1 (paths from S2 contribute 0.0). Reading a2 with arc probabilities 0.2, 0.4, 0.3, 0.2 gives candidate scores 0.1×0.2 = 0.02, 0.1×0.4 = 0.04, 0.3×0.3 = 0.09, 0.3×0.2 = 0.06.]

Choose the winning sequence per state per iteration

(14)

Tree structure contd…

[Continuation of the tree: reading the next a1 (arc probabilities 0.1, 0.3, 0.2, 0.3) the surviving scores are 0.09×0.1 = 0.009, 0.018, 0.027, 0.012; reading the final a2 (arc probabilities 0.3, 0.2, 0.4, 0.2) they become 0.0081, 0.0054, 0.0048, 0.0024.]

The problem being addressed by this tree is S* = arg max_S P(S | a1 a2 a1 a2, µ)

a1-a2-a1-a2 is the output sequence and µ the model or the machine

(15)

Tabular representation of the tree

Rows: ending state; columns: latest symbol observed.

        ε      a1                               a2             a1               a2
S1      1.0    (1.0*0.1, 0.0*0.2) = (0.1, 0.0)  (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
S2      0.0    (1.0*0.3, 0.0*0.3) = (0.3, 0.0)  (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)

Note: every cell records the winning probability of a sequence ending in that state; the bold-faced value in each cell is that winning probability. The final winner is the sequence ending in state S2 (indicated by the 2nd tuple of the last column); going backward from it, we recover the sequence.

(16)

Algorithm

(following James Allen, Natural Language Understanding (2nd edition), Benjamin/Cummings (pub.), 1995)

Given:

1. The HMM, which means:

a. Start State: S1

b. Alphabet: A = {a1, a2, … ap}

c. Set of States: S = {S1, S2, … Sn}

d. Transition probability P(S_i --a_k--> S_j), which is equal to P(S_j, a_k | S_i)

2. The output string a_1 a_2 … a_T

To find:

The most likely sequence of states C_1 C_2 … C_T which produces the given output sequence, i.e.,

C_1 C_2 … C_T = arg max_C [P(C | a_1, a_2, …, a_T, µ)]

(17)

Algorithm contd…

Data Structure:

1. An N*T array called SEQSCORE to maintain the winner sequence always (N = #states, T = length of o/p sequence)

2. Another N*T array called BACKPTR to recover the path.

Three distinct steps in the Viterbi implementation

1.

Initialization

2.

Iteration

3.

Sequence Identification

(18)

1. Initialization

SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0

For(i=2 to N) do
  SEQSCORE(i,1) = 0.0

[expressing the fact that the first state is S1]

2. Iteration

For(t=2 to T) do
  For(i=1 to N) do
    SEQSCORE(i,t) = Max_(j=1,N) [SEQSCORE(j, t-1) * P(S_j --a_k--> S_i)]
    BACKPTR(i,t) = index j that gives the MAX above

(19)

3. Seq. Identification

C(T) = i that maximizes SEQSCORE(i,T)

For i from (T-1) to 1 do
  C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:

1. BACKPTR can be 1*T
2. SEQSCORE can be T*2

Homework: compare this with A* and Beam Search.

Reason for this comparison: both of them work for finding and recovering a sequence.
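Putting the three steps together, here is a runnable Python sketch over the HMM container from earlier. It uses dictionaries instead of N*T arrays, and it assumes each symbol is emitted by the state being entered; that convention is an assumption, and if you prefer the source-state emission used on the diagram slides you would swap hmm.B.get((s, ...)) for hmm.B.get((j, ...)):

```python
def viterbi(hmm, obs, start="S0"):
    """SEQSCORE/BACKPTR Viterbi decoding; returns the most likely state sequence."""
    T = len(obs)
    # seqscore[s][t]: best probability of any path ending in state s after t+1 symbols
    seqscore = {s: [0.0] * T for s in hmm.states}
    backptr = {s: [None] * T for s in hmm.states}

    # 1. Initialization: one transition out of the start state, emitting obs[0]
    for s in hmm.states:
        seqscore[s][0] = hmm.A.get((start, s), 0.0) * hmm.B.get((s, obs[0]), 0.0)

    # 2. Iteration: keep the winning predecessor per state, per time step
    for t in range(1, T):
        for s in hmm.states:
            best_j, best_p = None, 0.0
            for j in hmm.states:
                p = seqscore[j][t - 1] * hmm.A.get((j, s), 0.0) * hmm.B.get((s, obs[t]), 0.0)
                if p > best_p:
                    best_j, best_p = j, p
            seqscore[s][t], backptr[s][t] = best_p, best_j

    # 3. Sequence identification: best final state, then follow the back pointers
    last = max(hmm.states, key=lambda s: seqscore[s][T - 1])
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(backptr[path[-1]][t])
    return list(reversed(path))
```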

(20)

Viterbi Algorithm for the Urn problem (first two symbols)

[Viterbi tree for the urn HMM: from S0, ε-transitions reach U1, U2, U3 with probabilities 0.5, 0.3, 0.2. Expanding on the first two observed symbols (R, R) produces per-path scores such as 0.03, 0.08, 0.15 after the first R, and 0.015, 0.04, 0.075*, 0.018, 0.006, 0.006, 0.048*, 0.036 after the second.]

*: winner sequences
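With the urn HMM and the viterbi sketch defined above, the whole observation can be decoded in one call; the exact intermediate numbers depend on the emission-bookkeeping convention noted earlier, so treat this as a sketch rather than a reproduction of the hand computation:

```python
# Decode the full observation RRGGBRGR on the urn HMM defined earlier.
best_states = viterbi(urn_hmm, observation)
print(best_states)   # a list of eight urn names, one per observed colour
```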

(21)

Markov process of order>1 (say 2)

Same theory works.

P(S) . P(O|S)
= P(O_0|S_0) . P(S_1|S_0) .
  [P(O_1|S_1) . P(S_2|S_1 S_0)] .
  [P(O_2|S_2) . P(S_3|S_2 S_1)] .
  [P(O_3|S_3) . P(S_4|S_3 S_2)] .
  [P(O_4|S_4) . P(S_5|S_4 S_3)] .
  [P(O_5|S_5) . P(S_6|S_5 S_4)] .
  [P(O_6|S_6) . P(S_7|S_6 S_5)] .
  [P(O_7|S_7) . P(S_8|S_7 S_6)] .
  [P(O_8|S_8) . P(S_9|S_8 S_7)]

Obs:        ε    R    R    G    G    B    R    G    R
            O_0  O_1  O_2  O_3  O_4  O_5  O_6  O_7  O_8
State:  S_0 S_1  S_2  S_3  S_4  S_5  S_6  S_7  S_8  S_9

We introduce the states S_0 and S_9 as initial and final states respectively.

After S_8 the next state is S_9 with probability 1, i.e., P(S_9|S_8 S_7) = 1

O_0 is the ε-transition

(22)

Adjustments

Transition probability table will have tuples on rows and states on columns

Output probability table will remain the same

In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR)

There will be 27 leaves, out of which only 9 will remain

Sequences ending in same tuples will be compared

Instead of U1, U2 and U3, the states are the tuples U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3
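A small sketch of this adjustment, assuming the second-order transition probabilities are available in a dictionary A2 keyed by (previous-previous state, previous state, next state); the helper names are illustrative only:

```python
from itertools import product

def tuple_states(states):
    """Pair up states so the 2nd-order process becomes 1st-order over tuples."""
    return [(p, q) for p, q in product(states, repeat=2)]

def tuple_transition(A2, pair, nxt):
    """Transition probability for the tuple state (p, q) -> (q, nxt),
    read off the 2nd-order table A2[(p, q, nxt)] = P(next | prev_prev, prev)."""
    p, q = pair
    return A2.get((p, q, nxt), 0.0)
```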

(23)

Forward and Backward Probability Calculation

(24)

Forward probability F(k,i)

Define F(k,i) = probability of being in state S_i having seen o_0 o_1 o_2 … o_k

F(k,i) = P(o_0 o_1 o_2 … o_k, S_i)

With m as the length of the observed sequence:

P(observed sequence) = P(o_0 o_1 o_2 … o_m)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_m, S_p)
= Σ_{p=0..N} F(m, p)

(25)

Forward probability (contd.)

F(k, q)
= P(o_0 o_1 o_2 … o_k, S_q)
= P(o_0 o_1 o_2 … o_{k-1}, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p, o_k, S_q)
= Σ_{p=0..N} P(o_0 o_1 o_2 … o_{k-1}, S_p) . P(o_k, S_q | o_0 o_1 o_2 … o_{k-1}, S_p)
= Σ_{p=0..N} F(k-1, p) . P(o_k, S_q | S_p)
= Σ_{p=0..N} F(k-1, p) . P(S_p --o_k--> S_q)

[Trellis: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m aligned with states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final]
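The recurrence translated into a runnable Python sketch, using the same HMM container and the same hedged emission-bookkeeping assumption as the Viterbi sketch above:

```python
def forward(hmm, obs, start="S0"):
    """F(k, q) = sum_p F(k-1, p) * P(S_p --o_k--> S_q); returns P(obs) = sum_q F(m, q)."""
    F = [{} for _ in obs]
    for q in hmm.states:                     # first symbol, coming out of the start state
        F[0][q] = hmm.A.get((start, q), 0.0) * hmm.B.get((q, obs[0]), 0.0)
    for k in range(1, len(obs)):
        for q in hmm.states:
            F[k][q] = sum(F[k - 1][p] * hmm.A.get((p, q), 0.0) * hmm.B.get((q, obs[k]), 0.0)
                          for p in hmm.states)
    return sum(F[-1].values())               # marginalise over the final state
```

For the urn HMM above, forward(urn_hmm, observation) gives the likelihood of RRGGBRGR without enumerating the |S|^|O| sequences of the brute-force sketch.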

(26)

Backward probability B(k,i)

Define B(k,i) = probability of seeing o_k o_{k+1} o_{k+2} … o_m given that the state was S_i

B(k,i) = P(o_k o_{k+1} o_{k+2} … o_m | S_i)

With m as the length of the observed sequence:

P(observed sequence) = P(o_0 o_1 o_2 … o_m)
= P(o_0 o_1 o_2 … o_m | S_0)
= B(0,0)

(27)

Backward probability (contd.)

B(k, p)
= P(o_k o_{k+1} o_{k+2} … o_m | S_p)
= P(o_{k+1} o_{k+2} … o_m, o_k | S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m, o_k, S_q | S_p)
= Σ_{q=0..N} P(o_k, S_q | S_p) . P(o_{k+1} o_{k+2} … o_m | o_k, S_q, S_p)
= Σ_{q=0..N} P(o_{k+1} o_{k+2} … o_m | S_q) . P(o_k, S_q | S_p)
= Σ_{q=0..N} B(k+1, q) . P(S_p --o_k--> S_q)

[Trellis: observations O_0 O_1 O_2 O_3 … O_k O_{k+1} … O_{m-1} O_m aligned with states S_0 S_1 S_2 S_3 … S_p S_q … S_m S_final]
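And the backward recurrence as a matching Python sketch, with the same container and emission-bookkeeping assumption; the base case B(len(obs), q) = 1 plays the role of "nothing left to see":

```python
def backward(hmm, obs, start="S0"):
    """B(k, p) = sum_q P(S_p --o_k--> S_q) * B(k+1, q); likelihood is B(0, start)."""
    m = len(obs)
    B = [{s: 0.0 for s in hmm.states} for _ in range(m + 1)]
    B[m] = {s: 1.0 for s in hmm.states}      # base case: empty remaining sequence
    for k in range(m - 1, -1, -1):
        for p in hmm.states:
            B[k][p] = sum(hmm.A.get((p, q), 0.0) * hmm.B.get((q, obs[k]), 0.0) * B[k + 1][q]
                          for q in hmm.states)
    # B(0, S0): expand the first step out of the start state explicitly
    return sum(hmm.A.get((start, q), 0.0) * hmm.B.get((q, obs[0]), 0.0) * B[1][q]
               for q in hmm.states)
```

In exact arithmetic this agrees with forward(urn_hmm, observation), which is the usual sanity check for the two procedures.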
