
(1)

CS626-460: Speech, NLP and the Web

Pushpak Bhattacharyya

CSE Dept., IIT Bombay

Lecture 5, 7: HMM and Viterbi

10th and 16th Jan, 2012

(Lecture 6 was on Computational Biomedicine research at Houston University by Prof. Ioannis)

(2)

HMM Definition

• Set of states: S, where |S| = N
• Output Alphabet: O, where |O| = K
• Transition Probabilities: A = {aij}
  • prob. of going from state Si to state Sj
• Emission Probabilities: B = {bpq}
  • prob. of outputting symbol Oq from state Sp
• Initial State Probabilities: π

λ = (A, B, π)

[Slide figure: the NLP Trinity — Problem axis (Morph Analysis, POS Tagging, Parsing, Semantics), Language axis (Hindi, Marathi, English, French), Algorithm axis (HMM, CRF, MEMM)]
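The triple λ = (A, B, π) can be written down directly in code; a minimal Python sketch, where the 2-state, 2-symbol model is an invented toy for illustration (not the lecture's example):

```python
# lambda = (A, B, pi) for a toy HMM; the states, symbols and all the
# probabilities below are assumptions, chosen only for illustration.

states = ["S1", "S2"]            # S, |S| = N
alphabet = ["a", "b"]            # O, |O| = K

# Transition probabilities A = {aij}: prob. of going from state Si to Sj
A = {"S1": {"S1": 0.6, "S2": 0.4},
     "S2": {"S1": 0.3, "S2": 0.7}}

# Emission probabilities B = {bpq}: prob. of outputting symbol Oq from state Sp
B = {"S1": {"a": 0.8, "b": 0.2},
     "S2": {"a": 0.1, "b": 0.9}}

# Initial state probabilities pi
pi = {"S1": 0.5, "S2": 0.5}

# each row of A and B, and pi itself, must be a probability distribution
for dist in [*A.values(), *B.values(), pi]:
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```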

(3)

Markov Processes

• Properties
  • Limited Horizon: given the previous k states, a state i is independent of the earlier (0 to t−k−1) states.
    • P(Xt=i | Xt-1, Xt-2, …, X0) = P(Xt=i | Xt-1, Xt-2, …, Xt-k)
    • Order-k Markov process
  • Time invariance (shown for k=1):
    • P(Xt=i | Xt-1=j) = P(X1=i | X0=j) = … = P(Xn=i | Xn-1=j)
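The limited-horizon property can be checked numerically; a sketch assuming an invented 2-state, k = 1 chain, verifying that conditioning on one extra past state changes nothing:

```python
# Numerical check of limited horizon for a k=1 chain:
# P(X2 = c | X1 = b, X0 = a) should equal P(X2 = c | X1 = b) = A[b][c].
# The 2-state chain below is an invented toy.

pi = [0.5, 0.5]                     # P(X0)
A = [[0.9, 0.1], [0.2, 0.8]]        # A[i][j] = P(X_{t+1} = j | X_t = i)

def joint(a, b, c):
    # P(X0 = a, X1 = b, X2 = c), straight from the chain definition
    return pi[a] * A[a][b] * A[b][c]

def cond_given_two(c, b, a):
    # P(X2 = c | X1 = b, X0 = a), computed from the joint
    return joint(a, b, c) / sum(joint(a, b, c2) for c2 in (0, 1))

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            # the extra conditioning state X0 = a is irrelevant
            assert abs(cond_given_two(c, b, a) - A[b][c]) < 1e-12
```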

(4)

Three basic problems (contd.)

• Problem 1: Likelihood of a sequence
  • Forward Procedure
  • Backward Procedure
• Problem 2: Best state sequence
  • Viterbi Algorithm
• Problem 3: Re-estimation
  • Baum-Welch (Forward-Backward Algorithm)

(5)

Probabilistic Inference

• O: Observation Sequence
• S: State Sequence
• Given O, find S* where

  S* = argmax_S P(S|O)

  This is called Probabilistic Inference.
• Infer "Hidden" from "Observed"
• How is this inference different from logical inference based on propositional or predicate calculus?

(6)

Essentials of Hidden Markov Model

1. Markov + Naive Bayes

2. Uses both transition and observation probability:

   P(Sk --Ok--> Sk+1) = P(Ok | Sk) . P(Sk+1 | Sk)

3. Effectively makes Hidden Markov Model a Finite State Machine (FSM) with probability

(7)

Probability of Observation Sequence

p(O) = Σ_S p(O, S)
     = Σ_S p(S) p(O|S)

• Without any restriction,
• Search space size = |S|^|O|
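This sum can be evaluated brute-force; a sketch assuming an invented 2-state HMM, which makes the |S|^|O| blow-up explicit (and is exactly why the forward procedure is needed for realistic lengths):

```python
from itertools import product

# Brute-force P(O) = sum over S of P(S) P(O|S); the toy HMM below is an
# invented assumption, used only to show the |S|^|O| search space.

states = [0, 1]
pi = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3], [0.4, 0.6]]         # A[i][j] = P(next state j | current state i)
B = [[0.5, 0.5], [0.1, 0.9]]         # B[i][o] = P(symbol o | state i)

def prob_obs(obs):
    total = 0.0
    # one additive term per state sequence: |S|^|O| of them
    for seq in product(states, repeat=len(obs)):
        p = pi[seq[0]] * B[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
        total += p
    return total

obs = [0, 1, 0]
assert len(states) ** len(obs) == 8   # search-space size for this toy case
# sanity check: P(O) summed over every length-3 observation sequence is 1
assert abs(sum(prob_obs(list(o)) for o in product([0, 1], repeat=3)) - 1.0) < 1e-9
```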

(8)

Continuing with the Urn example

Colored Ball choosing

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

(9)

Example (contd.)

Given:

Transition Probability            Observation/output Probability
     U1   U2   U3                      R    G    B
U1   0.1  0.4  0.5                U1   0.3  0.5  0.2
U2   0.6  0.2  0.2                U2   0.1  0.4  0.5
U3   0.3  0.4  0.3                U3   0.6  0.1  0.3

Observation: RRGGBRGR

What is the corresponding state sequence?
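The two tables can be encoded directly; a Python sketch using nested dicts, with a check that every row is a valid distribution:

```python
# The urn HMM from the slide's two tables: rows are the current urn;
# transition columns are the next urn, output columns are the colours R, G, B.

urns = ["U1", "U2", "U3"]

# Transition Probability: P(next urn | current urn)
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}

# Observation/output Probability: P(ball colour | urn)
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

observation = list("RRGGBRGR")

# every row of both tables must sum to 1
for table in (A, B):
    for row in table.values():
        assert abs(sum(row.values()) - 1.0) < 1e-9
```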

(10)

Diagrammatic representation (1/2)

[State-transition diagram over U1, U2, U3: transition probabilities as in the table (U1: 0.1/0.4/0.5, U2: 0.6/0.2/0.2, U3: 0.3/0.4/0.3), with each state labelled by its emission probabilities (U1: R 0.3, G 0.5, B 0.2; U2: R 0.1, G 0.4, B 0.5; U3: R 0.6, G 0.1, B 0.3)]

(11)

Diagrammatic representation (2/2)

[Equivalent diagram in which each arc Ui → Uj carries the combined probability P(Uj | Ui) . P(colour | Uj), e.g. U2 → U1 emitting R: 0.6 × 0.3 = 0.18]

(12)

Observations and states

       O1  O2  O3  O4  O5  O6  O7  O8
OBS:   R   R   G   G   B   R   G   R
State: S1  S2  S3  S4  S5  S6  S7  S8

Si = U1/U2/U3; a particular state
S: State sequence
O: Observation sequence
S* = "best" possible state (urn) sequence
Goal: Maximize P(S|O) by choosing "best" S

(13)

Goal

• Maximize P(S|O) where S is the State Sequence and O is the Observation Sequence

  S* = argmax_S P(S|O)

(14)

Bayes' Theorem

P(A|B) = P(A) . P(B|A) / P(B)

P(A): Prior
P(B|A): Likelihood

argmax_S P(S|O) = argmax_S P(S) . P(O|S)
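Since P(O) is the same for every candidate S, dividing by it cannot change which S wins the argmax; a numerical sketch with an invented joint table (three candidate sequences, one fixed O):

```python
# argmax_S P(S|O) = argmax_S P(S).P(O|S): the constant denominator P(O)
# drops out.  The joint values below are invented for illustration.

joint = {"S_a": 0.03, "S_b": 0.12, "S_c": 0.05}    # P(S, O) = P(S).P(O|S)
p_O = sum(joint.values())                           # P(O) = sum_S P(S, O)
posterior = {s: p / p_O for s, p in joint.items()}  # P(S|O) by Bayes' theorem

# same winner whether or not we normalise by P(O)
assert max(posterior, key=posterior.get) == max(joint, key=joint.get) == "S_b"
```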

(15)

State Transitions Probability

P(S) = P(S1-8)
     = P(S1) . P(S2|S1) . P(S3|S1-2) . P(S4|S1-3) ... P(S8|S1-7)

By Markov Assumption (k=1):

P(S) = P(S1) . P(S2|S1) . P(S3|S2) . P(S4|S3) ... P(S8|S7)

(16)

Observation Sequence probability

P(O|S) = P(O1|S1-8) . P(O2|O1, S1-8) . P(O3|O1-2, S1-8) ... P(O8|O1-7, S1-8)

Assumption that ball drawn depends only on the Urn chosen:

P(O|S) = P(O1|S1) . P(O2|S2) . P(O3|S3) ... P(O8|S8)

P(S, O) = P(S) . P(O|S)
        = P(S1) . P(S2|S1) . P(S3|S2) ... P(S8|S7) .
          P(O1|S1) . P(O2|S2) . P(O3|S3) ... P(O8|S8)

(17)

Grouping terms

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

P(S).P(O|S)
= [P(O0|S0).P(S1|S0)].
  [P(O1|S1).P(S2|S1)].
  [P(O2|S2).P(S3|S2)].
  [P(O3|S3).P(S4|S3)].
  [P(O4|S4).P(S5|S4)].
  [P(O5|S5).P(S6|S5)].
  [P(O6|S6).P(S7|S6)].
  [P(O7|S7).P(S8|S7)].
  [P(O8|S8).P(S9|S8)].

We introduce the states S0 and S9 as initial and final states respectively.
After S8 the next state is S9 with probability 1, i.e., P(S9|S8) = 1.
O0 is the ε-transition.

(18)

Introducing useful notation

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

S0 --ε--> S1 --R--> S2 --R--> S3 --G--> S4 --G--> S5 --B--> S6 --R--> S7 --G--> S8 --R--> S9

P(Ok|Sk).P(Sk+1|Sk) = P(Sk --Ok--> Sk+1)

(19)

Viterbi Algorithm for the Urn problem (first two symbols)

From S0, the ε-transition assigns the initial probabilities 0.5, 0.3, 0.2 to U1, U2, U3.

After the first R (initial probability × emission of R):
  U1: 0.5 × 0.3 = 0.15    U2: 0.3 × 0.1 = 0.03    U3: 0.2 × 0.6 = 0.12

Each urn then branches on the transition taken for the second symbol:
  from U1: 0.015   0.06    0.075*
  from U2: 0.018   0.006   0.006
  from U3: 0.036   0.048*  0.036

*: winner sequences

(20)

Markov process of order > 1 (say 2)

       O0  O1  O2  O3  O4  O5  O6  O7  O8
Obs:   ε   R   R   G   G   B   R   G   R
State: S0  S1  S2  S3  S4  S5  S6  S7  S8  S9

Same theory works:

P(S).P(O|S)
= P(O0|S0).P(S1|S0).
  [P(O1|S1).P(S2|S1S0)].
  [P(O2|S2).P(S3|S2S1)].
  [P(O3|S3).P(S4|S3S2)].
  [P(O4|S4).P(S5|S4S3)].
  [P(O5|S5).P(S6|S5S4)].
  [P(O6|S6).P(S7|S6S5)].
  [P(O7|S7).P(S8|S7S6)].
  [P(O8|S8).P(S9|S8S7)].

We introduce the states S0 and S9 as initial and final states respectively.
After S8 the next state is S9 with probability 1, i.e., P(S9|S8S7) = 1.
O0 is the ε-transition.

(21)

Adjustments

• Transition probability table will have tuples on rows and states on columns
• Output probability table will remain the same
• In the Viterbi tree, the Markov process will take effect from the 3rd input symbol (εRR)
• There will be 27 leaves, out of which only 9 will remain
• Sequences ending in the same tuples will be compared
  • Instead of U1, U2 and U3:
  • U1U1, U1U2, U1U3, U2U1, U2U2, U2U3, U3U1, U3U2, U3U3
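The nine tuple states that index the order-2 transition table's rows can be generated mechanically; a short sketch:

```python
from itertools import product

# For k=2 the transition table is indexed by ordered pairs of urns,
# so the 3 states become 3*3 = 9 tuple states.

urns = ["U1", "U2", "U3"]
tuple_states = ["".join(pair) for pair in product(urns, repeat=2)]

assert len(tuple_states) == 9
assert tuple_states == ["U1U1", "U1U2", "U1U3",
                        "U2U1", "U2U2", "U2U3",
                        "U3U1", "U3U2", "U3U3"]
```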

(22)

Probabilistic FSM

[Diagram: a two-state probabilistic FSM over states S1, S2 and alphabet {a1, a2}.
Arcs from S1: (a1:0.1) to S1, (a1:0.3) to S2, (a2:0.2) to S1, (a2:0.4) to S2.
Arcs from S2: (a1:0.2) to S1, (a1:0.3) to S2, (a2:0.3) to S1, (a2:0.2) to S2.]

The question here is:
"what is the most likely state sequence given the output sequence seen?"

(23)

Developing the tree

Start: S1 = 1.0, S2 = 0.0

a1:  from S1: to S1 (0.1), to S2 (0.3); from S2: to S1 (0.2), to S2 (0.3)
     scores: 1 × 0.1 = 0.1 (S1), 1 × 0.3 = 0.3 (S2)

a2:  from S1 (0.1): to S1: 0.1 × 0.2 = 0.02, to S2: 0.1 × 0.4 = 0.04
     from S2 (0.3): to S1: 0.3 × 0.3 = 0.09, to S2: 0.3 × 0.2 = 0.06

Choose the winning sequence per state per iteration.

(24)

Tree structure contd…

From the winners S1 = 0.09, S2 = 0.06:

a1:  from S1: to S1: 0.09 × 0.1 = 0.009, to S2: 0.09 × 0.3 = 0.027
     from S2: to S1: 0.06 × 0.2 = 0.012, to S2: 0.06 × 0.3 = 0.018

a2:  from S1 (0.012): to S1: 0.0024, to S2: 0.0048
     from S2 (0.027): to S1: 0.0081, to S2: 0.0054

The problem being addressed by this tree is S* = argmax_s P(S | a1 a2 a1 a2, μ)

a1-a2-a1-a2 is the output sequence and μ the model or the machine

(25)

Path found (working backward):

S1 --a1--> S2 --a2--> S1 --a1--> S2 --a2--> S1

Problem statement: Find the best possible sequence

S* = argmax_s P(S | O, μ)

where  S: State Seq,  O: Output Seq,  μ: Machine or Model

Machine or Model = {S0, S, A, T}

S0: Start symbol;  S: State collection;  A: Alphabet set;  T: Transitions

T is defined as P(Si --ak--> Sj)  ∀ i, j, k

(26)

Tabular representation of the tree

Latest symbol observed →
                  ε      a1                   a2             a1               a2
Ending state S1   1.0    (1.0*0.1, 0.0*0.2)   (0.02, 0.09)   (0.009, 0.012)   (0.0024, 0.0081)
                         = (0.1, 0.0)
Ending state S2   0.0    (1.0*0.3, 0.0*0.3)   (0.04, 0.06)   (0.027, 0.018)   (0.0048, 0.0054)
                         = (0.3, 0.0)

Note: every cell records, for each predecessor state, the probability of the winning sequence ending in that cell's state; the bold-faced (larger) value in each cell is the winner. The final winner is 0.0081, ending in S1; since it sits in the 2nd position of its tuple, its predecessor is S2, and going backward in this way we recover the sequence.
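The whole table can be reproduced in a few lines; a sketch of Viterbi over this probabilistic FSM, with the arc probabilities read off the tree (the helper names are my own):

```python
# Viterbi over the two-state probabilistic FSM of these slides, for the
# output a1 a2 a1 a2.  T[(i, sym)][j] = P(Si --sym--> Sj).

states = ["S1", "S2"]
T = {
    ("S1", "a1"): {"S1": 0.1, "S2": 0.3},
    ("S1", "a2"): {"S1": 0.2, "S2": 0.4},
    ("S2", "a1"): {"S1": 0.2, "S2": 0.3},
    ("S2", "a2"): {"S1": 0.3, "S2": 0.2},
}

def viterbi(output):
    score = {"S1": 1.0, "S2": 0.0}            # the machine starts in S1
    back = []                                 # winning predecessor per state, per symbol
    for sym in output:
        new_score, ptrs = {}, {}
        for j in states:
            cands = {i: score[i] * T[(i, sym)][j] for i in states}
            best = max(cands, key=cands.get)
            new_score[j], ptrs[j] = cands[best], best
        score = new_score
        back.append(ptrs)
    best_final = max(score, key=score.get)
    path = [best_final]                       # recover the path backwards
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    path.reverse()
    return score[best_final], path

prob, path = viterbi(["a1", "a2", "a1", "a2"])
assert abs(prob - 0.0081) < 1e-12             # the final winner in the table
assert path == ["S1", "S2", "S1", "S2", "S1"]
```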

(27)

Algorithm
(following James Allen, Natural Language Understanding (2nd edition), Benjamin/Cummings (pub.), 1995)

Given:

1. The HMM, which means:
   a. Start State: S1
   b. Alphabet: A = {a1, a2, … ap}
   c. Set of States: S = {S1, S2, … Sn}
   d. Transition probability P(Si --ak--> Sj) ∀ i, j, k, which is equal to P(Sj, ak | Si)
2. The output string a1a2…aT

To find:

The most likely sequence of states C1C2…CT which produces the given output sequence, i.e.,

C1C2…CT = argmax_C [P(C | a1, a2, … aT, μ)]

(28)

Algorithm contd…

Data Structure:

1. An N*T array called SEQSCORE to maintain the winner sequence always
   (N = #states, T = length of o/p sequence)
2. Another N*T array called BACKPTR to recover the path.

Three distinct steps in the Viterbi implementation:

1. Initialization
2. Iteration
3. Sequence Identification

(29)

1. Initialization

SEQSCORE(1,1) = 1.0
BACKPTR(1,1) = 0
For (i = 2 to N) do
    SEQSCORE(i,1) = 0.0
[expressing the fact that the first state is S1]

2. Iteration

For (t = 2 to T) do
    For (i = 1 to N) do
        SEQSCORE(i,t) = Max(j=1..N) [SEQSCORE(j,(t-1)) * P(Sj --ak--> Si)]
        BACKPTR(i,t) = index j that gives the MAX above

(30)

3. Seq. Identification

C(T) = i that maximizes SEQSCORE(i,T)
For i from (T-1) to 1 do
    C(i) = BACKPTR[C(i+1), (i+1)]

Optimizations possible:
1. BACKPTR can be 1*T
2. SEQSCORE can be T*2

Homework: Compare this with A*, Beam Search.
Reason for this comparison: both of them work for finding and recovering a sequence.
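The three steps map directly onto code; a sketch of the SEQSCORE/BACKPTR scheme applied to the urn model (dicts of lists stand in for the N*T arrays, and the initialization uses the ε-move from S0, i.e. the initial probabilities, rather than a single fixed start state), cross-checked against brute-force enumeration:

```python
from itertools import product

# Viterbi with the slides' SEQSCORE/BACKPTR data structures, run on the
# urn model (RRGGBRGR).  Dicts of lists play the role of the N*T arrays.

states = ["U1", "U2", "U3"]
pi = {"U1": 0.5, "U2": 0.3, "U3": 0.2}
A = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
     "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
     "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
     "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
     "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

def viterbi(obs):
    n = len(obs)
    seqscore = {s: [0.0] * n for s in states}     # SEQSCORE
    backptr = {s: [None] * n for s in states}     # BACKPTR
    for s in states:                              # 1. initialization
        seqscore[s][0] = pi[s] * B[s][obs[0]]
    for t in range(1, n):                         # 2. iteration
        for j in states:
            cands = {i: seqscore[i][t - 1] * A[i][j] for i in states}
            best = max(cands, key=cands.get)
            seqscore[j][t] = cands[best] * B[j][obs[t]]
            backptr[j][t] = best
    last = max(states, key=lambda s: seqscore[s][n - 1])
    path = [last]                                 # 3. sequence identification
    for t in range(n - 1, 0, -1):
        path.append(backptr[path[-1]][t])
    path.reverse()
    return seqscore[last][n - 1], path

obs = list("RRGGBRGR")
score, path = viterbi(obs)

# cross-check against brute force over all 3^8 state sequences
def seq_prob(seq):
    p = pi[seq[0]] * B[seq[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[seq[t - 1]][seq[t]] * B[seq[t]][obs[t]]
    return p

assert len(path) == len(obs)
assert abs(score - max(seq_prob(s) for s in product(states, repeat=len(obs)))) < 1e-12
```

The suggested optimizations apply directly: only the previous column of SEQSCORE is ever read, so it can shrink to T*2, and BACKPTR is only needed along the surviving paths.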
