Hidden Markov Models

(1)

Hidden Markov Models

By

Manish Shrivastava

(2)

A Simple Process

[Figure: a simple two-state automaton, with states S1 and S2 and transitions labelled a and b]

(3)

A Slightly Complicated Process

A colored ball choosing example:

Urn 1: # of Red = 100, # of Green = 0,   # of Blue = 0
Urn 2: # of Red = 0,   # of Green = 100, # of Blue = 0
Urn 3: # of Red = 0,   # of Green = 0,   # of Blue = 100

Probability of transition to another urn after picking a ball:

      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

(4)

A Slightly Complicated Process contd.

      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

Given:

Observation: RRGGBRGR

State Sequence: ??

Easily computable: since each urn contains balls of only one color, the observation determines the state sequence directly (see the sketch below).
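As a quick illustrative sketch (mine, not from the slides), the deterministic read-off can be written in a few lines of Python; the color-to-urn mapping is simply the one implied by the urn contents above.

```python
# Minimal sketch: with single-color urns, the observation fixes the state sequence.
COLOR_TO_URN = {"R": "U1", "G": "U2", "B": "U3"}  # implied by the urn contents above

def state_sequence(observation: str) -> list[str]:
    """Map each observed color to the only urn that could have produced it."""
    return [COLOR_TO_URN[c] for c in observation]

print(state_sequence("RRGGBRGR"))  # ['U1', 'U1', 'U2', 'U2', 'U3', 'U1', 'U2', 'U1']
```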

(5)

Markov Processes

Properties

Limited Horizon: given the previous n states, the current state is independent of all earlier states.

$P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_0) = P(X_t = i \mid X_{t-1}, X_{t-2}, \ldots, X_{t-n})$

Time invariance: the transition probabilities do not depend on the position in the sequence.

$P(X_t = i \mid X_{t-1} = j) = P(X_1 = i \mid X_0 = j) = P(X_{n+1} = i \mid X_n = j)$

(6)

A (Slightly Complicated) Markov Process

A colored ball choosing example:

Urn 1: # of Red = 100, # of Green = 0,   # of Blue = 0
Urn 2: # of Red = 0,   # of Green = 100, # of Blue = 0
Urn 3: # of Red = 0,   # of Green = 0,   # of Blue = 100

Probability of transition to another urn after picking a ball:

      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

(7)

Markov Process

Visible Markov Model

Given the observation, one can easily follow the state sequence traversed

[Figure: three-state transition diagram with states 1, 2, 3 and arcs labelled with transition probabilities such as P(3|1) and P(1|3)]
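As an illustrative sketch (mine, not part of the slides), a visible Markov model can be simulated directly from its transition matrix; here the urn transition matrix from the earlier slides is assumed, so the sampled state sequence is exactly what an observer would see.

```python
import numpy as np

# Sketch (assumed setup): the urn transition matrix from the earlier slides.
states = ["U1", "U2", "U3"]
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

def sample_chain(T: int, start: int = 0, seed: int = 0) -> list[str]:
    """Sample a length-T state sequence from the visible Markov model."""
    rng = np.random.default_rng(seed)
    seq, s = [states[start]], start
    for _ in range(T - 1):
        s = rng.choice(len(states), p=A[s])  # next state drawn from row s of A
        seq.append(states[s])
    return seq

print(sample_chain(8))
```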

(8)

POS Tagging Basics

Assign “Part of Speech” to each word in a given sentence

Example:

This/DT is/VBZ a/DT sentence/NN ./.

This/DT is/VBZ a/DT tagged/JJ sentence/NN ./.

(9)

Simple Method

Assign the most common tag

Example :

I/PRP bank/NN at/IN SBI/NNP ./SYM

But, the correct tag sequence in context is:

I/PRP bank/VBP at/IN SBI/NNP ./SYM

Better: assign the "Part of Speech" to each word according to its context in the given sentence (see the sketch below for the baseline).
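A minimal sketch (mine, not from the slides) of the "assign the most common tag" baseline; the tiny tagged corpus below is assumed purely for illustration, whereas a real tagger would estimate these counts from an annotated corpus.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (assumed for illustration): (word, tag) pairs.
corpus = [("I", "PRP"), ("bank", "NN"), ("bank", "VBP"), ("bank", "NN"),
          ("at", "IN"), ("SBI", "NNP"), (".", "SYM")]

# Count tag frequencies per word.
counts = defaultdict(Counter)
for word, tag in corpus:
    counts[word][tag] += 1

def most_common_tag(word: str) -> str:
    """Baseline: ignore context, return the word's most frequent tag (NN if unseen)."""
    return counts[word].most_common(1)[0][0] if word in counts else "NN"

print([f"{w}/{most_common_tag(w)}" for w in ["I", "bank", "at", "SBI", "."]])
# 'bank' comes out as NN everywhere, even where the context calls for VBP.
```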

(10)

Mathematics of POS Tagging

Formally,

POS Tagging is a sequence labeling task

For a given observation sequence W = {w_1, w_2, …, w_n}

produce a label/tag sequence T = {t_1, t_2, …, t_n}

such that they "belong" together, i.e. maximize P(W, T) or P(T | W):

$\hat{T} = \arg\max_{T} P(T \mid W)$

(11)

Computing Label sequence given Observation Sequence
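The body of this slide does not survive in the extract; one standard way to carry out this computation (found, e.g., in the references at the end of the deck, and not recovered from the slide itself) is the Bayes decomposition with the Markov assumption on tags and the independence assumption on words:

```latex
\hat{T} = \arg\max_{T} P(T \mid W)
        = \arg\max_{T} \frac{P(W \mid T)\, P(T)}{P(W)}
        = \arg\max_{T} P(W \mid T)\, P(T)
        \approx \arg\max_{t_1 \ldots t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```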

(12)

Hidden Markov Model

A colored ball choosing example:

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Probability of transition to another urn after picking a ball:

      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

(13)

Hidden Markov Model

Given the transition probabilities:

      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

and the emission (ball color) probabilities:

      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3

Observation: RRGGBRGR

State Sequence: ??

Not so easily computable.

(14)

Hidden Markov Model

Set of states: S

Output Alphabet: V

Transition Probabilities: $A = \{a_{ij}\}$

Emission Probabilities: $B = \{b_j(o_k)\}$

Initial State Probabilities: $\pi$

The model is written compactly as $\lambda = (A, B, \pi)$.

(15)

Hidden Markov Model

Here:

S = {U1, U2, U3}

V = {R, G, B}

For an observation sequence O = {o_1 … o_n} and state sequence Q = {q_1 … q_n}:

π is the initial state distribution, $\pi_i = P(q_1 = U_i)$

A =
      U1   U2   U3
U1   0.1  0.4  0.5
U2   0.6  0.2  0.2
U3   0.3  0.4  0.3

B =
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3
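To make the parameters concrete, here is a small sketch (mine, not from the slides) encoding this urn HMM with NumPy; the uniform initial distribution is an assumption, since the slides do not give π numerically.

```python
import numpy as np

states = ["U1", "U2", "U3"]          # S
symbols = ["R", "G", "B"]            # V

# Transition probabilities A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Emission probabilities B[i, k] = P(o_t = v_k | q_t = S_i)
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

# Initial state distribution pi_i = P(q_1 = S_i); assumed uniform here.
pi = np.array([1/3, 1/3, 1/3])

observation = [symbols.index(c) for c in "RRGGBRGR"]  # encode O as symbol indices
```

The later code sketches take these arrays as arguments.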

(16)

Three Basic Problems of HMM

1. Given an observation sequence O = {o_1 … o_T}, efficiently estimate $P(O \mid \lambda)$

2. Given an observation sequence O = {o_1 … o_T}, get the best state sequence Q = {q_1 … q_T}

3. How to adjust $\lambda = (A, B, \pi)$ to best maximize $P(O \mid \lambda)$

(17)

Solutions

Problem 1: Likelihood of a sequence

Forward Procedure

Backward Procedure

Problem 2: Best state sequence

Viterbi Algorithm

Problem 3: Re-estimation

Baum-Welch (Forward-Backward) Algorithm

(18)

Problem 1

Consider an observation sequence O = {o_1 … o_T} and a state sequence Q = {q_1 … q_T}.

Then

$P(O \mid \lambda) = \sum_{Q} P(O \mid Q, \lambda)\, P(Q \mid \lambda)$

We know

$P(O \mid Q, \lambda) = \prod_{t=1}^{T} P(o_t \mid q_t, \lambda) = b_{q_1}(o_1)\, b_{q_2}(o_2) \cdots b_{q_T}(o_T)$

and

$P(Q \mid \lambda) = \pi_{q_1}\, a_{q_1 q_2}\, a_{q_2 q_3} \cdots a_{q_{T-1} q_T}$

(19)

Problem 1

$P(O \mid \lambda) = \sum_{q_1, q_2, \ldots, q_T} \pi_{q_1}\, b_{q_1}(o_1)\, a_{q_1 q_2}\, b_{q_2}(o_2) \cdots a_{q_{T-1} q_T}\, b_{q_T}(o_T)$

This direct computation requires on the order of $2T \cdot N^T$ operations.

Definitely not efficient!!

Is there a method to tackle this problem? Yes.

Forward or Backward Procedure
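As a sketch of why the direct sum is hopeless for long sequences (mine, not from the slides), the brute-force computation can be written directly; it enumerates all N^T state sequences, so it is only usable for tiny T. It assumes the A, B, pi, observation arrays from the earlier parameter sketch.

```python
from itertools import product

def brute_force_likelihood(obs, A, B, pi):
    """P(O | lambda) by summing over all N**T state sequences (order 2T * N**T)."""
    N, T = A.shape[0], len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):              # every possible state sequence
        p = pi[Q[0]] * B[Q[0], obs[0]]                 # pi_{q1} * b_{q1}(o1)
        for t in range(1, T):
            p *= A[Q[t - 1], Q[t]] * B[Q[t], obs[t]]   # a_{q_{t-1} q_t} * b_{q_t}(o_t)
        total += p
    return total

# e.g. brute_force_likelihood(observation, A, B, pi) with the arrays defined earlier
```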

(20)

Forward Procedure

Define the forward variable

$\alpha_t(i) = P(o_1\, o_2 \ldots o_t,\; q_t = S_i \mid \lambda)$

the probability that the state at position t is S_i and of the partial observation o_1 … o_t, given the model λ.

Forward Step:

Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1)$

Induction: $\alpha_{t+1}(j) = \Big[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Big]\, b_j(o_{t+1})$

Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$

(21)

Forward Procedure
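The figure on this slide is not recoverable; as a substitute sketch (mine), here is the forward procedure as code, computing P(O | λ) in O(N²T) time. It takes the arrays from the earlier parameter sketch as arguments.

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward procedure: alpha[t, i] = P(o_1..o_{t+1}, q_{t+1} = S_i | lambda)."""
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # induction step
    return alpha, alpha[-1].sum()                      # P(O | lambda) = sum_i alpha_T(i)

# e.g. alpha, likelihood = forward(observation, A, B, pi)
```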

(22)

Backward Procedure

Define the backward variable

$\beta_t(i) = P(o_{t+1}\, o_{t+2} \ldots o_T \mid q_t = S_i, \lambda)$

the probability of the partial observation from t+1 to the end, given that the state at position t is S_i and the model λ.

Backward Step:

Initialization: $\beta_T(i) = 1$

Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$

(23)

Backward Procedure
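Again as an illustrative sketch rather than the slide's own content, the backward procedure in code, mirroring the forward one above:

```python
import numpy as np

def backward(obs, A, B, pi):
    """Backward procedure: beta[t, i] = P(o_{t+2}..o_T | q_{t+1} = S_i, lambda)."""
    N, T = A.shape[0], len(obs)
    beta = np.ones((T, N))                                   # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])       # induction step
    return beta, (pi * B[:, obs[0]] * beta[0]).sum()         # also yields P(O | lambda)

# e.g. beta, likelihood = backward(observation, A, B, pi)
```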

(24)

Forward Backward Procedure

Benefit:

Order $N^2 T$, as compared to $2T \cdot N^T$ for the direct computation.

Only the Forward or the Backward procedure is needed for Problem 1.

(25)

Problem 2

Given an observation sequence O = {o_1 … o_T}

get the "best" state sequence Q = {q_1 … q_T}.

Solution:

1. The state individually most likely at each position t, or
2. The best state sequence given all the previously observed states and observations

The second is computed by the Viterbi Algorithm.

(26)

Viterbi Algorithm

• Define $\delta_t(i) = \max_{q_1, \ldots, q_{t-1}} P(q_1 \ldots q_{t-1},\; q_t = S_i,\; o_1 \ldots o_t \mid \lambda)$, i.e. the probability of the partial state sequence ending in S_i at time t that has the best joint probability so far.

• By induction, we have

$\delta_{t+1}(j) = \Big[\max_{i} \delta_t(i)\, a_{ij}\Big]\, b_j(o_{t+1})$

(27)

Viterbi Algorithm

(28)

Viterbi Algorithm
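The worked examples on these slides are not recoverable; as a stand-in sketch (mine), here is the Viterbi algorithm with backpointers, again taking the earlier arrays as arguments.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Most likely state sequence: argmax_Q P(Q, O | lambda), via delta and backpointers."""
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))             # delta[t, i]: best joint prob of a path ending in i
    psi = np.zeros((T, N), dtype=int)    # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], delta[-1].max()

# e.g. best_path, best_prob = viterbi(observation, A, B, pi)
```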

(29)

Problem 3

How to adjust $\lambda = (A, B, \pi)$ to best maximize $P(O \mid \lambda)$?

Re-estimate λ.

Solution:

To re-estimate (iteratively update and improve) the HMM parameters A, B, π, use the Baum-Welch algorithm.

(30)

Baum-Welch Algorithm

Define $\xi_t(i, j) = P(q_t = S_i,\; q_{t+1} = S_j \mid O, \lambda)$, the probability of being in state S_i at time t and in state S_j at time t+1, given the observations and the model.

Putting in the forward and backward variables:

$\xi_t(i, j) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$

(31)

Baum-Welch algorithm

(32)

Define $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$, the probability of being in state S_i at time t given the observations and the model.

Then, the expected number of transitions from S_i is $\sum_{t=1}^{T-1} \gamma_t(i)$

And, the expected number of transitions from S_i to S_j is $\sum_{t=1}^{T-1} \xi_t(i, j)$

(33)
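The re-estimation formulas themselves do not survive in this extract; for reference, the standard Baum-Welch updates (as in Rabiner, 1989, cited at the end of the deck) that the next slide refers to are:

```latex
\bar{\pi}_i = \gamma_1(i), \qquad
\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\bar{b}_j(k) = \frac{\sum_{t \,:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
```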
(34)

Baum-Welch Algorithm

Baum et al. have proved that these re-estimation equations lead to a model as good as or better than the previous one.

(35)

Questions?

(36)

References

Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, 77(2), pp. 257–286, February 1989.

Chris Manning and Hinrich Schütze, "Chapter 9: Markov Models", Foundations of Statistical Natural Language Processing, MIT Press, Cambridge, MA, May 1999.
