
(1)

HMM

(2)

An Example: Choosing Colored balls from 3 urns

• URN 1 • URN 2 • URN 3

Probability of transition from one urn to another (rows: from, columns: to)

        URN 1   URN 2   URN 3
URN 1    0.1     0.4     0.5
URN 2    0.6     0.2     0.2
URN 3    0.3     0.4     0.3

(3)

URN 1: Red = 30, Green = 50, Blue = 20

URN 2: Red = 10, Green = 40, Blue = 50

URN 3: Red = 60, Green = 10, Blue = 30

Probability of drawing a ball

        R     G     B
URN 1  0.3   0.5   0.2
URN 2  0.1   0.4   0.5
URN 3  0.6   0.1   0.3
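As a concrete reference point, here is a minimal sketch (not part of the slides) that encodes these two tables as arrays. The initial state distribution pi is not given on the slides, so a uniform start over the three urns is assumed, and obs is the R R G G B R G R sequence used on the later slides.

import numpy as np

states = ["URN1", "URN2", "URN3"]
symbols = ["R", "G", "B"]

# Transition probabilities a_ij (row i = current urn, column j = next urn)
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Emission probabilities b_i(o) (row = urn, columns = R, G, B)
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

# Assumed uniform initial state distribution (not specified on the slides)
pi = np.array([1.0, 1.0, 1.0]) / 3

# Observation sequence used on the later slides: R R G G B R G R
obs = [symbols.index(c) for c in "RRGGBRGR"]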

(4)

[Figure: state-transition diagram over the three urns U1, U2, U3. The arcs carry the transition probabilities from the table above (e.g. 0.1, 0.4, 0.5 leaving URN 1) and each state is labelled with its emission probabilities (e.g. R 0.3, G 0.5, B 0.2 at URN 1; R 0.6, G 0.1, B 0.3 at URN 3).]

(5)

[Figure: the same diagram with each arc labelled by the product $a_{ij} \cdot b_j(\text{color})$, i.e. the probability of moving to urn j and then drawing each color there; for example, the arc with transition probability 0.3 into URN 3 carries R 0.18, G 0.03, B 0.09.]

(6)

(B) Probability of drawing a ball

        R     G     B
URN 1  0.3   0.5   0.2
URN 2  0.1   0.4   0.5
URN 3  0.6   0.1   0.3

(A) Probability of transition from one urn to another

        URN 1   URN 2   URN 3
URN 1    0.1     0.4     0.5
URN 2    0.6     0.2     0.2
URN 3    0.3     0.4     0.3

(7)

Observation and state sequence

Obs.:    o1  o2  o3  o4  o5  o6  o7  o8
         R   R   G   G   B   R   G   R
States:  q1  q2  q3  q4  q5  q6  q7  q8

where each $q_i$ = U1 / U2 / U3, i.e. any particular state.

Objective: to find the best possible state sequence Q*, i.e. the choice of Q that maximizes P(Q | O).

(8)

Goal

(9)

Obs.:    o1  o2  o3  o4  o5  o6  o7  o8
         R   R   G   G   B   R   G   R
States:  q1  q2  q3  q4  q5  q6  q7  q8

(10)

Bayes' Theorem
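The equation on this slide did not survive extraction; a plausible reconstruction, consistent with the decoding goal stated on the previous slides, is Bayes' theorem applied to the state sequence:

$$P(Q \mid O) = \frac{P(O \mid Q)\, P(Q)}{P(O)}, \qquad \arg\max_{Q} P(Q \mid O) = \arg\max_{Q} P(O \mid Q)\, P(Q)$$

since the denominator $P(O)$ does not depend on Q.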

(11)
(12)

Probability: Observation Sequence

(13)

Markov Assumption

• The Markov assumption states that the probability of the occurrence of word $w_i$ at time t depends only on the occurrence of word $w_{i-1}$ at time t-1.

– Chain rule: $P(w_1, \ldots, w_n) = P(w_1) \prod_{i=2}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$

– Markov assumption: $P(w_1, \ldots, w_n) \approx P(w_1) \prod_{i=2}^{n} P(w_i \mid w_{i-1})$
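Applied to the hidden urn sequence, this first-order assumption is exactly what the transition table encodes; as a sketch, using the $\pi$ and $a_{ij}$ notation introduced on the later parameter slides:

$$P(q_1, q_2, \ldots, q_T) = \pi_{q_1} \prod_{t=2}^{T} a_{q_{t-1} q_t}$$

For example, under the assumed uniform start, $P(\text{U1}, \text{U2}, \text{U3}) = \tfrac{1}{3} \cdot a_{12} \cdot a_{23} = \tfrac{1}{3} \cdot 0.4 \cdot 0.2 \approx 0.027$.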

(14)

The Trellis

(15)

Parameters of an HMM

• States: a set of states $S = \{s_1, \ldots, s_N\}$.

• Transition probabilities: $A = \{a_{1,1}, a_{1,2}, \ldots, a_{N,N}\}$, where each $a_{i,j}$ represents the probability of transitioning from state $s_i$ to $s_j$.

• Emission probabilities: a set B of functions of the form $b_i(o_t)$, which is the probability of observation $o_t$ being emitted by $s_i$.

• Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state.

(16)

The Three Basic HMM Problems

• Problem 1 (Evaluation): Given the observation sequence $O = o_1, \ldots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we compute the probability of O given the model, $P(O \mid \lambda)$?

• Problem 2 (Decoding): Given the observation sequence $O = o_1, \ldots, o_T$ and an HMM model $\lambda = (A, B, \pi)$, how do we find the state sequence that best explains the observations?

(17)

The Three Basic HMM Problems

• Problem 3 (Learning): How do we adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$?

(18)

Problem 1: Probability of an Observation Sequence

• What is $P(O \mid \lambda)$?

• The probability of an observation sequence is the sum of the probabilities of all possible state sequences in the HMM.

• Naïve computation is very expensive. Given T observations and N states, there are $N^T$ possible state sequences.

• Even small HMMs, e.g. T = 10 and N = 10, contain 10 billion different paths.

• The solution to this problem and to Problem 2 is dynamic programming.

(19)

Forward Probabilities

• What is the probability that, given an HMM $\lambda$, at time t the state is i and the partial observation $o_1 \ldots o_t$ has been generated?

$$\alpha_t(i) = P(o_1 \ldots o_t,\; q_t = s_i \mid \lambda)$$

(20)

Forward Probabilities

$$\alpha_t(i) = P(o_1 \ldots o_t,\; q_t = s_i \mid \lambda)$$

$$\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t)$$

(21)

Forward Algorithm

• Initialization: $\alpha_1(i) = \pi_i\, b_i(o_1), \quad 1 \le i \le N$

• Induction: $\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t), \quad 2 \le t \le T,\; 1 \le j \le N$

• Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
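A minimal code sketch of this recursion (not from the slides), reusing the hypothetical A, B, pi and obs arrays defined after the urn tables:

import numpy as np

def forward(A, B, pi, obs):
    # Returns alpha (T x N) and P(O | lambda).
    N, T = A.shape[0], len(obs)
    alpha = np.zeros((T, N))
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha[0] = pi * B[:, obs[0]]
    # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return alpha, float(alpha[-1].sum())

alpha, prob = forward(A, B, pi, obs)   # prob is P(O | lambda) for R R G G B R G R

For long observation sequences a real implementation would scale each column or work in log space to avoid numerical underflow.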

(22)

Forward Algorithm Complexity

• In the naïve approach to solving Problem 1, it takes on the order of $2T \cdot N^T$ computations.

• The forward algorithm takes on the order of $N^2 T$ computations.

(23)

Backward Probabilities

• Analogous to the forward probability, just in the other direction.

• What is the probability that, given an HMM $\lambda$ and given that the state at time t is i, the partial observation $o_{t+1} \ldots o_T$ is generated?

$$\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i,\; \lambda)$$

(24)

Backward Probabilities

$$\beta_t(i) = P(o_{t+1} \ldots o_T \mid q_t = s_i,\; \lambda)$$

$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$

(25)

Backward Algorithm

• Initialization: $\beta_T(i) = 1, \quad 1 \le i \le N$

• Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \quad t = T-1, \ldots, 1,\; 1 \le i \le N$

• Termination: $P(O \mid \lambda) = \sum_{i=1}^{N} \pi_i\, b_i(o_1)\, \beta_1(i)$
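A matching sketch of the backward pass (again assuming the arrays and import from the earlier snippets); its termination value should agree with the forward probability, which is a useful sanity check:

def backward(A, B, pi, obs):
    # Returns beta (T x N) and P(O | lambda) computed from the backward pass.
    N, T = A.shape[0], len(obs)
    beta = np.zeros((T, N))
    # Initialization: beta_T(i) = 1
    beta[-1] = 1.0
    # Induction: beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    # Termination: P(O | lambda) = sum_i pi_i * b_i(o_1) * beta_1(i)
    return beta, float(np.sum(pi * B[:, obs[0]] * beta[0]))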

(26)

Problem 2: Decoding

• The solution to Problem 1 (Evaluation) efficiently gives us the sum of the probabilities of all paths through an HMM.

• For Problem 2, we want to find the single path with the highest probability.

• We want to find the state sequence $Q = q_1 \ldots q_T$ such that

$$Q = \arg\max_{Q'} P(Q' \mid O,\; \lambda)$$

(27)

Viterbi Algorithm

• Similar to computing the forward probabilities, but instead of summing over transitions from incoming states, compute the maximum.

• Forward: $\alpha_t(j) = \left[ \sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij} \right] b_j(o_t)$

• Viterbi recursion: $\delta_t(j) = \left[ \max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t)$

(28)

Viterbi Algorithm

• Initialization: $\delta_1(i) = \pi_i\, b_i(o_1), \quad \psi_1(i) = 0, \quad 1 \le i \le N$

• Induction:
$$\delta_t(j) = \left[ \max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij} \right] b_j(o_t), \quad 2 \le t \le T,\; 1 \le j \le N$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} \delta_{t-1}(i)\, a_{ij}, \quad 2 \le t \le T,\; 1 \le j \le N$$

• Termination: $p^* = \max_{1 \le i \le N} \delta_T(i), \qquad q_T^* = \arg\max_{1 \le i \le N} \delta_T(i)$

• Read out path: $q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, \ldots, 1$
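A minimal code sketch of this procedure (assuming the arrays and import from the earlier snippets); it mirrors the forward sketch with max/argmax in place of the sum:

def viterbi(A, B, pi, obs):
    # Returns the most probable state path and its probability p*.
    N, T = A.shape[0], len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    # Initialization: delta_1(i) = pi_i * b_i(o_1), psi_1(i) = 0
    delta[0] = pi * B[:, obs[0]]
    # Induction: maximize over predecessor states instead of summing
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A        # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    # Termination and path read-out (backtracking through psi)
    path = np.zeros(T, dtype=int)
    path[-1] = int(delta[-1].argmax())
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, float(delta[-1].max())

path, p_star = viterbi(A, B, pi, obs)
print([states[i] for i in path])   # best urn sequence for R R G G B R G R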

(29)

Problem 3: Learning

• Up to now we've assumed that we know the underlying model $\lambda = (A, B, \pi)$.

• Often these parameters are estimated on annotated training data, which has two drawbacks:
– Annotation is difficult and/or expensive.
– Training data is different from the current data.

• We want to maximize the parameters with respect to the current data, i.e., we're looking for a model $\lambda'$ such that

$$\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$$

(30)

Problem 3: Learning

• Unfortunately, there is no known way to analytically find a global maximum, i.e., a model $\lambda'$ such that

$$\lambda' = \arg\max_{\lambda} P(O \mid \lambda)$$

• But it is possible to find a local maximum.

• Given an initial model $\lambda$, we can always find a model $\lambda'$ such that

$$P(O \mid \lambda') \ge P(O \mid \lambda)$$

(31)

Parameter Re-estimation

• Use the forward-backward (or Baum-Welch) algorithm.

• Starting from an initial parameter instantiation, the forward-backward algorithm iteratively re-estimates the parameters, improving the probability that the given observations are generated by the new parameters.

(32)

Parameter Re-estimation

• Three parameters need to be re-estimated:
– Initial state distribution: $\pi_i$
– Transition probabilities: $a_{i,j}$
– Emission probabilities: $b_i(o_t)$

(33)

Re-estimating Transition Probabilities

• What is the probability of being in state $s_i$ at time t and going to state $s_j$, given the current model and parameters?

$$\xi_t(i, j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O,\; \lambda)$$

(34)

Re-estimating Transition Probabilities

$$\xi_t(i, j) = P(q_t = s_i,\; q_{t+1} = s_j \mid O,\; \lambda)$$

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{i,j}\, b_j(o_{t+1})\, \beta_{t+1}(j)}$$

(35)

Re-estimating Transition Probabilities

• The intuition behind the re-estimation equation for transition probabilities is:

$$\hat{a}_{i,j} = \frac{\text{expected number of transitions from state } s_i \text{ to state } s_j}{\text{expected number of transitions from state } s_i}$$

• Formally:

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \sum_{j'=1}^{N} \xi_t(i, j')}$$

(36)

Re-estimating Transition Probabilities

• Defining

$$\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)$$

as the probability of being in state $s_i$ at time t, given the complete observation O,

• we can say:

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

(37)

Review of Probabilities

• Forward probability $\alpha_t(i)$: the probability of being in state $s_i$, given the partial observation $o_1, \ldots, o_t$.

• Backward probability $\beta_t(i)$: the probability of being in state $s_i$, given the partial observation $o_{t+1}, \ldots, o_T$.

• Transition probability $\xi_t(i, j)$: the probability of going from state $s_i$ to state $s_j$, given the complete observation $o_1, \ldots, o_T$.

• State probability $\gamma_t(i)$: the probability of being in state $s_i$, given the complete observation $o_1, \ldots, o_T$.

(38)

Re-estimating Initial State Probabilities

• Initial state distribution: $\pi_i$ is the probability that $s_i$ is a start state.

• Re-estimation is easy:

$$\hat{\pi}_i = \text{expected number of times in state } s_i \text{ at time } 1$$

• Formally: $\hat{\pi}_i = \gamma_1(i)$

(39)

Re-estimation of Emission Probabilities

• Emission probabilities are re-estimated as

$$\hat{b}_i(k) = \frac{\text{expected number of times in state } s_i \text{ observing symbol } v_k}{\text{expected number of times in state } s_i}$$

• Formally:

$$\hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)}$$

where $\delta(o_t, v_k) = 1$ if $o_t = v_k$, and 0 otherwise.

• Note that $\delta$ here is the Kronecker delta function and is not related to the $\delta$ in the discussion of the Viterbi algorithm!

(40)

The Updated Model

• Coming from $\lambda = (A, B, \pi)$ we get to $\lambda' = (\hat{A}, \hat{B}, \hat{\pi})$ by the following update rules:

$$\hat{a}_{i,j} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad
\hat{b}_i(k) = \frac{\sum_{t=1}^{T} \delta(o_t, v_k)\, \gamma_t(i)}{\sum_{t=1}^{T} \gamma_t(i)} \qquad
\hat{\pi}_i = \gamma_1(i)$$

(41)

Expectation Maximization

• The forward-backward algorithm is an instance of the more general EM algorithm.
– The E step: compute the forward and backward probabilities for a given model.
– The M step: re-estimate the model parameters.
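A minimal sketch of one such E/M iteration in code, assuming the forward() and backward() functions and the A, B, pi, obs arrays from the earlier snippets. A real implementation would iterate until P(O | λ) stops improving and would use scaling or log-space arithmetic for longer sequences.

def baum_welch_step(A, B, pi, obs):
    # One re-estimation step of the forward-backward (Baum-Welch) algorithm.
    N, T = A.shape[0], len(obs)
    alpha, prob = forward(A, B, pi, obs)
    beta, _ = backward(A, B, pi, obs)
    # E step: xi_t(i, j) and gamma_t(i) under the current model
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]
        xi[t] /= xi[t].sum()
    gamma = alpha * beta / prob           # gamma_t(i) = alpha_t(i) * beta_t(i) / P(O | lambda)
    # M step: the update rules from the slides
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        matches = (np.array(obs) == k)    # Kronecker delta(o_t, v_k)
        new_B[:, k] = gamma[matches].sum(axis=0) / gamma.sum(axis=0)
    return new_A, new_B, new_pi

Each application of baum_welch_step produces a model with P(O | λ') ≥ P(O | λ), matching the local-maximum guarantee above.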
