CS344: Introduction to Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 30: Probabilistic Parsing: Algorithmics
(Lectures 28-29: two hours of student seminars on Default Reasoning, Child Language Acquisition, and Short-Term and Long-Term Memory)
Formal Definition of PCFG
A PCFG consists of
A set of terminals {w_k}, k = 1,…,V
  E.g., {w_k} = {child, teddy, bear, played, …}
A set of non-terminals {N^i}, i = 1,…,n
  E.g., {N^i} = {NP, VP, DT, …}
A designated start symbol N^1
A set of rules {N^i → ζ_j}, where ζ_j is a sequence of terminals & non-terminals
  E.g., NP → DT NN
A corresponding set of rule probabilities
Rule Probabilities
Rule probabilities are such that
  ∀i  ∑_j P(N^i → ζ_j) = 1
E.g., P(NP → DT NN) = 0.2
      P(NP → NN)    = 0.5
      P(NP → NP PP) = 0.3
P(NP → DT NN) = 0.2 means that 20% of the NP expansions in the training-data parses use the rule NP → DT NN.
Probabilistic Context Free Grammars
S   → NP VP    1.0
NP  → DT NN    0.5
NP  → NNS      0.3
NP  → NP PP    0.2
PP  → P NP     1.0
VP  → VP PP    0.6
VP  → VBD NP   0.4
DT  → the      1.0
NN  → gunman   0.5
NN  → building 0.5
VBD → sprayed  1.0
P   → with     1.0
NNS → bullets  1.0
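A minimal sketch of this grammar in Python (the table layout and variable names are illustrative, not from any parsing library); the assertion checks that every non-terminal's rule probabilities sum to 1. The later sketches in this lecture reuse these tables.

from collections import defaultdict

# Binary rules: lhs -> [(left child, right child, P(lhs -> l r)), ...]
binary_rules = {
    "S":  [("NP", "VP", 1.0)],
    "NP": [("DT", "NN", 0.5), ("NP", "PP", 0.2)],
    "VP": [("VP", "PP", 0.6), ("VBD", "NP", 0.4)],
    "PP": [("P", "NP", 1.0)],
}
# Unary rule: NP -> NNS
unary_rules = {"NP": [("NNS", 0.3)]}
# Lexical rules: lhs -> [(word, P(lhs -> word)), ...]
lexical_rules = {
    "DT":  [("the", 1.0)],
    "NN":  [("gunman", 0.5), ("building", 0.5)],
    "VBD": [("sprayed", 1.0)],
    "NNS": [("bullets", 1.0)],
    "P":   [("with", 1.0)],
}

# For every non-terminal i: sum over j of P(N^i -> zeta_j) must be 1.
totals = defaultdict(float)
for lhs, rules in binary_rules.items():
    totals[lhs] += sum(p for _, _, p in rules)
for lhs, rules in unary_rules.items():
    totals[lhs] += sum(p for _, p in rules)
for lhs, rules in lexical_rules.items():
    totals[lhs] += sum(p for _, p in rules)
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())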
Example Parse t1
The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.6 (VP0.4 (VBD1.0 sprayed)
                    (NP0.5 (DT1.0 the) (NN0.5 building)))
             (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets)))))

P(t1) = 1.0 * 0.5 * 1.0 * 0.5 * 0.6 * 0.4 * 1.0 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0
      = 0.0045
Another Parse t2
The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.4 (VBD1.0 sprayed)
             (NP0.2 (NP0.5 (DT1.0 the) (NN0.5 building))
                    (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets))))))

P(t2) = 1.0 * 0.5 * 1.0 * 0.5 * 0.4 * 1.0 * 0.2 * 0.5 * 1.0 * 0.5 * 1.0 * 1.0 * 0.3 * 1.0
      = 0.0015
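Since P(t) is simply the product of the probabilities of the rules used in t, both parse probabilities can be checked in a few lines of Python (the factors follow the products above):

from math import prod  # Python 3.8+

# t1: PP attached to the VP ("sprayed ... with bullets")
p_t1 = prod([1.0,            # S   -> NP VP
             0.5, 1.0, 0.5,  # NP  -> DT NN, DT -> the, NN -> gunman
             0.6,            # VP  -> VP PP
             0.4, 1.0,       # VP  -> VBD NP, VBD -> sprayed
             0.5, 1.0, 0.5,  # NP  -> DT NN, DT -> the, NN -> building
             1.0, 1.0,       # PP  -> P NP, P -> with
             0.3, 1.0])      # NP  -> NNS, NNS -> bullets

# t2: PP attached to the object NP ("the building with bullets")
p_t2 = prod([1.0,            # S   -> NP VP
             0.5, 1.0, 0.5,  # NP  -> DT NN, DT -> the, NN -> gunman
             0.4, 1.0,       # VP  -> VBD NP, VBD -> sprayed
             0.2,            # NP  -> NP PP
             0.5, 1.0, 0.5,  # NP  -> DT NN, DT -> the, NN -> building
             1.0, 1.0,       # PP  -> P NP, P -> with
             0.3, 1.0])      # NP  -> NNS, NNS -> bullets

print(p_t1, p_t2)            # ~0.0045 and ~0.0015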
Probability of a sentence
Notation:
  w_{ab} : the subsequence w_a…w_b
  N^j dominates w_a…w_b, or yield(N^j) = w_a…w_b
  (e.g., N^j = NP with yield "the sweet teddy bear")

Probability of a sentence = P(w_{1m}):
  P(w_{1m}) = ∑_t P(w_{1m}, t)        where t is a parse tree of the sentence
            = ∑_t P(t) P(w_{1m} | t)
            = ∑_{t : yield(t) = w_{1m}} P(t)
since P(w_{1m} | t) = 1 when t is a parse tree of the sentence w_{1m}.
Assumptions of the PCFG model
Place invariance:
  P(NP → DT NN) is the same in locations 1 and 2.
Context-free:
  P(NP → DT NN | anything outside "The child") = P(NP → DT NN)
Ancestor-free: at node 2,
  P(NP → DT NN | its ancestor is VP) = P(NP → DT NN)

(Figure: S → NP VP; the NP marked 1 yields "The child"; inside the VP, an NP marked 2 yields "The toy".)
Probability of a parse tree
Domination: we say N^j dominates the words from k to l, symbolized as N^j_{kl}, if w_{kl} is derived from N^j.
  P(tree | sentence) = P(tree | S_{1,l})
where S_{1,l} means that the start symbol S dominates the word sequence w_{1,l}.
P(t | s) approximately equals the joint probability of the constituent non-terminals dominating the sentence fragments (next slide).
Probability of a parse tree (cont.)

(Tree: S_{1,l} → NP_{1,2} VP_{3,l}; NP_{1,2} → DT_{1,1} N_{2,2}, with DT_{1,1} → w1 and N_{2,2} → w2;
 VP_{3,l} → V_{3,3} PP_{4,l}, with V_{3,3} → w3; PP_{4,l} → P_{4,4} NP_{5,l}, with P_{4,4} → w4 and NP_{5,l} → w5…wl)

P(t | s) = P(t | S_{1,l})
  = P(NP_{1,2}, DT_{1,1}, w1, N_{2,2}, w2, VP_{3,l}, V_{3,3}, w3, PP_{4,l}, P_{4,4}, w4, NP_{5,l}, w_{5…l} | S_{1,l})
  = P(NP_{1,2}, VP_{3,l} | S_{1,l}) * P(DT_{1,1}, N_{2,2} | NP_{1,2}) * P(w1 | DT_{1,1}) * P(w2 | N_{2,2})
    * P(V_{3,3}, PP_{4,l} | VP_{3,l}) * P(w3 | V_{3,3}) * P(P_{4,4}, NP_{5,l} | PP_{4,l}) * P(w4 | P_{4,4})
    * P(w_{5…l} | NP_{5,l})
(using the chain rule, context-freeness and ancestor-freeness)
HMM ↔ PCFG
O  observed sequence ↔ w_{1m}  sentence
X  state sequence    ↔ t  parse tree
μ  model             ↔ G  grammar
Three fundamental questions
HMM ↔ PCFG
How likely is a certain observation given the model?
  P(O | μ)  ↔  How likely is a sentence given the grammar?  P(w_{1m} | G)
How to choose a state sequence which best explains the observations?
  argmax_X P(X | O, μ)  ↔  How to choose a parse which best supports the sentence?  argmax_t P(t | w_{1m}, G)
HMM ↔ PCFG
How to choose the model parameters that best explain the observed data?
  argmax_μ P(O | μ)  ↔  How to choose rule probabilities which maximize the probabilities of the observed sentences?  argmax_G P(w_{1m} | G)
Interesting Probabilities

The gunman sprayed the building with bullets
  1     2      3     4     5      6     7

Inside probability β_NP(4,5):
  What is the probability of having an NP at this position such that it will derive "the building"?
Outside probability α_NP(4,5):
  What is the probability of starting from N^1 and deriving "The gunman sprayed", an NP, and "with bullets"?
Interesting Probabilities
Random variables to be considered:
  The non-terminal being expanded. E.g., NP
  The word-span covered by the non-terminal. E.g., (4,5) refers to the words "the building"
While calculating probabilities, consider:
  The rule to be used for expansion. E.g., NP → DT NN
  The probabilities associated with the RHS non-terminals. E.g., the DT subtree's and the NN subtree's inside/outside probabilities
Outside Probability
α_j(p,q): the probability of beginning with N^1 and generating the non-terminal N^j_{pq} and all the words outside w_p…w_q:
  α_j(p,q) = P(w_{1(p-1)}, N^j_{pq}, w_{(q+1)m} | G)

(Figure: N^1 at the root; N^j spans w_p…w_q; the words w_1…w_{p-1} and w_{q+1}…w_m lie outside.)
Inside Probabilities
β_j(p,q): the probability of generating the words w_p…w_q starting with the non-terminal N^j_{pq}:
  β_j(p,q) = P(w_{pq} | N^j_{pq}, G)

(Figure: α covers the part of the tree outside w_p…w_q; β covers the part inside, below N^j.)
Outside & Inside Probabilities: example
For "the building":
  α_NP(4,5) = P(The gunman sprayed, NP_{4,5}, with bullets | G)
  β_NP(4,5) = P(the building | NP_{4,5}, G)

The gunman sprayed the building with bullets
  1     2      3     4     5      6     7
Inside probabilities β_j(p,q)
Base case:
  β_j(k,k) = P(w_k | N^j_{kk}, G) = P(N^j → w_k | G)
The base case is used for rules which derive the words (terminals) directly.
E.g., suppose N^j = NN is being considered and NN → building is one of the rules, with probability 0.5:
  β_NN(5,5) = P(building | NN_{5,5}, G) = P(NN → building | G) = 0.5
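The base case translates directly into code. A sketch that fills the diagonal of a β table for the example sentence, reusing the rule tables from the grammar sketch; the unary rule NP → NNS is folded in on the diagonal so that, e.g., β_NP(7,7) = 0.3:

sentence = "The gunman sprayed the building with bullets".split()
words = [w.lower() for w in sentence]   # match the lexical entries

beta = {}   # beta[(j, p, q)] = inside probability of N^j spanning words p..q
for k, word in enumerate(words, start=1):
    for lhs, rules in lexical_rules.items():
        for w, p in rules:
            if w == word:
                beta[(lhs, k, k)] = p          # e.g. beta[("NN", 5, 5)] = 0.5
    for lhs, rules in unary_rules.items():     # NP -> NNS on the diagonal
        for rhs, p in rules:
            if (rhs, k, k) in beta:
                beta[(lhs, k, k)] = beta.get((lhs, k, k), 0.0) + p * beta[(rhs, k, k)]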
Induction Step
  β_j(p,q) = P(w_{pq} | N^j_{pq}, G)
           = ∑_{r,s} ∑_{d=p}^{q-1} P(N^j → N^r N^s) * β_r(p,d) * β_s(d+1,q)

(Figure: N^j expands as N^r over w_p…w_d and N^s over w_{d+1}…w_q.)

Consider the different splits of the words, indicated by d.
  E.g., "the huge building": split here for d = 2 or d = 3.
Consider the different non-terminals that can be used in the rule:
  E.g., NP → DT NN and NP → DT NNS are available options.
Sum over all of these.
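A direct transcription of the induction step, continuing from the β diagonal filled above: cells are processed shortest span first (the bottom-up order of the next slide), summing over rules N^j → N^r N^s and split points d. The unary NP → NNS is re-applied per cell for generality, although in this toy grammar it only ever fires on the diagonal.

m = len(words)
for span in range(2, m + 1):                  # shortest spans first
    for p in range(1, m - span + 2):
        q = p + span - 1
        for j, rules in binary_rules.items():
            total = 0.0
            for r, s, prob in rules:          # N^j -> N^r N^s
                for d in range(p, q):         # split point
                    total += (prob
                              * beta.get((r, p, d), 0.0)
                              * beta.get((s, d + 1, q), 0.0))
            if total > 0.0:
                beta[(j, p, q)] = total
        for j, rules in unary_rules.items():  # unary closure (no chains here)
            for r, prob in rules:
                if (r, p, q) in beta:
                    beta[(j, p, q)] = beta.get((j, p, q), 0.0) + prob * beta[(r, p, q)]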
The Bottom-Up Approach
The idea of induction: consider "the gunman".
Base cases: apply the unary (lexical) rules
  DT → the     Prob = 1.0
  NN → gunman  Prob = 0.5
Induction: the probability that an NP covers these 2 words
  = P(NP → DT NN) * P(DT deriving the word "the") * P(NN deriving the word "gunman")
  = 0.5 * 1.0 * 0.5 = 0.25

(Tree: NP0.5 → DT1.0 NN0.5 over "The gunman".)
Parse Triangle
A parse triangle is constructed for calculating β_j(p,q).
Probability of a sentence using β_j(p,q):
  P(w_{1m} | G) = P(N^1 ⇒ w_{1m} | G) = P(w_{1m} | N^1_{1m}, G) = β_1(1,m)
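Running the inside sketches above on the example sentence fills the whole triangle, and the top cell is the sentence probability:

print(beta[("NP", 1, 2)])   # 0.25  ("the gunman")
print(beta[("S", 1, 7)])    # ~0.006 = P(t1) + P(t2) = 0.0045 + 0.0015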
Parse Triangle
Fill the diagonal with β_j(k,k):
  β_DT(1,1)  = 1.0  (the)
  β_NN(2,2)  = 0.5  (gunman)
  β_VBD(3,3) = 1.0  (sprayed)
  β_DT(4,4)  = 1.0  (the)
  β_NN(5,5)  = 0.5  (building)
  β_P(6,6)   = 1.0  (with)
  β_NNS(7,7) = 1.0  (bullets)
Parse Triangle
Calculate the other cells using the induction formula, e.g.:
  β_NP(1,2) = P(the gunman | NP_{1,2}, G)
            = P(NP → DT NN) * β_DT(1,1) * β_NN(2,2)
            = 0.5 * 1.0 * 0.5 = 0.25
Example Parse t1
The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.6 (VP0.4 (VBD1.0 sprayed)
                    (NP0.5 (DT1.0 the) (NN0.5 building)))
             (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets)))))

Rule used at the top of the VP here: VP → VP PP
Another Parse t2
The gunman sprayed the building with bullets.

(S1.0 (NP0.5 (DT1.0 The) (NN0.5 gunman))
      (VP0.4 (VBD1.0 sprayed)
             (NP0.2 (NP0.5 (DT1.0 the) (NN0.5 building))
                    (PP1.0 (P1.0 with) (NP0.3 (NNS1.0 bullets))))))

Rule used at the top of the VP here: VP → VBD NP
Parse Triangle
The remaining (non-diagonal) cells:
  β_NP(1,2) = 0.25   ("the gunman")
  β_NP(4,5) = 0.25   ("the building")
  β_PP(6,7) = 0.3    ("with bullets")
  β_VP(3,5) = 0.1    ("sprayed the building")
  β_NP(4,7) = 0.015  ("the building with bullets")
  β_VP(3,7) = 0.024  ("sprayed the building with bullets")
  β_S(1,7)  = 0.006  (the whole sentence)

E.g.:
  β_VP(3,7) = P(sprayed the building with bullets | VP_{3,7}, G)
            = P(VP → VP PP) * β_VP(3,5) * β_PP(6,7)
              + P(VP → VBD NP) * β_VBD(3,3) * β_NP(4,7)
            = 0.6 * 0.1 * 0.3 + 0.4 * 1.0 * 0.015
            = 0.018 + 0.006 = 0.024
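The same cell, read off the β table computed by the inside sketch:

print(beta[("VP", 3, 7)])   # 0.6*0.1*0.3 + 0.4*1.0*0.015 = ~0.024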
Different Parses
Consider
  Different splitting points: e.g., the 5th and the 3rd position.
  Different rules for VP expansion: e.g., VP → VP PP and VP → VBD NP.
Different parses for the VP "sprayed the building with bullets" can be constructed this way.
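The best single parse (question 2 of the HMM ↔ PCFG correspondence) comes from the same recursion with max in place of sum, the PCFG analogue of Viterbi decoding. A sketch reusing the earlier tables; back-pointers for recovering the tree itself are omitted for brevity:

delta = {}   # delta[(j, p, q)] = probability of the best parse of span p..q
for k, word in enumerate(words, start=1):
    for lhs, rules in lexical_rules.items():
        for w, p in rules:
            if w == word:
                delta[(lhs, k, k)] = p
    for lhs, rules in unary_rules.items():
        for r, p in rules:
            if (r, k, k) in delta:
                delta[(lhs, k, k)] = max(delta.get((lhs, k, k), 0.0),
                                         p * delta[(r, k, k)])
for span in range(2, m + 1):
    for p in range(1, m - span + 2):
        q = p + span - 1
        for j, rules in binary_rules.items():
            for r, s, prob in rules:
                for d in range(p, q):
                    cand = (prob * delta.get((r, p, d), 0.0)
                                 * delta.get((s, d + 1, q), 0.0))
                    if cand > delta.get((j, p, q), 0.0):
                        delta[(j, p, q)] = cand
print(delta[("S", 1, 7)])   # ~0.0045: t1 (VP attachment) beats t2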
Outside Probabilities α_j(p,q)
Base case:
  α_1(1,m) = 1   (for the start symbol N^1)
  α_j(1,m) = 0   for j ≠ 1
Inductive step for calculating α_j(p,q):

(Figure: starting from N^1, a parent N^f_{pe} expands as N^f → N^j N^g, with N^j_{pq} as the left child and N^g_{(q+1)e} as its sibling; the parent contributes α_f(p,e) * P(N^f → N^j N^g) * β_g(q+1,e). Summation is over f, g & e.)

  α_j(p,q) = ∑_{f,g} ∑_{e=q+1}^{m} α_f(p,e) * P(N^f → N^j N^g) * β_g(q+1,e)

(The figure shows N^j as the left child; the symmetric case, where N^j is the right child of a rule N^f → N^g N^j, contributes the analogous term ∑_{f,g} ∑_{e=1}^{p-1} α_f(e,q) * P(N^f → N^g N^j) * β_g(e,p-1).)
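A sketch of the inductive step in Python, reusing the rule tables and the β table computed earlier. Both the left-child case drawn in the figure and the symmetric right-child case are summed; spans are processed largest first, since α for a span depends only on α of the larger spans containing it, and the unary rule passes the NP's outside mass down to NNS:

alpha = {("S", 1, m): 1.0}                    # base case
for span in range(m - 1, 0, -1):              # largest spans first
    for p in range(1, m - span + 2):
        q = p + span - 1
        for f, rules in binary_rules.items():
            for left, right, prob in rules:   # N^f -> N^left N^right
                # N^j = left child; sibling N^right spans q+1..e
                for e in range(q + 1, m + 1):
                    if (f, p, e) in alpha and (right, q + 1, e) in beta:
                        alpha[(left, p, q)] = (alpha.get((left, p, q), 0.0)
                            + alpha[(f, p, e)] * prob * beta[(right, q + 1, e)])
                # N^j = right child; sibling N^left spans e..p-1
                for e in range(1, p):
                    if (f, e, q) in alpha and (left, e, p - 1) in beta:
                        alpha[(right, p, q)] = (alpha.get((right, p, q), 0.0)
                            + alpha[(f, e, q)] * prob * beta[(left, e, p - 1)])
        for a, rules in unary_rules.items():  # e.g. NP's mass down to NNS
            for b, prob in rules:
                if (a, p, q) in alpha:
                    alpha[(b, p, q)] = (alpha.get((b, p, q), 0.0)
                                        + prob * alpha[(a, p, q)])

print(alpha[("NP", 4, 5)])    # ~0.024 for "the building"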
Probability of a Sentence
Joint probability of a sentence w_{1m} and that there is a constituent spanning words w_p to w_q:
  P(w_{1m}, N_{pq} | G) = ∑_j P(w_{1m}, N^j_{pq} | G) = ∑_j α_j(p,q) β_j(p,q)
E.g.,
  P(The gunman…bullets, N_{4,5} | G) = ∑_j P(The gunman…bullets, N^j_{4,5} | G)
    = α_NP(4,5) β_NP(4,5) + α_VP(4,5) β_VP(4,5) + …

The gunman sprayed the building with bullets
  1     2      3     4     5      6     7
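Continuing the sketch, the sum over non-terminals for the span (4,5); only the NP has non-zero inside probability there, and both parses contain it, so the joint probability equals the sentence probability:

nonterminals = set(binary_rules) | set(unary_rules) | set(lexical_rules)
joint = sum(alpha.get((j, 4, 5), 0.0) * beta.get((j, 4, 5), 0.0)
            for j in nonterminals)
print(joint)   # 0.024 * 0.25 = ~0.006: every parse puts an NP over "the building"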