
(1)

CS460/626: Natural Language Processing/Speech, NLP and the Web

(Lecture 5: WSD approaches)

Pushpak Bhattacharyya

CSE Dept., IIT Bombay

13th Jan, 2011

(2)

Motivation

WSD: At the Heart of NLP

[Figure: WSD at the centre, connected to MT, NER, TE, SRL, SA, CLIR and SP]

WSD : Word Sense Disambiguation
MT : Machine Translation
NER : Named Entity Recognition
TE : Text Entailment
SRL : Semantic Role Labeling
SA : Sentiment Analysis
CLIR : Cross Lingual Information Retrieval
SP : Shallow Parsing

(3)

KNOWLEDGE BASED v/s MACHINE LEARNING BASED v/s HYBRID APPROACHES

• Knowledge Based Approaches
  - Rely on knowledge resources like WordNet, Thesaurus etc.
  - May use grammar rules for disambiguation.
  - May use hand coded rules for disambiguation.
• Machine Learning Based Approaches
  - Rely on corpus evidence.
  - Train a model using tagged or untagged corpus.
  - Probabilistic/Statistical models.
• Hybrid Approaches
  - Use corpus evidence as well as semantic relations from WordNet.

(4)

Bird’s eye view

WSD Approaches
• Machine Learning
  - Supervised
  - Unsupervised
  - Semi-supervised
• Knowledge Based
• Hybrid

(5)

KNOWLEDGE BASED APPROACHES


(6)

WSD USING SELECTIONAL PREFERENCES AND ARGUMENTS

Sense 1
• This airline serves dinner in the evening flight.
• serve (Verb)
  - agent
  - object: edible

Sense 2
• This airline serves the sector between Agra & Delhi.
• serve (Verb)
  - agent
  - object: sector

Requires exhaustive enumeration of:
• Argument-structure of verbs.
• Selectional preferences of arguments.
• Description of properties of words such that meeting the selectional preference criteria can be decided.

E.g. This flight serves the “region” between Mumbai and Delhi. How do you decide if “region” is compatible with “sector”?
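The two senses of "serve" above can be told apart by checking the object noun against each sense's preference. The following is a minimal sketch under stated assumptions: the IS-A table `ISA` and the names `SERVE_SENSES`, `ancestors` and `disambiguate_serve` are hypothetical and hand-made for illustration, not taken from WordNet or any other resource.

```python
from typing import Dict, List, Optional

# Toy IS-A links (child -> parent). Hypothetical data, not taken from WordNet.
ISA: Dict[str, str] = {"dinner": "meal", "meal": "edible", "snack": "edible"}

# Selectional preference on the object of each sense of "serve", as on the slide.
SERVE_SENSES: Dict[str, str] = {"sense 1": "edible", "sense 2": "sector"}


def ancestors(word: str) -> List[str]:
    """Return the word followed by all of its IS-A ancestors."""
    chain = [word]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


def disambiguate_serve(obj: str) -> Optional[str]:
    """Pick the sense whose object preference the object noun satisfies."""
    for sense, preferred in SERVE_SENSES.items():
        if preferred in ancestors(obj):
            return sense
    return None


print(disambiguate_serve("dinner"))  # sense 1
print(disambiguate_serve("sector"))  # sense 2
print(disambiguate_serve("region"))  # None: nothing links "region" to "sector"
```

The last call returns `None` because the toy table has no entry relating "region" to "sector", which is exactly the enumeration problem raised above.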

(7)

SELECTIONAL PREFERENCES (INDIAN TRADITION)

• “Desire” of some words in the sentence (“aakaangksha”).
  - I saw the boy with long hair.
  - The verb “saw” and the noun “boy” desire an object here.
• “Appropriateness” of some other words in the sentence to fulfil that desire (“yogyataa”).
  - I saw the boy with long hair.
  - The PP “with long hair” can be appropriately connected only to “boy” and not “saw”.
• In case the ambiguity is still present, “proximity” (“sannidhi”) can determine the meaning.
  - E.g. I saw the boy with a telescope.
  - The PP “with a telescope” can be attached to both “boy” and “saw”, so ambiguity is still present. It is then attached to “boy” using the proximity check.
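The appropriateness check followed by the proximity check can be sketched as a two-step filter. This is only an illustration: the `APPROPRIATE` table and the function `attach_pp` are hypothetical, and the table simply hard-codes the judgements stated on the slide.

```python
from typing import Dict, List, Set

# Hypothetical appropriateness (yogyataa) table: which heads each PP noun can modify.
# A boy can have long hair, seeing cannot; a telescope fits both heads.
APPROPRIATE: Dict[str, Set[str]] = {
    "hair": {"boy"},
    "telescope": {"boy", "saw"},
}


def attach_pp(pp_noun: str, heads_in_order: List[str]) -> str:
    """Attach a PP to one of the heads that desire it.

    heads_in_order lists the candidate heads left to right, so the last
    element is the one nearest to the PP.
    """
    allowed = APPROPRIATE.get(pp_noun, set())
    candidates = [h for h in heads_in_order if h in allowed]
    if not candidates:
        raise ValueError(f"no appropriate head for {pp_noun!r}")
    # sannidhi: among the appropriate heads, prefer the nearest one.
    return candidates[-1]


heads = ["saw", "boy"]
print(attach_pp("hair", heads))       # boy (only appropriate head)
print(attach_pp("telescope", heads))  # boy (both appropriate, nearest wins)
```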

(8)

SELECTIONAL PREFERENCES (RECENT LINGUISTIC THEORY)

• There are words which demand arguments, like verbs, prepositions, adjectives and sometimes nouns. These arguments are typically nouns.
• Arguments must have the property to fulfil the demand. They must satisfy selectional preferences.
• Example
  - Give (verb)
    agent: animate
    obj: direct
    obj: indirect
  - I gave him the book
  - I gave him the book (yesterday in the school) -> adjunct
• How does this help in WSD?
  - One type of contextual information is the information about the type of arguments that a word takes.

(9)

Verb Argument frame

• Structure expressing the desire of a word is called the Argument Frame
• Selectional Preference
  - Properties of the “Supply Words” meeting the desire of the previous set

(10)

Argument frame (example)

Sentence: I am fond of X

Fond {
  Arg1: Prepositional Phrase (PP) {
    PP: of NP {
      N: somebody/something
    }
  }
}

(11)

Verb Argument frame (example)

Verb: give

Give {
  agent: <the giver> animate
  direct object: <the thing given>
  indirect object: <beneficiary> animate/organization
}

[I]_agent gave a [book]_dobj to [Ram]_iobj.
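One possible machine-readable encoding of the frame for "give" is sketched below. The class `ArgumentFrame`, its `violations` method and the small type lexicon `TYPES` are assumptions made for illustration; the roles and their accepted types follow the frame on this slide.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class ArgumentFrame:
    """Roles a word demands, each with the semantic types it accepts."""
    word: str
    roles: Dict[str, Set[str]]  # role -> accepted types; empty set = unrestricted

    def violations(self, fillers: Dict[str, str], types: Dict[str, Set[str]]) -> List[str]:
        """Return the roles that are missing or filled by a word of the wrong type."""
        bad = []
        for role, accepted in self.roles.items():
            filler = fillers.get(role)
            if filler is None:
                bad.append(role)
            elif accepted and not (types.get(filler, set()) & accepted):
                bad.append(role)
        return bad


GIVE = ArgumentFrame("give", {
    "agent": {"animate"},
    "direct object": set(),
    "indirect object": {"animate", "organization"},
})

# Hypothetical type lexicon for the fillers in the slide's sentence.
TYPES = {"I": {"animate"}, "Ram": {"animate"}, "book": {"artifact"}}

ok = GIVE.violations({"agent": "I", "direct object": "book", "indirect object": "Ram"}, TYPES)
print(ok)   # []
bad = GIVE.violations({"agent": "book", "direct object": "book", "indirect object": "Ram"}, TYPES)
print(bad)  # ['agent']
```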

(12)

Resources for Verbs

• VerbNet (http://verbs.colorado.edu/~mpalmer/projects/verbnet.html)
• Propbank (http://en.wikipedia.org/wiki/PropBank)
• VerbOcean (http://demo.patrickpantel.com/demos/verbocean/)

(13)

CRITIQUE

• Requires exhaustive enumeration in machine-readable form of:
  - Argument-structure of verbs.
  - Selectional preferences of arguments.
  - Description of properties of words such that meeting the selectional preference criteria can be decided.
    E.g. This flight serves the “region” between Mumbai and Delhi.
    How do you decide if “region” is compatible with “sector”?
• Accuracy
  - 44% on Brown corpus.

(14)

OVERLAP BASED APPROACHES

• Require a Machine Readable Dictionary (MRD).
• Find the overlap between the features of different senses of an ambiguous word (sense bag) and the features of the words in its context (context bag).
• These features could be sense definitions, example sentences, hypernyms etc.
• The features could also be given weights.
• The sense which has the maximum overlap is selected as the contextually appropriate sense.
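A weighted overlap can be computed by summing the weights of the sense-bag features that also occur in the context bag. The sketch below assumes hand-made feature bags for the two senses of "serve" seen earlier; the features, the weights and the names `weighted_overlap` and `best_sense` are all hypothetical and do not come from any dictionary.

```python
from typing import Dict, Set


def weighted_overlap(sense_bag: Dict[str, float], context_bag: Set[str]) -> float:
    """Sum the weights of the sense features that also appear in the context."""
    return sum(weight for feature, weight in sense_bag.items() if feature in context_bag)


def best_sense(sense_bags: Dict[str, Dict[str, float]], context_bag: Set[str]) -> str:
    """Return the sense with the maximum weighted overlap."""
    return max(sense_bags, key=lambda s: weighted_overlap(sense_bags[s], context_bag))


# Hypothetical sense bags: definition words weigh 1.0, example-sentence words 0.5.
SERVE_BAGS = {
    "serve: provide food": {"food": 1.0, "meal": 1.0, "restaurant": 0.5},
    "serve: cover an area": {"route": 1.0, "area": 1.0, "transport": 0.5},
}

# Hypothetical context bag built from the words around "serves dinner".
context = {"meal", "food", "evening", "aircraft"}

print(best_sense(SERVE_BAGS, context))  # serve: provide food
```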

(15)

LESK’S ALGORITHM

Sense Bag: contains the words in the definition of a candidate sense of the ambiguous word.

Context Bag: contains the words in the definition of each sense of each context word.

E.g. “On burning coal we get ash.”

From WordNet

• The noun ash has 3 senses (first 2 from tagged texts)
  1. (2) ash -- (the residue that remains when something is burned)
  2. (1) ash, ash tree -- (any of various deciduous pinnate-leaved ornamental or timber trees of the genus Fraxinus)
  3. ash -- (strong elastic wood of any of various ash trees; used for furniture and tool handles and sporting goods such as baseball bats)
• The verb ash has 1 sense (no senses from tagged texts)
  1. ash -- (convert into ashes)
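The example can be scored with a simplified variant of Lesk, sketched below. Two simplifications are assumed: the context bag holds the words of the sentence itself rather than the definitions of every sense of every context word, and `stem` is a crude suffix stripper standing in for a real stemmer. The glosses are the ones listed on this slide; the names `bag`, `lesk` and `ASH_GLOSSES` are hypothetical.

```python
import re
from typing import Dict, Set, Tuple

STOPWORDS = {"on", "we", "the", "of", "that", "when", "is", "or", "any",
             "and", "for", "as", "such", "into"}


def stem(word: str) -> str:
    """Crude suffix stripper (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag(text: str) -> Set[str]:
    """Lowercase, tokenize, drop stopwords and stem."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


ASH_GLOSSES: Dict[str, str] = {
    "noun 1": "the residue that remains when something is burned",
    "noun 2": "any of various deciduous pinnate-leaved ornamental or timber "
              "trees of the genus Fraxinus",
    "noun 3": "strong elastic wood of any of various ash trees; used for furniture "
              "and tool handles and sporting goods such as baseball bats",
    "verb 1": "convert into ashes",
}


def lesk(sentence: str, target: str, glosses: Dict[str, str]) -> Tuple[str, Dict[str, int]]:
    """Return the sense whose gloss shares the most words with the context."""
    context = bag(sentence) - {stem(target)}  # the target word itself is not evidence
    scores = {sense: len(bag(gloss) & context) for sense, gloss in glosses.items()}
    return max(scores, key=scores.get), scores


winner, scores = lesk("On burning coal we get ash.", "ash", ASH_GLOSSES)
print(winner)  # noun 1
print(scores)  # {'noun 1': 1, 'noun 2': 0, 'noun 3': 0, 'verb 1': 0}
```

The single overlap comes from "burning" in the sentence and "burned" in the first gloss, which only match after stemming.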

(16)

CRITIQUE

• Proper nouns in the context of an ambiguous word can act as strong disambiguators.
  E.g. “Sachin Tendulkar” will be a strong indicator of the category “sports”.
  Sachin Tendulkar plays cricket.
• Proper nouns are not present in the thesaurus. Hence this approach fails to capture the strong clues provided by proper nouns.
• Accuracy
  - 50% when tested on 10 highly polysemous English words.

(17)

Extended Lesk’s algorithm

• Original algorithm is sensitive towards exact words in the definition.
• Extension includes glosses of semantically related senses from WordNet (e.g. hypernyms, hyponyms, etc.).
• The scoring function becomes:

  \[ score_{ext}(S) = \sum_{s' \in rel(S) \text{ or } s' \equiv S} \left| context(W) \cap gloss(s') \right| \]

• where,
  - gloss(S) is the gloss of sense S from the lexical resource.
  - context(W) is the gloss of each sense of each context word.
  - rel(S) gives the senses related to S in WordNet under some relations.

(18)

WordNet Sub-Graph

[Figure: WordNet sub-graph around the synset {house, home}.
Gloss: "A place that serves as the living quarters of one or more families".
Hypernymy: dwelling, abode.
Hyponymy: hermitage, cottage.
Meronymy: kitchen, bedroom, backyard, veranda, study, guestroom.]

(19)

Example: Extended Lesk

• “On combustion of coal we get ash”

From WordNet

• The noun ash has 3 senses (first 2 from tagged texts)
  1. (2) ash -- (the residue that remains when something is burned)
  2. (1) ash, ash tree -- (any of various deciduous pinnate-leaved ornamental or timber trees of the genus Fraxinus)
  3. ash -- (strong elastic wood of any of various ash trees; used for furniture and tool handles and sporting goods such as baseball bats)
• The verb ash has 1 sense (no senses from tagged texts)
  1. ash -- (convert into ashes)

(20)

Example: Extended Lesk (contd.)

• “On combustion of coal we get ash”

From WordNet (through hyponymy)

• ash -- (the residue that remains when something is burned)
  => fly ash -- (fine solid particles of ash that are carried into the air when fuel is combusted)
  => bone ash -- (ash left when bones burn; high in calcium phosphate; used as fertilizer and in bone china)
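The effect of adding hyponym glosses can be sketched for this sentence as follows. The same simplifications as in a simplified Lesk are assumed: the context bag holds the sentence words themselves, and `stem` is a crude suffix stripper (it maps both "combustion" and "combusted" to "combust"). The glosses and the hyponym links are the ones shown on the slides; the sense labels and the names `GLOSS`, `REL`, `score_plain` and `score_ext` are hypothetical.

```python
import re
from typing import Dict, List, Set

STOPWORDS = {"on", "of", "we", "the", "that", "when", "is", "are", "in",
             "into", "as", "and", "or", "any", "for", "such"}


def stem(word: str) -> str:
    """Crude suffix stripper (a stand-in for a real stemmer)."""
    for suffix in ("ion", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag(text: str) -> Set[str]:
    """Lowercase, tokenize, drop stopwords and stem."""
    return {stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


GLOSS: Dict[str, str] = {
    "ash (residue)": "the residue that remains when something is burned",
    "ash (tree)": "any of various deciduous pinnate-leaved ornamental or timber "
                  "trees of the genus Fraxinus",
    "fly ash": "fine solid particles of ash that are carried into the air "
               "when fuel is combusted",
    "bone ash": "ash left when bones burn; high in calcium phosphate; "
                "used as fertilizer and in bone china",
}

# Related senses; only the hyponyms listed on the slide are included.
REL: Dict[str, List[str]] = {"ash (residue)": ["fly ash", "bone ash"]}


def score_plain(sense: str, context: Set[str]) -> int:
    """Original Lesk score: overlap with the sense's own gloss only."""
    return len(context & bag(GLOSS[sense]))


def score_ext(sense: str, context: Set[str]) -> int:
    """Extended score: sum the overlaps over the sense and its related senses."""
    return sum(len(context & bag(GLOSS[s])) for s in [sense] + REL.get(sense, []))


context = bag("On combustion of coal we get ash") - {"ash"}
print(score_plain("ash (residue)", context))  # 0
print(score_ext("ash (residue)", context))    # 1, via the gloss of "fly ash"
print(score_ext("ash (tree)", context))       # 0
```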

(21)

Critique of Extended Lesk

• Larger region of matching in WordNet
  - Increased chance of Matching
  BUT
  - Increased chance of Topic Drift
