(1)

Coreference Resolution

Seminar by

Satpreet Arora (07D05003)

(2)

What is Coreference?

In linguistics, coreference occurs when multiple expressions in a sentence or document refer to the same entity.

Example:

Aditya went to Videorec to buy a DVD for himself. He had frequented the store for many years now.

Here, Aditya, himself, and He are coreferent.

Videorec and the store are also coreferent.

(3)

Coreference Resolution

Determine which entities in a document/discourse have the same referent.

In NLP, we usually deal with coreference resolution of NPs (noun phrases).

The coreference system has to form equivalence classes of NPs that have the same real-world entity as their referent.

The coreference relation is both transitive and symmetric.

(4)

A Coreference System

Input of a coreference system:

John Simon, Chief Financial Officer of Prime Corp. since 1986, saw his pay jump 20%, to $1.3 million, as the 37-year-old also became the financial-service company’s president.

Output of a coreference system:

[JS John Simon], [JS Chief Financial Officer] of [PC Prime Corp.] since 1986, saw [JS his] pay jump 20%, to $1.3 million, as [JS the 37-year-old] also became the [PC financial-service company]’s [JS president]

Equivalence classes:

JS: {John Simon, Chief Financial Officer, his, the 37-year-old, president}

PC: {Prime Corp., financial-service company}

(5)

Anaphora Resolution

Anaphora refers to the linguistic phenomenon of having a noun phrase refer to a previously mentioned entity in a text for its semantic interpretation.

In other words, a pair of NPs <NP_i, NP_j> constitutes an anaphoric relationship if i < j and NP_j depends on NP_i for its interpretation, where NP_k denotes the k-th NP in a document. For instance, the NP pair <Queen Elizabeth, her> forms an anaphoric relationship in the example below.

Queen Elizabeth set about transforming her husband

(6)

Coreference Resolution - an Important Problem

Applications include:

Question answering systems and Information Retrieval:

Consider the query “Where was Mozart born?”. A question answering system may first retrieve the sentence “He was born in Salzburg” from a document talking about Mozart. In this case, the system will return the correct answer only if it can determine that the pronoun “He” is coreferent with Mozart.

Machine Translation.

Anaphora resolution comes into play when discrepancies exist between the two languages with respect to anaphor selection. For example, a pronoun in the Malay language is often translated directly by its antecedent (Mitkov, 1999).

(7)

Coreference Resolution - an Important Problem (cont’d)

Text Summarization

Text summarization tools using coreference resolution not only include in the summary those sentences that contain a term appearing in the query; they also incorporate sentences containing a noun phrase that is coreferent with a term occurring in a sentence already selected by the system.

Cross-document coreference is particularly useful for text summarization systems that need to identify and merge the same piece of information about an entity mentioned in different documents, in order to avoid repetition.

(8)

Coreference Resolution - a Hard Problem

The difficulty of the problem lies in its dependence on sophisticated semantic and world knowledge.

The policemen refused the women a permit for the demonstration because they feared violence.

The policemen refused the women a permit for the demonstration because they advocated violence.

Observe how they refers to two different entities in the two sentences, depending on the context. It’s easy for humans but difficult for machines.

(9)

Coreference Resolution - a Hard Problem (cont’d)

Many sources of information play a role:

Lexical information such as a head noun match (as in Lord Spencer and Mr. Spencer) is an indicator of coreference, although it is not an absolute one (e.g. Lord Spencer and Diana Spencer are not coreferent).

Knowledge sources such as gender and number, semantic class, discourse focus, and world knowledge also play a role in determining whether two discourse entities are coreferent.

(10)

Coreference Resolution - a Hard Problem (cont’d)

No single source of knowledge is a completely reliable indicator:

For example, two semantically compatible NPs are potentially coreferent (e.g. Diana Spencer and the princess), but whether the NPs are actually coreferent depends on other factors (such as contextual information).

Linguistic constraints indicating (non-)coreference, such as number (dis)agreement, are not absolutely hard (e.g. the singular NP the assassination (of her bodyguards) can be coreferent with the plural NP these murders).

(11)

Coreference Resolution - a Hard Problem (cont’d)

Coreference strategies differ depending on the type of NP:

Definite NPs are more likely to be anaphoric than their non-definite counterparts (e.g. the article immediately preceding photographer in the sentence “Diana saw the/a photographer following her secretly” determines whether the NP has an existential or a definite reading).

Pronoun resolution is difficult because resolution strategies differ for each type of pronoun (e.g. reflexives versus possessives), and also because some pronouns, such as pleonastic pronouns, are semantically empty (e.g. the pronoun it in the sentence “Camilla went outside and it was raining” is pleonastic).

(12)

The Algorithm

Many different algorithms take different approaches to the problem. However, they all share some basic components (at least the non-machine-learning-based algorithms):

Step 1: Identification of discourse entities (NPs)

Step 2: Representation of NPs (as a set of features)

Step 3: Calculating distances between NPs using a distance metric

Step 4: Creating equivalence classes using a clustering algorithm or other classification tools

A rough skeleton of this pipeline is sketched below.
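In Python, the composition might look like this; extract_nps, np_features, np_distance, and cluster_nps are hypothetical placeholders for the components described on the following slides.

# A rough skeleton of the four-step pipeline. The step implementations
# (all assumed, sketched on later slides) are passed in as functions.
def resolve_coreference(text, extract_nps, np_features, np_distance,
                        cluster_nps, r=0.5):
    nps = extract_nps(text)                 # Step 1: discourse entities
    vecs = [np_features(np) for np in nps]  # Step 2: feature vectors
    # Steps 3 and 4: pairwise distances drive the clustering
    return cluster_nps(vecs, np_distance, r)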

(13)

Identification of Discourse Entities

For coreference resolution algorithms, the first task is to identify all of the noun phrases in the text.

These textual elements can be definite noun phrases, demonstrative noun phrases, proper names, appositives, sub-noun phrases that act as modifiers, pronouns, and so on.

The basic structure of the identification is as follows:

(14)

Identification of Discourse Entities (cont’d)

Tokenization and Morphological Processing: Splitting the text into sentences and stemming words to their root form.

POS Tagging: A Hidden Markov Model (HMM) based statistical POS tagger.

Noun Phrase Identification: Determines noun phrase boundaries based on the POS tags.

Named Entity Recognition: May also be HMM based, learning from a tagged corpus of named entities. If there are overlaps, boundaries are adjusted.

Nested Noun Phrase Extraction: Accepts noun phrases and determines the nested phrases (if any).

A sketch of this pipeline with an off-the-shelf library follows.
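As an illustration, a modern library such as spaCy bundles the same stages (whereas the pipeline described above is HMM based), so the identification step can be prototyped in a few lines:

# Illustrative sketch: tokenization, POS tagging, NP chunking, and NER
# with spaCy. Requires: pip install spacy
#                       python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("John Simon, Chief Financial Officer of Prime Corp. since 1986, "
          "saw his pay jump 20%, to $1.3 million.")

print([(tok.text, tok.pos_) for tok in doc])         # tokens with POS tags
print([chunk.text for chunk in doc.noun_chunks])     # base noun phrases
print([(ent.text, ent.label_) for ent in doc.ents])  # named entities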

(15)

Representation of NPs

Representation of NPs: a set of features used to construct the feature vector:

individual words in the NP;

head noun: last word of the NP;

position in the document;

pronoun type: nominative, accusative, possessive, ambiguous;

article: indefinite, definite, none;

appositive: based on heuristics (commas, etc.);

number: plural, singular;

proper name: based on heuristics (capitalization, etc.);

semantic class: based on WordNet;

gender: masculine, feminine, either, neuter;

animacy: based on semantic class.

(16)

Representation of NPs (cont’d)
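As a concrete illustration, the feature set above can be packed into a simple record type. This is a hedged sketch: the field names follow the previous slide, but the exact value encodings (e.g. the gender values) are assumptions in the spirit of Cardie and Wagstaff (1999), not their exact representation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class NPFeatures:
    words: tuple[str, ...]       # individual words in the NP
    head_noun: str               # last word of the NP
    position: int                # index of the NP in the document
    pronoun_type: Optional[str]  # "nom", "acc", "poss", "ambiguous", or None
    article: str                 # "indefinite", "definite", or "none"
    appositive: bool             # heuristic: set off by commas, etc.
    number: str                  # "singular" or "plural"
    proper_name: bool            # heuristic: capitalization, etc.
    semantic_class: str          # from WordNet, e.g. "human", "object"
    gender: str                  # "masc", "fem", "either", "neuter"
    animacy: bool                # derived from the semantic class

# Example: the NP "Queen Elizabeth" as the first NP of a document.
queen = NPFeatures(("Queen", "Elizabeth"), "Elizabeth", 0, None, "none",
                   False, "singular", True, "human", "fem", True)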

(17)

Calculating distances between NPs

Different algorithms use different distance metrics. We present here the one from Cardie and Wagstaff (1999), “Noun phrase coreference as clustering”, and the corresponding clustering algorithm.

The distance between noun phrases NP1 and NP2 is defined as:

dist(NP1, NP2) = Σ_{f ∈ F} w_f × incompatibility_f(NP1, NP2)

where

F: the set of features

w_f: the weight of feature f

incompatibility_f: the degree of incompatibility between NP1 and NP2 with respect to feature f

(18)

Calculating distances between NPs

(cont’d)
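A minimal sketch of this metric in Python, assuming the NPFeatures record sketched earlier. The particular incompatibility functions and weights below are illustrative assumptions, not the exact table used by Cardie and Wagstaff (1999); an infinite weight encodes a hard constraint that rules out coreference on its own.

import math

def incompatibility_number(a, b):
    return 1.0 if a.number != b.number else 0.0

def incompatibility_gender(a, b):
    clash = a.gender != b.gender and "either" not in (a.gender, b.gender)
    return 1.0 if clash else 0.0

def incompatibility_head(a, b):
    return 0.0 if a.head_noun.lower() == b.head_noun.lower() else 1.0

WEIGHTED_FEATURES = [
    (math.inf, incompatibility_number),  # hard constraint (assumed)
    (math.inf, incompatibility_gender),  # hard constraint (assumed)
    (1.0, incompatibility_head),         # soft indicator (assumed weight)
]

def np_distance(a, b):
    # dist(NP1, NP2) = sum of w_f * incompatibility_f(NP1, NP2) over f in F
    total = 0.0
    for w, f in WEIGHTED_FEATURES:
        v = f(a, b)
        if v:                            # skip zero terms (avoids inf * 0)
            total += w * v
    return total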

(19)

Clustering

Properties of the clustering algorithm:

it starts from the end of the document and works backwards;

if the distance between two NPs is less than r, then their equivalence classes are considered for merging;

classes can be merged unless they contain incompatible NPs;

the algorithm automatically computes the transitive closure of the coreference relation;

two NPs can be coreferent even if dist(NP1, NP2) > r, as long as dist(NP1, NP2) ≠ ∞;

r is a free parameter of the algorithm.

A sketch of the algorithm follows on the next slide.

(20)

Clustering
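A hedged sketch of this greedy clustering in Python, reusing np_distance from the previous sketch; this is one plausible reading of the properties listed on the previous slide, not Cardie and Wagstaff’s exact pseudocode.

import math

def cluster_nps(nps, np_distance, r):
    classes = [{i} for i in range(len(nps))]  # initially one class per NP
    cls_of = list(range(len(nps)))            # NP index -> class index

    def compatible(c1, c2):
        # classes may not be merged if they contain incompatible NPs
        return all(np_distance(nps[i], nps[j]) != math.inf
                   for i in classes[c1] for j in classes[c2])

    # start from the end of the document and work backwards
    for j in range(len(nps) - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            ci, cj = cls_of[i], cls_of[j]
            if ci != cj and np_distance(nps[i], nps[j]) < r \
                    and compatible(ci, cj):
                classes[ci] |= classes[cj]    # merge: transitive closure
                for k in classes[cj]:
                    cls_of[k] = ci
                classes[cj] = set()
    return [c for c in classes if c]          # the equivalence classes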

(21)

Machine Learning algorithms

In the algorithm we just saw, the weight of each feature is fixed manually.

Unlike manual approaches, machine learning approaches to coreference resolution automatically induce, from annotated data, a model that determines the probability that two NPs are coreferent.

They can be characterized in terms of the knowledge sources employed, the method of training data creation, and the learning and clustering algorithms chosen.

(22)

Machine Learning algorithms

Training Data Creation:

Positive instances: two different methods are used to create positive training instances (Aone et al., 1995):

1) Transitive: an instance is formed between an NP and each of its preceding NPs in the same anaphoric chain.

2) Non-transitive: an instance is formed between an NP and its closest preceding NP in the same anaphoric chain.

Negative instances:

1) Negative instances are generated by pairing an NP with each preceding NP that does not have an anaphoric relationship with it (Aone et al., 1995).

2) To reduce the ratio of negative instances to positive instances, a negative instance is created by pairing an anaphoric NP, NP_j, with each NP appearing between NP_j and its closest preceding antecedent (Soon et al., 2001). The Soon et al. scheme is sketched below.
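A minimal sketch of the Soon et al. (2001) scheme, assuming a mapping antecedent_of (a hypothetical input) from each anaphoric NP’s index to the index of its closest preceding antecedent:

def make_instances(antecedent_of):
    positives, negatives = [], []
    for j, i in sorted(antecedent_of.items()):  # i: closest antecedent of NP_j
        positives.append((i, j))                # non-transitive positive
        for k in range(i + 1, j):               # every NP strictly in between
            negatives.append((k, j))            # becomes a negative instance
    return positives, negatives

# Example: antecedent_of = {3: 1, 5: 3} yields
# positives [(1, 3), (3, 5)] and negatives [(2, 3), (4, 5)].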

(23)

Learning Algorithm

A lot of recent research involving machine learning techniques uses decision trees for classifying NP pairs. Soon et al. used the C5 decision tree classifier.

(24)

Learning Algorithm (cont’d)

Each pair of NPs is presented to the classifier, and the classifier returns a probability or certainty value for the pair being coreferent.

All pairs that receive a probability value greater than a threshold are considered coreferent.

The algorithm then constructs the transitive closure of all such pairs and thus achieves a partitioning of all the NPs into coreference classes, as the sketch below illustrates.
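A minimal sketch of this decision phase, using union-find for the transitive closure; classifier.prob(i, j) is a hypothetical stand-in for the decision tree’s confidence that NP_i and NP_j corefer.

def partition(n, classifier, threshold=0.5):
    parent = list(range(n))                  # union-find forest over n NPs

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path compression
            x = parent[x]
        return x

    for j in range(n):
        for i in range(j):
            if classifier.prob(i, j) > threshold:
                parent[find(i)] = find(j)    # link: closure is automatic

    classes = {}
    for x in range(n):
        classes.setdefault(find(x), []).append(x)
    return list(classes.values())            # coreference equivalence classes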

(25)

Clustering algorithms are also used in machine learning approaches.

The relative importance of all the factors discussed previously is learnt from the training corpus instead of being fixed manually. This allows a larger number of factors to be considered.

In principle, learning-based systems are more robust than knowledge-based systems in the face of noisy input (i.e. when there are exceptions to the rules). Machine learning algorithms also adapt more easily to different topics.

(26)

Conclusion

Machine learning approaches to coreference resolution have been shown to be a promising way to build robust coreference resolution systems.

Despite the successful application of machine learning techniques to coreference resolution, the problem is far from solved.

Linguistics combined with machine learning techniques can prove effective in solving the coreference problem.

Coreference resolution is one of the most difficult problems in language understanding. Given that NLP is often said to be “AI-complete”, we might conclude that coreference resolution is among the hardest of the hardest problems in artificial intelligence.

(27)

References

Vincent Ng (2002). Machine Learning for Coreference Resolution: Recent Successes and Future Challenges.

Claire Cardie and Kiri Wagstaff (1999). Noun Phrase Coreference as Clustering.

D. Byron and J. Tetreault (1999). A Flexible Architecture for Reference Resolution. In Proc. of the 9th EACL.

Vincent Ng (2008). Unsupervised Models for Coreference Resolution.

Wee Meng Soon, Daniel Chung Yong Lim, and Hwee Tou Ng (2001). A Machine Learning Approach to Coreference Resolution of Noun Phrases. DSO National Laboratories.

Wikipedia

(28)

Questions
