(Lecture 10, 11 – MT approaches)
(1)

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 10, 11 – MT approaches)

Pushpak Bhattacharyya
CSE Dept., IIT Bombay

25th Jan and 27th Jan, 2011

Acknowledgement: parts are from Hansraj’s dual degree seminar presentation

(2)

Czech-English data

• [nesu] “I carry”
• [ponese] “He will carry”
• [nese] “He carries”
• [nesou] “They carry”
• [yedu] “I drive”
• [plavou] “They swim”

(3)

To translate …

• I will carry.
• They drive.
• He swims.
• They will drive.

(4)

Hindi-English data

• [DhotA huM] “I carry”
• [DhoegA] “He will carry”
• [DhotA hAi] “He carries”
• [Dhote hAi] “They carry”
• [chalAtA huM] “I drive”
• [tErte hEM] “They swim”

(5)

Bangla-English data

• [bai] “I carry”
• [baibe] “He will carry”
• [bay] “He carries”
• [bay] “They carry”
• [chAlAi] “I drive”
• [sAMtrAy] “They swim”

(6)

MT Approaches

                interlingua
      semantics            semantics
      syntax               syntax
      phrases              phrases
      words                words
      SOURCE               TARGET

(7)

Taxonomy

MT Approaches
• Knowledge Based; Rule Based MT
  • Interlingua Based
  • Transfer Based
• Data driven; Machine Learning Based
  • Example Based MT (EBMT)
  • Statistical MT

(8)

Motivation

• MT: NLP Complete
• NLP: AI complete
• AI: CS complete
• How will the world be different when the language barrier disappears?
• Volume of text required to be translated currently exceeds translators’ capacity (demand outstrips supply).
• Solution: automation (the only solution)
• Many machine translation techniques
• Which approach is better for Hindi-English MT?

(9)

Interlingual representation: complete disambiguation

• Washington voted Washington to power

[Interlingua graph: the node vote (@past, <is-a> action) links to the two
Washington nodes (@emphasis; <is-a> place and <is-a> person respectively)
and, via the goal relation, to power (<is-a> capability).]

(10)

Kinds of disambiguation needed for a complete and correct interlingua graph

• N: Name
• P: POS
• A: Attachment
• S: Sense
• C: Co-reference
• R: Semantic Role

(11)

Target Sentence Generation from interlingua

Lexical Transfer            Syntax Planning         Morphological Synthesis
(Word/Phrase Translation)   (Sequence Generation)   (Word form Generation)

(12)

Role of function word

• Washington voted Washington to power.
• Washington ne Washington ko sattaa ke liye chunaa

  Washington ne    Washington ko    sattaa ke liye    chunaa
  (agent)          (object)         (goal)

• Vote → chunnaa
• Power → sattaa

(13)

Statistical Machine Translation (SMT)

• Data driven approach
• Goal is to find the English sentence e, given a foreign language sentence f, for which p(e|f) is maximum.
• Translations are generated on the basis of a statistical model
• Parameters are estimated using bilingual parallel corpora
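The decoding objective can be sketched directly: pick the candidate e maximizing p(f|e)·p(e). A minimal sketch, assuming made-up toy probability tables (the candidates and numbers below are illustrative, not estimated from any corpus):

```python
# Noisy-channel decoding sketch: choose e maximizing p(e|f) ∝ p(f|e) * p(e).

def decode(f, candidates, tm, lm):
    """Return the English candidate e maximizing p(f|e) * p(e)."""
    return max(candidates, key=lambda e: tm.get((f, e), 0.0) * lm.get(e, 0.0))

# Toy tables for the foreign word "nese" (cf. the Czech data above).
tm = {("nese", "he carries"): 0.7, ("nese", "they carry"): 0.2}   # p(f|e)
lm = {"he carries": 0.4, "they carry": 0.5}                       # p(e)

best = decode("nese", ["he carries", "they carry"], tm, lm)
# 0.7*0.4 = 0.28 beats 0.2*0.5 = 0.10, so "he carries" wins
```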

(14)

SMT: Language Model

• To detect good English sentences
• The probability of an English sentence s1 s2 … sn can be written as

  Pr(s1 s2 … sn) = Pr(s1) * Pr(s2|s1) * … * Pr(sn|s1 s2 … sn-1)

• Here Pr(sn|s1 s2 … sn-1) is the probability that word sn follows the word string s1 s2 … sn-1
• N-gram model probability
• Trigram model probability calculation
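The chain-rule probability above, under a trigram approximation, can be sketched with maximum-likelihood counts. A minimal sketch, assuming a hypothetical two-sentence corpus (illustrative only, no smoothing):

```python
from collections import defaultdict

# Pr(s1 ... sn) ≈ ∏ Pr(s_i | s_{i-2}, s_{i-1}), estimated by relative counts.

def train_trigram(sentences):
    tri, bi = defaultdict(int), defaultdict(int)
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(2, len(toks)):
            bi[(toks[i-2], toks[i-1])] += 1          # context count
            tri[(toks[i-2], toks[i-1], toks[i])] += 1  # trigram count
    return tri, bi

def sentence_prob(s, tri, bi):
    toks = ["<s>", "<s>"] + s.split() + ["</s>"]
    p = 1.0
    for i in range(2, len(toks)):
        ctx = (toks[i-2], toks[i-1])
        p *= tri[(ctx[0], ctx[1], toks[i])] / bi[ctx] if bi[ctx] else 0.0
    return p

tri, bi = train_trigram(["he carries water", "they carry water"])
# Pr(he | <s> <s>) = 1/2, all later trigram factors are 1, so 0.5 overall
```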

(15)

SMT: Translation Model

• P(f|e): probability of some f given hypothesized English translation e
• How to assign the values to P(f|e)?
  • Sentence level: sentences are infinite, so it is not possible to find the pair (e,f) for all sentences
  • Word level: introduce a hidden variable a that represents alignments between the individual words in the sentence pair

(16)

Alignment

• If the string e = e1^l = e1 e2 … el has l words, and the string f = f1^m = f1 f2 … fm has m words,
• then the alignment a can be represented by a series a1^m = a1 a2 … am of m values, each between 0 and l, such that if the word in position j of the f-string is connected to the word in position i of the e-string, then aj = i, and
• if it is not connected to any English word, then aj = 0

(17)

Example of alignment

English: Ram went to school
Hindi: Raama paathashaalaa gayaa

Ram    went    to    school
<Null> Raama paathashaalaa gayaa
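The alignment above can be written as the series a1 … am from the definition. The helper below is a hypothetical illustration (positions in e are 1-based; 0 stands for the NULL position):

```python
e = ["Ram", "went", "to", "school"]       # l = 4
f = ["Raama", "paathashaalaa", "gayaa"]   # m = 3
a = [1, 4, 2]  # Raama->Ram, paathashaalaa->school, gayaa->went; "to" has no f word

def aligned_pairs(e, f, a):
    """Return the (f_word, e_word_or_NULL) pairs implied by alignment a."""
    return [(fw, e[ai - 1] if ai > 0 else "<NULL>") for fw, ai in zip(f, a)]
```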

(18)

Translation Model: Exact expression

Pr(f, a | e) factors into three choices:
• choose the length m of the foreign language string, given e
• choose the alignment, given e and m
• choose the identity of each foreign word, given e, m, a

• Five models for estimating the parameters in the expression [1]
  • Model-1, Model-2, Model-3, Model-4, Model-5

(19)

Proof of Translation Model: Exact expression

Pr(f | e) = Σa Pr(f, a | e)                        ; marginalization

Pr(f, a | e) = Σm Pr(f, a, m | e)                  ; marginalization
             = Σm Pr(m | e) Pr(f, a | m, e)

m is fixed for a particular f, hence

Pr(f, a | e) = Pr(m | e) Pr(f, a | m, e)
             = Pr(m | e) Π(j=1..m) Pr(aj | a1^(j-1), f1^(j-1), m, e) Pr(fj | a1^j, f1^(j-1), m, e)

(20)

Model-1

• Simplest model
• Assumptions
  • Pr(m|e) is independent of m and e and is equal to ε
  • Alignment of foreign language words (FLWs) depends only on the length of the English sentence: Pr(aj | a1^(j-1), f1^(j-1), m, e) = (l+1)^(-1)
    • l is the length of the English sentence
• The likelihood function will be

  Pr(f | e) = ε / (l+1)^m * Σa Π(j=1..m) t(fj | e_aj)

• Maximize the likelihood function constrained to Σf t(f|e) = 1

(21)

Model-1: Parameter estimation

• Using a Lagrange multiplier for constrained maximization, the solution for the model-1 parameters is obtained
  • λe: normalization constant; c(f|e; f,e): expected count; δ(f,fj) is 1 if f and fj are the same, zero otherwise
• Estimate t(f|e) using the Expectation Maximization (EM) procedure
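The EM procedure for t(f|e) can be sketched in a few lines. A minimal Model-1 trainer, assuming a hypothetical two-pair toy corpus and omitting the NULL word for brevity (the full model includes it):

```python
from collections import defaultdict

def model1_em(pairs, iters=10):
    f_vocab = {fw for _, fs in pairs for fw in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))   # t(f|e), uniform start
    for _ in range(iters):
        count = defaultdict(float)                # expected counts c(f|e)
        total = defaultdict(float)                # normalizer λe per English word
        for es, fs in pairs:
            for fw in fs:                         # E-step
                z = sum(t[(fw, ew)] for ew in es)
                for ew in es:
                    c = t[(fw, ew)] / z
                    count[(fw, ew)] += c
                    total[ew] += c
        for (fw, ew), c in count.items():         # M-step: t(f|e) = c(f|e) / λe
            t[(fw, ew)] = c / total[ew]
    return t

pairs = [(["the", "house"], ["das", "haus"]),
         (["the", "book"], ["das", "buch"])]
t = model1_em(pairs)
# co-occurrence across both pairs pushes t(das|the) toward 1
```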

(22)

Model-2

• Alignment of an FLW to an English word depends on its position
• The likelihood function is analogous to model-1, with a position-dependent alignment probability
• Model-1 & 2
  • Model-1 is a special case of model-2 where the alignment probability is the uniform (l+1)^(-1)
  • To instantiate the model-2 parameters, use the parameters estimated in model-1

(23)

Model-3

• Fertility: number of FLWs to which an English word is connected in a randomly selected alignment
• Tablet: list of FLWs connected to an English word
• Tableau: the collection of tablets
• The alignment process
  • for each English word
    • Begin
      • Decide the fertility of the word
      • Get a list of French words to connect to the word
    • End
  • Permute words in the tableau to generate f

(24)

Model-3: Example

• English Sentence (e) = Annual inflation rises to 11.42%
• Step-1: Deciding fertilities (F)
  • e = Annual inflation rises to 11.42%
  • F = Annual inflation inflation inflation rises rises rises to 11.42%

(25)

Model-3: Example

• English Sentence (e) = Annual inflation rises to 11.42%
• Step-2: Translation to FLWs (T)
  • e = Annual inflation rises to 11.42%
  • F = Annual inflation inflation inflation rises rises rises to 11.42%
  • T = वार्षिक मुद्रास्फीति की दर बढ़ गई है तक 11.42%

(26)

Model-3: Example

• English Sentence (e) = Annual inflation rises to 11.42%
• Step-3: Reordering FLWs (R)
  • e = Annual inflation rises to 11.42%
  • F = Annual inflation inflation inflation rises rises rises to 11.42%
  • T = वार्षिक मुद्रास्फीति की दर बढ़ गई है तक 11.42%
  • R = वार्षिक मुद्रास्फीति की दर 11.42% तक बढ़ गई है
• Values of F, T, R are calculated using the formulas obtained in model-3 [1]
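The three steps of this example (fertility, translation, reordering) can be sketched as code. The transliterated Hindi words, fertility table and fixed permutation below are illustrative stand-ins, not estimated Model-3 parameters:

```python
fertility = {"Annual": 1, "inflation": 3, "rises": 3, "to": 1, "11.42%": 1}
tablets = {"Annual": ["vaarshik"],
           "inflation": ["mudraaspheeti", "kee", "dar"],
           "rises": ["badh", "gaee", "hai"],
           "to": ["tak"],
           "11.42%": ["11.42%"]}

def generate(e_words):
    # Steps 1+2: each English word contributes fertility[w] translated words (its tablet)
    tableau = [w2 for w in e_words for w2 in tablets[w][:fertility[w]]]
    # Step 3: permute the tableau into target word order
    # (a fixed permutation reproducing the slide's reordered output R)
    order = [0, 1, 2, 3, 8, 7, 4, 5, 6]
    return [tableau[i] for i in order]

out = generate(["Annual", "inflation", "rises", "to", "11.42%"])
# -> vaarshik mudraaspheeti kee dar 11.42% tak badh gaee hai
```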

(27)

Model-4 & 5

• Model-3: every word is moved independently
• Model-4: considers phrases (cepts) in a sentence
  • Distortion probability is replaced by
    • a parameter for the head of each cept
    • a parameter for the remaining part of the cept
• Deficiency in models 3 & 4
  • in the distortion probability
• Model-5 removes the deficiency
  • Avoids unavailable positions
  • Introduces a new variable for the positions

(28)

Example Based Machine Translation (EBMT)

• Basic idea: translate a sentence by using the closest match in parallel data
• Inspired by human analogical thinking

(29)

Issues Related to Examples in Corpora

• Granularity of examples
  • Parallel text should be aligned at the sub-sentence level
• Number of examples
• Suitability of examples
  • (i) Columbus discovered America  (ii) America was discovered by Columbus
  • (a) Time flies like an arrow  (b) Time flies like an arrow
• How should examples be stored?
  • Annotated tree structure
  • Generalized examples
    • “Rajesh will reach Mumbai by 10:00 pm” -> “P will reach D by T”

(30)

Annotated Tree Structure: example

• Fully annotated tree with explicit links

(31)

EBMT: Matching and Retrieval (1/2)

• The system must be able to recognize the similarity and differences between the input and stored examples
• String-based matching:
  • Longest common subsequence
  • Takes word similarity into account during sense disambiguation
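Longest-common-subsequence matching can be sketched as follows; the stored examples, input and scoring function are hypothetical illustrations of how an EBMT system might retrieve the closest example:

```python
from functools import lru_cache

def lcs_len(x, y):
    """Length of the longest common subsequence of token sequences x and y."""
    @lru_cache(maxsize=None)
    def rec(i, j):
        if i == len(x) or j == len(y):
            return 0
        if x[i] == y[j]:
            return 1 + rec(i + 1, j + 1)
        return max(rec(i + 1, j), rec(i, j + 1))
    return rec(0, 0)

def best_example(inp, examples):
    """Pick the stored example sharing the longest word subsequence with the input."""
    return max(examples, key=lambda ex: lcs_len(tuple(inp.split()), tuple(ex.split())))

examples = ["he will carry the box", "they swim in the river"]
match = best_example("he will carry water", examples)
# -> "he will carry the box" (shares the 3-word subsequence "he will carry")
```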

(32)

EBMT: Matching and Retrieval (2/2)

• Angle of similarity:
  • Trigonometric similarity measure based on relative length & relative contents
  • (x) Select ‘Symbol’ in the Insert menu.
  • (y) Select ‘Symbol’ in the Insert menu to enter a character from the symbol set.
  • (z) Select ‘Paste’ in the Edit menu.
  • (w) Select ‘Paste’ in the Edit menu to enter some text from the clipboard.
  • θxy: the qualitative difference between sentences x and y
  • δ(x,y): the difference between the sizes of x and y

(33)

EBMT: Adaptation & Recombination

• Adaptation
  • Extracting appropriate fragments from the matched translation
  • The boy entered the house -> लड़के ने कमरे में प्रवेश किया
  • I saw a tiger -> मैंने एक चीता देखा
  • The boy eats his breakfast -> लड़के ने उसका नाश्ता खाया था
  • I saw the boy -> मैंने लड़के को देखा था
• Boundary Friction
  • Retrieved translations do not fit the syntactic context
  • I saw the boy -> * मैंने लड़के ने देखा था
  • Recombine fragments into target text
  • The SMT “language model” can be used

(34)

Interlingua Based MT

• Interlingua
  • "between languages"
  • SL text is converted into a language-independent or 'universal' abstract representation, then transformed into several TLs

(35)

Universal Networking Language (UNL)

• UNL is an example of an interlingua
• Represents information sentence by sentence
• UNL is composed of
  • Universal words
  • Relations
• Example: “I gave him a book”

  {unl}
  agt ( give.@entry.@past, i )
  obj ( give.@entry.@past, book.@indef )
  gol ( give.@entry.@past, he )
  {/unl}
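The relation lines of such an expression can be read into (relation, head, dependent) triples. A minimal sketch; real UNL parsing also handles attribute lists, scopes and compound nodes:

```python
import re

def parse_unl(text):
    """Extract (relation, head, dependent) triples from UNL relation lines."""
    triples = []
    for m in re.finditer(r"(\w+)\s*\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)", text):
        triples.append((m.group(1), m.group(2), m.group(3)))
    return triples

unl = """
agt ( give.@entry.@past, i )
obj ( give.@entry.@past, book.@indef )
gol ( give.@entry.@past, he )
"""
triples = parse_unl(unl)
# three triples, all headed by give.@entry.@past
```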

(36)

Issues related to interlingua

• Interlingua must
  • Capture the knowledge in text precisely and accurately
  • Handle cross-language divergence
• Divergence between the Hindi and English languages
  • Constituent order divergence
  • Null subject divergence
    • जा रहा हूँ == * am going (I am going)
  • Conflational divergence
    • जिम ने जॉन को छुरे से मारा == Jim stabbed John
  • Promotional divergence
    • The play is on == खेल चल रहा है

(37)

Benefits & Shortcomings(1/3)

• Statistical Machine Translation

  “Every time I fire a linguist, my system’s performance improves” (Brown et al. 1988)

• Pros
  • No linguistic knowledge is required
  • A great deal of natural language is available as machine-readable text
  • Loose dependencies between languages can be modeled better
• Cons
  • The probability of rare words can’t be trusted
  • Not good for idioms, jokes, compound words, or text with hidden meaning
  • Selection of the correct morphological word form is difficult

(38)

Benefits & Shortcomings(2/3)

• Example Based MT
• Pros
  • Perfect translation of a sentence if a very similar one is found among the example sentences
  • No need to bother about previously translated sentences
• Cons
  • Fails if no match is found in the corpora
  • Problems at the points of example concatenation in the recombination step

(39)

Benefits & Shortcomings(3/3)

• Interlingua based MT
• Pros
  • Add a new language and get all-ways translation to all previously added languages
  • Monolingual development team
  • Economical in situations where translation among multiple languages is needed
• Cons
  • “Meaning” is arbitrarily deep. At what level of detail do we stop?
  • Human development time

(40)

Translation is Ubiquitous

• Between languages
  • Delhi is the capital of India
  • दिल्ली भारत की राजधानी है
• Between dialects
  • Example on the next slide
• Between registers
  • My “mom” not well.
  • My “mother” is unwell (in a leave application)

(41)

Between dialects (1/3)

• Lage Raho Munnabhai: an excellent example
• Scene: Munnabhai (Sanjay Dutt) is Prof. Murli Prasad Sharma, being interviewed with some citizens asking questions in the presence of Jahnavi (Vidya Balan)
• Question by citizen:
  • प्रोफेसर साब, पार्क में एक नौजवान पत्थर उठा के बापू की मूर्ति पर मारा और उसका एक हाथ तोड़ दिया. मेरे समझ में नहीं आया मैं उस नौजवान को कैसे समझाऊँ.

(42)

Between dialects (2/3)

• Bapu, from behind, invisible to others:
  • उसके हाथ में एक पत्थर देकर कहना चाहिए था बापू का इस पुतला गिरा दो
• Munnabhai:
  • उस का हाथ में एक पत्थर देने का और कहने का कि बापू का इस पुतला गिरा दो
• Bapu:
  • इस देश में मेरे जितना भी पुतला है सब गिरा दो
• Munnabhai:
  • full country में मेरा जितना भी पुतला है सब गिरा दो
• Bapu:
  • हर इमारत हर चौराहे हर मार्ग से मेरा नाम मिटा दो
• Munnabhai:
  • हर बिल्डिंग हर नोट वोट रोड से मेरा नाम मिटा दो

(43)

Between dialects (3/3)

• Bapu:
  • मेरे हर तसबीर को दीवार से हटा दो
• Munnabhai:
  • मेरे जितना भी तसबीर दीवार पे लटकेला है ना, उसको निकाल के फेंक दो
• Bapu:
  • अगर कहीं रखना है तो अपने दिलों में रखो
• Munnabhai:
  • क्या है कि कहीं रखना छे तो, अपने दिल में रखो ना, समझा क्या, इधर heart में heart में!

(44)

Comparison b/w SMT, EBMT, Interlingua

Property                             | Example Based MT | Statistical MT | Interlingua based MT
Parallel corpora                     | Yes              | Yes            | No
Dictionary                           | Yes              | No             | Yes
Transfer rules                       | No               | No             | Yes
Parser                               | Yes              | No             | Yes
Semantic analysis                    | No               | No             | Yes
Data driven incremental improvement  | Yes              | Yes            | No
Translation speed                    | Slow             | Slow           | Fast
Language dependency                  | No               | No             | Yes
Intermediate meaning representation  | No               | No             | Yes (universal representation)

(45)

References (1/2)

1. P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), 263-311. (1993)
2. Makoto Nagao. A framework of a mechanical translation between Japanese and English by analogy principle, in A. Elithorn and R. Banerji: Artificial and Human Intelligence. Elsevier Science Publishers. (1984)
3. H. Somers. Review Article: Example-based Machine Translation. Machine Translation, Volume 14, Number 2, pp. 113-157. (June 1999)
4. D. Turcato, F. Popowich. What is Example-Based Machine Translation? In M. Carl and A. Way (eds). Recent Advances in EBMT. Kluwer Academic Publishers, Dordrecht. Note, revised version of workshop paper. (2003)

(46)

References (2/2)

5. S. Dave, J. Parikh and P. Bhattacharyya. Interlingua Based English Hindi Machine Translation and Language Divergence. Journal of Machine Translation, Volume 17. (2002)
6. Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1). (March 1996)
7. Jason Baldridge, Tom Morton, and Gann Bierner. The opennlp.maxent package: POS tagger, end of sentence detector, tokenizer, name finder. http://maxent.sourceforge.net/ version 2.4.0 (Oct. 2005)
8. Universal Networking Language (UNL) Specifications. UNL Center of UNDL Foundation. URL: http://www.undl.org/unlsys/unl/unl2005/. 7 June 2005.
