State of the Art

Academic year: 2022

(1)

Machine Translation

Om Damani

(Ack: Material taken from Jurafsky and Martin, 2nd Ed., and Brown et al. 1993)

(2)

2

The spirit is willing but the flesh is weak

English-Russian Translation System

Дух охотно готов но плоть слаба

Russian-English Translation System

The vodka is good, but the meat is rotten

State of the Art

Babelfish: Spirit is willingly ready but flesh it is weak

Google: The spirit is willing but the flesh is week

(3)

3

The spirit is willing but the flesh is weak

Google English-Hindi Translation System

आत्मा पर शरीर दुर्बल है

Google Hindi-English Translation System

Spirit on the flesh is weak

State of the Art (English-Hindi) – March 19, 2009

(4)

4

Is state of the art so bad

Google English-Hindi Translation System

कला की हालत इतनी खराब है

Google Hindi-English Translation System

The state of the art is so bad

Is State of the Art (English-Hindi) so bad?

(5)

5

State of the english hindi translation is not so bad

Google English-Hindi Translation System

राज्य के अंग्रेज़ी हिंदी अनुवाद का इतना बुरा नहीं है

Google Hindi-English Translation System

State of the English translation of English is not so bad

State of the english-hindi translation is not so bad

OK. Maybe it is __ bad.

(6)

6

State of the English Hindi translation is not so bad

Google English-Hindi Translation System

राज्य में अंग्रेज़ी से हिंदी अनुवाद का इतना बुरा नहीं है

Google Hindi-English Translation System

English to Hindi translation in the state is not so bad

State of the English-Hindi translation is not so bad

OK. Maybe it is __ __ bad.

राज्य के अंग्रेज़ी हिंदी अनुवाद का इतना बुरा नहीं है

(7)

7

Your Approach to Machine Translation

(8)

8

Translation Approaches

(9)

9

Direct Transfer – What Novices do

(10)

10

Direct Transfer: Limitations

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं

Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

Morph: कई बंगाली कवि-PL,OBL ने इस भूमि के गीत {गाए है}-PrPer,Pl

Kai Bangali kavi-PL,OBL ne is bhoomi ke geet {gaaye hai}-PrPer,Pl

Lexical Transfer: Many Bengali poet-PL,OBL this land of songs {sing has}-PrPer,Pl

Local Reordering: Many Bengali poet-PL,OBL of this land songs {has sing}-PrPer,Pl

Final: Many Bengali poets of this land songs have sung

Expected: Many Bengali poets have sung songs of this land

(11)

11

Syntax Transfer

(Analysis-Transfer-Generation)

Here phrases NP, VP etc. can be arbitrarily large

(12)

12

Syntax Transfer Limitations

He went to Patna -> Vah Patna gaya

He went to Patil -> Vah Patil ke pas gaya

Translation of went depends on the semantics of the object of went

Fatima eats salad with spoon – what happens if you change spoon?

Semantic properties need to be included in transfer rules – Semantic Transfer
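A minimal sketch of what a transfer rule with one semantic feature could look like, using the two "went to" examples above. The `SEMANTIC_CLASS` lookup and the function name are hypothetical, and a real system would need a much richer semantic lexicon.

```python
# Hypothetical semantic classes for the object of "went to"; not from the slides.
SEMANTIC_CLASS = {"Patna": "place", "Patil": "person"}


def translate_went_to(subject_hi: str, obj: str) -> str:
    """Translate 'He went to <obj>' into romanized Hindi using one semantic feature."""
    # Assumption: objects missing from the lookup are treated as places.
    sem = SEMANTIC_CLASS.get(obj, "place")
    if sem == "person":
        # Going to a person needs the postposition "ke pas".
        return f"{subject_hi} {obj} ke pas gaya"
    return f"{subject_hi} {obj} gaya"


print(translate_went_to("Vah", "Patna"))  # Vah Patna gaya
print(translate_went_to("Vah", "Patil"))  # Vah Patil ke pas gaya
```

A purely syntactic rule sees the same parse for both sentences, so it cannot choose between the two outputs.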

(13)

13

Interlingua Based Transfer

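[Figure: interlingua (semantic graph) of the sentence below, with nodes contact, you, this, farmer, region, taluka, or, manchar, khatav and relation labels agt, obj, pur, plc, nam (scope :01)]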

For this, you contact the farmers of Manchar region or of Khatav taluka.

In theory: N analysis and N generation modules instead of N^2 transfer modules

In practice: Amazingly complex system to tackle N^2 language pairs
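A small sketch of the arithmetic behind the "in theory" claim, assuming one analyzer into the interlingua and one generator out of it per language. The exact number of ordered language pairs is N(N-1), which grows like N^2. Function names are illustrative only.

```python
def direct_transfer_systems(n: int) -> int:
    """One translation system per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """One analyzer and one generator per language."""
    return 2 * n


for n in (2, 5, 20):
    print(n, direct_transfer_systems(n), interlingua_modules(n))
```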

(14)

14

Difficulties in Translation – Language Divergence

(Concepts from Dorr 1993, Text/Figures from Dave, Parikh and Bhattacharyya 2002)

Constituent Order

Prepositional Stranding

Null Subject

Conflational Divergence

Categorical Divergence

(15)

15

Lost in Translation: We are talking mostly about syntax, not semantics, or pragmatics

You: Could you give me a glass of water

Robot: Yes.

….wait..wait..nothing happens..wait…

…Aha, I see…

You: Will you give me a glass of water

…wait…wait..wait..

Image from http://inicia.es/de/rogeribars/blog/lost_in_translation.gif

(16)

16

CheckPoint

State of the Art

Different Approaches

Translation Difficulty

Need for a novel approach

(17)

17

Statistical Machine Translation: Most ridiculous idea ever

Consider all possible partitions of a sentence.

For a given partition,

Consider all possible translations of each part.

Consider all possible combinations of all possible translations

Consider all possible permutations of each combination

And somehow select the best partition/translation/permutation

कई बंगाली कवियों ने इस भूमि के गीत गाए हैं

Kai Bangali kaviyon ne is bhoomi ke geet gaaye hain

[Figure: the sentence partitioned into five parts, each with candidate translations]

कई बंगाली कवियों: Many Bengali Poets / Several Bengali / Many poets from Bangal / Poets from Bangladesh

ने इस: to this / in this / this

भूमि: land / space / place / farm

के: of / 's

गीत गाए हैं: have sung songs / song sung / sing songs / have sung poem

To this space have sung songs of many poets from Bangal
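The enumeration described on this slide can be sketched by brute force. This is only an illustration: the phrase table below is a toy, the source is romanized, the pairing of phrases with translations is an assumption, and the "somehow select the best" step is left out because it needs the models introduced later.

```python
from itertools import permutations, product

# Toy phrase table (hypothetical): romanized Hindi phrase -> candidate English phrases.
PHRASE_TABLE = {
    ("kai", "bangali", "kaviyon"): ["many Bengali poets", "several Bengali poets"],
    ("ne", "is"): ["this", "to this"],
    ("bhoomi",): ["land", "space"],
    ("ke",): ["of"],
    ("geet", "gaaye", "hain"): ["have sung songs", "sing songs"],
}


def partitions(words):
    """Yield every split of `words` into contiguous, non-empty phrases."""
    if not words:
        yield []
        return
    for i in range(1, len(words) + 1):
        head = tuple(words[:i])
        for rest in partitions(words[i:]):
            yield [head] + rest


def candidates(words):
    """Yield every output reachable by partitioning, translating, and permuting."""
    for parts in partitions(words):
        if not all(p in PHRASE_TABLE for p in parts):
            continue  # the toy table cannot translate some phrase of this partition
        for choice in product(*(PHRASE_TABLE[p] for p in parts)):
            for order in permutations(choice):
                yield " ".join(order)


source = "kai bangali kaviyon ne is bhoomi ke geet gaaye hain".split()
all_candidates = set(candidates(source))
print(len(all_candidates))
print("many Bengali poets have sung songs of this land" in all_candidates)
```

Even with a single usable partition and at most two translations per phrase, the candidate set is already in the thousands, and the good translation is just one item in it.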

(18)

18

How many combinations are we talking about

Number of choices for an N-word sentence

N=20 ??

Number of possible chess games
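A rough way to count the choices, under a simplifying assumption that is not from the slides: every phrase has exactly k candidate translations. A partition into m phrases is fixed by choosing m-1 of the N-1 gaps between words, and it then allows k^m translation combinations and m! orderings.

```python
from math import comb, factorial


def num_choices(n_words: int, k: int) -> int:
    """Count partition/translation/permutation choices for an n-word sentence.

    Assumes every phrase has exactly k candidate translations (a simplification).
    """
    total = 0
    for m in range(1, n_words + 1):  # m = number of phrases in the partition
        n_partitions = comb(n_words - 1, m - 1)  # pick m-1 of the n-1 word gaps as cuts
        total += n_partitions * (k ** m) * factorial(m)
    return total


print(num_choices(3, 1))  # 1*1 + 2*2 + 1*6 = 11
print(f"{num_choices(20, 5):.3e}")
```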

(19)

19

How do we get the Phrase Table

Collect large amount of bi-lingual parallel text.

For each sentence pair,

Consider all possible partitions of both sentences

For a given partition pair,

Consider all possible mappings between parts (phrases) on the two sides

Somehow assign a probability to each phrase pair

इसके लिए आप मंचर क्षेत्र के किसानों से संपर्क कीजिए

For this you contact the farmers of Manchar region
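One common way to make "somehow assign the probability" concrete is relative frequency: P(f|e) = count(e, f) / count(e) over extracted phrase pairs. The sketch below assumes the phrase pairs have already been extracted; the pairs and the function name are hypothetical, and this is not necessarily the estimator this lecture goes on to use.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(phrase_pairs):
    """Estimate P(f | e) as count(e, f) / count(e) from (e, f) phrase pairs."""
    pair_counts = Counter(phrase_pairs)
    e_counts = Counter(e for e, _ in phrase_pairs)
    table = defaultdict(dict)
    for (e, f), c in pair_counts.items():
        table[e][f] = c / e_counts[e]
    return dict(table)


# Hypothetical extracted pairs: (English phrase, romanized Hindi phrase).
pairs = [
    ("for this", "iske liye"),
    ("for this", "iske liye"),
    ("for this", "is liye"),
    ("the farmers", "kisanon"),
]
table = estimate_phrase_table(pairs)
print(table["for this"])
```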

(20)

20

Formulating the Problem

A language model to compute P(E)

A translation model to compute P(F|E)

A decoder, which is given F and produces the most probable E
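The three components fit together through Bayes' rule. The following is the standard noisy-channel formulation, written here as a reminder; P(F) can be dropped because it does not depend on E.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(F \mid E)\, P(E)}{P(F)}
        = \arg\max_{E} P(F \mid E)\, P(E)
```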

(21)

21

P(F|E) vs. P(E|F)

P(F|E) is the translation probability – we need to look at the generation process by which <F,E> pair is obtained.

Parts of F correspond to parts of E. With suitable independence assumptions, P(F|E) measures whether all parts of E are covered by F.

E can be quite ill-formed.

It is OK if P(F|E) for an ill-formed E is greater than P(F|E) for a well formed E. Multiplication by P(E) should hopefully take care of it.

We do not have that luxury in estimating P(E|F) directly – we will need to ensure that well-formed E score higher.

Summary: For computing P(F|E), we may make several independence assumptions that are not valid. P(E) compensates for that.

P(बारिश हो रही है|It is raining) = .02

P(बरसात आ रही है|It is raining) = .03

P(बारिश हो रही है|rain is happening) = .420

We need to estimate P(It is raining|बारिश हो रही है) vs. P(rain is happening|बारिश हो रही है)
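A sketch of how multiplication by P(E) can overturn the translation model's preference for the ill-formed candidate. The P(F|E) values are the ones on this slide; the language-model values are made up for illustration and are not from the slides.

```python
# Translation-model scores P(F | E) from the slide, for F = "बारिश हो रही है".
translation_model = {"It is raining": 0.02, "rain is happening": 0.420}

# Hypothetical language-model scores P(E); these numbers are NOT from the slides.
language_model = {"It is raining": 1e-4, "rain is happening": 1e-7}


def decode(candidates):
    """Return the candidate E that maximizes P(F | E) * P(E)."""
    return max(candidates, key=lambda e: translation_model[e] * language_model[e])


best = decode(translation_model)
print(best)  # It is raining
```

With these assumed values the products are 2e-6 and 4.2e-8, so the well-formed sentence wins even though its translation score is about twenty times lower.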

(22)

22

(23)

23

CheckPoint

From a parallel corpus, generate probabilistic phrase table

Given a sentence, generate various candidate translations using the phrase table

Evaluate the candidates using Translation and Language Models

(24)

24

What is the meaning of Probability of Translation

What is the meaning of P(F|E)

By Magic: you simply know P(F|E) for every (E,F) pair – by counting in a parallel corpus

Or, we need a ‘random process’ to generate F from E

A semantic graph G is generated from E and F is generated from G

We are no better off. We now have to estimate P(G|E) and P(F|G) for various G and then combine them – How?

We may have a deterministic procedure to convert E to G, in which case we still need to estimate P(F|G)

A parse tree T_E is generated from E; T_E is transformed to T_F; finally T_F is converted into F

Can you write the mathematical expression
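One possible answer for the parse-tree process, offered as a sketch. It assumes each step depends only on the output of the previous step (E, then T_E, then T_F, then F) and sums over the unobserved trees.

```latex
P(F \mid E) = \sum_{T_E} \sum_{T_F} P(T_E \mid E)\, P(T_F \mid T_E)\, P(F \mid T_F)
```

If parsing E is deterministic, the sum over T_E collapses to a single term.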

(25)

25

The Generation Process

Partition: Think of all possible partitions of the source language

Lexicalization: For a given partition, translate each phrase into the foreign language

Spurious insertion: add foreign words that are not attributable to any source phrase

Reordering: permute the set of all foreign words - words possibly moving across phrase boundaries

Try writing the probability expression for the generation process

We need the notion of alignment

(26)

26

Generation Example: Alignment

(27)

27

Simplify Generation: Only 1->Many Alignments allowed

(28)

28

Alignment: Key Concept

A function from target position to source position:

The alignment sequence is: 2,3,4,5,6,6,6

Alignment function A: A(1) = 2, A(2) = 3, ...

A different alignment function will give the sequence: 1,2,1,2,3,4,3,4 for A(1), A(2), ...

To allow spurious insertion, allow alignment with word 0 (NULL)

No. of possible alignments: 2^((I+1)*J)
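A sketch of the alignment function as a plain list, with position 0 reserved for NULL. Only the sequence 2,3,4,5,6,6,6 comes from the slide; the word lists are placeholders. Two counts are shown, assuming I source words and J target words: if the slide's count lost its exponent in extraction it reads 2^((I+1)*J), the number of ways to switch each possible link on or off, whereas restricting to functions from target positions to source positions leaves (I+1)^J.

```python
def aligned_pairs(source_words, target_words, alignment):
    """Pair each target word with the source word that A(j) points to (0 = NULL)."""
    source_with_null = ["NULL"] + list(source_words)
    return [(t, source_with_null[a]) for t, a in zip(target_words, alignment)]


def num_function_alignments(I: int, J: int) -> int:
    """Each of J target positions picks one of I source positions or NULL."""
    return (I + 1) ** J


def num_general_alignments(I: int, J: int) -> int:
    """Any subset of the (I + 1) * J possible links is allowed."""
    return 2 ** ((I + 1) * J)


# Placeholder word lists; only the alignment sequence is from the slide.
source = ["s1", "s2", "s3", "s4", "s5", "s6"]
target = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
A = [2, 3, 4, 5, 6, 6, 6]
print(aligned_pairs(source, target, A))
print(num_function_alignments(6, 7), num_general_alignments(6, 7))
```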

(29)

29

CheckPoint

From a parallel corpus, generate probabilistic phrase table

Given a sentence, generate various candidate translations using the phrase table

Evaluate the candidates using Translation and Language Models

Understanding of Generation Process is critical

Notion of Alignment is important
