CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 10, 11 – MT approaches)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
25th Jan and 27th Jan, 2011
Acknowledgement: parts are from Hansraj’s dual degree seminar presentation
Czech-English data
[nesu] "I carry"
[ponese] "He will carry"
[nese] "He carries"
[nesou] "They carry"
[yedu] "I drive"
[plavou] "They swim"
To translate …
I will carry.
They drive.
He swims.
They will drive.
Hindi-English data
[DhotA huM] "I carry"
[DhoegA] "He will carry"
[DhotA hAi] "He carries"
[Dhote hAi] "They carry"
[chalAtA huM] "I drive"
[tErte hEM] "They swim"
Bangla-English data
[bai] "I carry"
[baibe] "He will carry"
[bay] "He carries"
[bay] "They carry"
[chAlAi] "I drive"
[sAMtrAy] "They swim"
MT Approaches
[Vauquois triangle: SOURCE and TARGET sides connected at increasing levels of abstraction: words, phrases, syntax, semantics, and interlingua at the apex]
Taxonomy
MT Approaches
Knowledge based; Rule Based MT
  Interlingua Based
  Transfer Based
Data driven; Machine Learning Based
  Example Based MT (EBMT)
  Statistical MT
Motivation
MT: NLP complete
NLP: AI complete
AI: CS complete
How will the world be different when the language barrier disappears?
Volume of text required to be translated currently exceeds translators' capacity (demand outstrips supply).
Solution: automation (the only solution)
Many machine translation techniques exist.
Which approach is better for Hindi-English MT?
Interlingual representation: complete disambiguation
"Washington voted Washington to power."
[Interlingua graph: vote (@past, is-a: action) linked to Washington (@emphasis, is-a: place), goal power (is-a: capability), and Washington (is-a: person)]
Kinds of disambiguation needed for a complete and correct interlingua graph:
N: Name
P: POS
A: Attachment
S: Sense
C: Co-reference
R: Semantic Role
Target Sentence Generation from interlingua
Lexical Transfer (Word/Phrase Translation) → Syntax Planning (Sequence Generation) → Morphological Synthesis (Word form Generation)
Role of function words
Washington voted Washington to power.
Washington ne (agent) Washington ko (object) sattaa ke liye (goal) chunaa
vote → chunna
power → sattaa
Statistical Machine Translation (SMT)
Data driven approach.
Goal: find the English sentence e, given a foreign language sentence f, for which p(e|f) is maximum.
Translations are generated on the basis of a statistical model.
Parameters are estimated using bilingual parallel corpora.
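The maximization of p(e|f) is conventionally rewritten with Bayes' rule into the noisy-channel formulation, which is why SMT needs both a language model and a translation model:

```latex
\hat{e} = \operatorname*{arg\,max}_{e} \Pr(e \mid f)
        = \operatorname*{arg\,max}_{e} \frac{\Pr(e)\,\Pr(f \mid e)}{\Pr(f)}
        = \operatorname*{arg\,max}_{e} \Pr(e)\,\Pr(f \mid e)
```

since Pr(f) is constant for the given input: Pr(e) is estimated by the language model and Pr(f|e) by the translation model.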
SMT: Language Model
Used to detect good English sentences.
The probability of an English sentence s1 s2 … sn can be written as
Pr(s1 s2 … sn) = Pr(s1) · Pr(s2|s1) · … · Pr(sn|s1 s2 … sn−1)
Here Pr(sn|s1 s2 … sn−1) is the probability that word sn follows the word string s1 s2 … sn−1.
N-gram model probability: an n-gram model conditions each word on only the previous n−1 words; a trigram model calculates the probability as Pr(sn|sn−2 sn−1).
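As a sketch, a maximum-likelihood trigram model can be trained by counting (the toy corpus, sentence markers, and absence of smoothing are simplifying assumptions):

```python
from collections import defaultdict

def train_trigram(sentences):
    """Count trigrams and their bigram histories from tokenized sentences."""
    tri = defaultdict(int)
    bi = defaultdict(int)
    for s in sentences:
        toks = ["<s>", "<s>"] + s + ["</s>"]
        for i in range(2, len(toks)):
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
            bi[(toks[i - 2], toks[i - 1])] += 1
    return tri, bi

def trigram_prob(tri, bi, w1, w2, w3):
    """Pr(w3 | w1 w2) by maximum likelihood; 0.0 if the history is unseen."""
    h = bi.get((w1, w2), 0)
    return tri.get((w1, w2, w3), 0) / h if h else 0.0

corpus = [["he", "carries", "water"], ["he", "carries", "books"]]
tri, bi = train_trigram(corpus)
print(trigram_prob(tri, bi, "he", "carries", "water"))  # 0.5
```

A real system would add smoothing (e.g. interpolation or back-off) so that unseen trigrams do not get zero probability.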
SMT: Translation Model
P(f|e): probability of some f given a hypothesized English translation e.
How to assign the values to P(f|e)?
Sentences are infinite, so it is not possible to estimate the pair (e, f) for all sentences at the sentence level.
Introduce a hidden variable a that represents alignments between the individual words in the sentence pair (word level).
Alignment
If the string e = e_1^l = e_1 e_2 … e_l has l words, and the string f = f_1^m = f_1 f_2 … f_m has m words, then the alignment a can be represented by a series a_1^m = a_1 a_2 … a_m of m values, each between 0 and l, such that if the word in position j of the f-string is connected to the word in position i of the e-string, then a_j = i, and if it is not connected to any English word, then a_j = 0.
Example of alignment
English: Ram went to school
Hindi: Raama paathashaalaa gayaa

Ram went to school
<Null> Raama paathashaalaa gayaa
(Ram ↔ Raama, went ↔ gayaa, to ↔ <Null>, school ↔ paathashaalaa)
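For this sentence pair the alignment series a_1^m has one entry per Hindi word; the encoding below follows the definition above ("to" simply stays unaligned on the English side):

```python
# e-string with e_0 = <Null>; indices 1..l index the English words
e = ["<Null>", "Ram", "went", "to", "school"]
f = ["Raama", "paathashaalaa", "gayaa"]

# a_j = i means the j-th Hindi word is connected to the i-th English word
a = [1, 4, 2]  # Raama->Ram, paathashaalaa->school, gayaa->went

for j, fj in enumerate(f):
    print(f"{fj} -> {e[a[j]]}")
```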
Translation Model: Exact expression
Pr(f|e) is built from three choices: choose the length m of the foreign language string given e; choose the alignment given e and m; choose the identity of each foreign word given e, m, and a.
Five models for estimating the parameters in the expression [2]:
Model-1, Model-2, Model-3, Model-4, Model-5
Proof of Translation Model: Exact expression

$\Pr(f \mid e) = \sum_{a} \Pr(f, a \mid e)$   ; marginalization over alignments

$\Pr(f, a \mid e) = \sum_{m} \Pr(f, a, m \mid e)$   ; marginalization over lengths
$\quad = \sum_{m} \Pr(m \mid e)\,\Pr(f, a \mid m, e)$   ; chain rule
$\quad = \Pr(m \mid e)\,\Pr(f, a \mid m, e)$   ; m is fixed for a particular f

Decomposing the sentence word by word with the chain rule:

$\Pr(f, a \mid m, e) = \prod_{j=1}^{m} \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\,\Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$

Hence:

$\Pr(f \mid e) = \sum_{a} \Pr(m \mid e) \prod_{j=1}^{m} \Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e)\,\Pr(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$
Model-1
Simplest model.
Assumptions:
$\Pr(m \mid e)$ is independent of m and e and is equal to $\epsilon$.
The alignment of foreign language words (FLWs) depends only on the length of the English sentence: $\Pr(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) = (l+1)^{-1}$, where l is the length of the English sentence.
The likelihood function will be
$\Pr(f \mid e) = \dfrac{\epsilon}{(l+1)^{m}} \sum_{a} \prod_{j=1}^{m} t(f_j \mid e_{a_j})$
Maximize the likelihood function constrained to $\sum_{f} t(f \mid e) = 1$ for each e.
Model-1: Parameter estimation
Using Lagrange multipliers for constrained maximization, the solution for the Model-1 parameters involves:
λ_e: normalization constant; c(f|e; f, e): expected count; δ(f, f_j): 1 if f and f_j are the same, zero otherwise.
Estimate t(f|e) using the Expectation Maximization (EM) procedure.
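The EM procedure for t(f|e) can be sketched on a toy parallel corpus (the data and iteration count are illustrative; the NULL word is omitted for brevity):

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t(f|e) with the EM procedure of IBM Model 1 (uniform init)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # t[(f, e)] = t(f|e)
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f|e; f, e)
        total = defaultdict(float)  # per-e normalizer (the lambda_e role)
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # sum over possible alignments
                for e in es:
                    c = t[(f, e)] / z           # E-step: fractional count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e), c in count.items():         # M-step: renormalize
            t[(f, e)] = c / total[e]
    return t

pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"], ["le", "livre"]),
         (["a", "house"], ["une", "maison"])]
t = train_model1(pairs)
# t("maison"|"house") grows much larger than t("maison"|"the")
```

Co-occurrence alone is ambiguous within one sentence pair; EM resolves it because words that co-occur consistently across pairs reinforce each other over iterations.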
Model-2
The alignment of an FLW to an English word depends on its position: a new parameter $a(i \mid j, m, l)$.
The likelihood function is
$\Pr(f \mid e) = \epsilon \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\,a(i \mid j, m, l)$
Model-1 & 2
Model-1 is the special case of Model-2 where $a(i \mid j, m, l) = (l+1)^{-1}$.
To instantiate the Model-2 parameters, use the parameters estimated in Model-1.
Model-3
Fertility: the number of FLWs to which an English word is connected in a randomly selected alignment.
Tablet: the list of FLWs connected to an English word.
Tableau: the collection of tablets.
The alignment process:
for each English word
begin
  decide the fertility of the word
  get a list of French words to connect to the word
end
permute the words in the tableau to generate f
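The three-step generative story can be sketched as follows (the fertilities and translation tablets here are illustrative placeholders, not trained parameters):

```python
def model3_generate(e_words, fertility, translate, reorder):
    """Model-3 style generation: fertility -> word translation -> permutation."""
    # Step 1: replicate each English word according to its fertility
    expanded = [w for w in e_words for _ in range(fertility[w])]
    # Step 2: translate each copy into an FLW (consuming its tablet)
    tableau = [translate[w].pop(0) for w in expanded]
    # Step 3: permute the tableau to yield the foreign string f
    return reorder(tableau)

e = ["annual", "inflation", "rises"]
fertility = {"annual": 1, "inflation": 2, "rises": 1}
translate = {"annual": ["vaarshik"],
             "inflation": ["mudrasphiti", "dar"],
             "rises": ["badhi"]}
f = model3_generate(e, fertility, translate, lambda t: t)  # identity permutation
print(f)  # ['vaarshik', 'mudrasphiti', 'dar', 'badhi']
```

In the real model each step is probabilistic: fertility, translation, and distortion (permutation) all have trained distributions.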
Model-3: Example
English sentence (e) = Annual inflation rises to 11.42%

Step-1: Deciding fertilities (F)
e = Annual inflation rises to 11.42%
F = Annual inflation inflation inflation rises rises rises to 11.42%

Step-2: Translation to FLWs (T)
T = वार्षिक मुद्रास्फीति की दर बढ़ गई है तक 11.42%

Step-3: Reordering FLWs (R)
R = वार्षिक मुद्रास्फीति की दर 11.42% तक बढ़ गई है

Values of F, T, and R are calculated using the formulas obtained in Model-3 [2].
Model-4 & 5
Model-3: every word is moved independently.
Model-4: considers phrases (cepts) in a sentence. The distortion probability is replaced by:
  a parameter for the head of each cept
  a parameter for the remaining part of the cept
Deficiency in Models 3 & 4: the distortion probability can place mass on unavailable positions.
Model-5 removes the deficiency:
  avoids unavailable positions
  introduces a new variable for the positions
Example Based Machine Translation (EBMT)
Basic idea: translate a sentence by using the closest match in parallel data.
Inspired by human analogical thinking.
Issues Related to Examples in Corpora
Granularity of examples: parallel text should be aligned at the sub-sentence level.
Number of examples.
Suitability of examples:
(i) Columbus discovered America  (ii) America was discovered by Columbus
(a) Time flies like an arrow  (b) Time flies like an arrow
How should examples be stored?
Annotated tree structure
Generalized examples: "Rajesh will reach Mumbai by 10:00 pm" > "P will reach D by T"
Annotated Tree Structure: example
Fully annotated tree with explicit links
EBMT: Matching and Retrieval (1/2)
The system must be able to recognize the similarities and differences between the input and the stored examples.
String based matching:
Longest common subsequence
Takes word similarity into account for sense disambiguation
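The longest-common-subsequence matcher mentioned above can be sketched with the standard dynamic program (the example sentences are simplified from the next slide):

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of token lists x and y."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            # extend the match on equal tokens, otherwise carry the best so far
            dp[i][j] = dp[i-1][j-1] + 1 if xi == yj else max(dp[i-1][j], dp[i][j-1])
    return dp[len(x)][len(y)]

stored = "select symbol in the insert menu".split()
query = "select paste in the edit menu".split()
print(lcs_length(stored, query))  # 4: select, in, the, menu
```

A retrieval system would rank stored examples by a score derived from this length (e.g. normalized by sentence length).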
EBMT: Matching and Retrieval (2/2)
Angle of similarity: a trigonometric similarity measure based on relative length and relative content.
(x) Select 'Symbol' in the Insert menu.
(y) Select 'Symbol' in the Insert menu to enter a character from the symbol set.
(z) Select 'Paste' in the Edit menu.
(w) Select 'Paste' in the Edit menu to enter some text from the clipboard.
θxy: the qualitative difference between sentences x and y
δ(x, y): the difference between the sizes of x and y
EBMT: Adaptation & Recombination
Adaptation: extracting appropriate fragments from the matched translation.
The boy entered the house → लड़के ने कमरे में प्रवेश किया
I saw a tiger → मैंने एक चीता देखा
The boy eats his breakfast → लड़के ने उसका नाश्ता खाया था
I saw the boy → मैंने लड़के को देखा था
Boundary friction: retrieved translations do not fit the syntactic context.
I saw the boy → * मैंने लड़के ने देखा था
Recombination: recombine the fragments into the target text; an SMT-style "language model" can be used.
Interlingua Based MT
Interlingua: "between languages".
The SL text is converted into a language-independent or 'universal' abstract representation, which is then transformed into several TLs.
Universal Networking Language (UNL)
UNL is an example of an interlingua.
It represents information sentence by sentence.
UNL is composed of:
  Universal words
  Relations
Example: "I gave him a book"
{unl}
agt ( give.@entry.@past, i )
obj ( give.@entry.@past, book.@indef )
gol ( give.@entry.@past, he )
{/unl}
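As an illustration, the relation triples of such a {unl} block can be pulled out with a small parser (the regex is a simplification of the full UNL syntax, not an official tool):

```python
import re

def parse_unl(unl_text):
    """Extract (relation, head, modifier) triples from a {unl}...{/unl} block."""
    return [(rel, head, mod)
            for rel, head, mod in re.findall(
                r"(\w+)\s*\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)", unl_text)]

unl = """{unl}
agt ( give.@entry.@past, i )
obj ( give.@entry.@past, book.@indef )
gol ( give.@entry.@past, he )
{/unl}"""
print(parse_unl(unl))
```

The triples form a graph rooted at the @entry word, which a generator can then traverse to produce the target-language sentence.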
Issues related to interlingua
The interlingua must:
  capture the knowledge in the text precisely and accurately
  handle cross-language divergence
Divergences between Hindi and English:
Constituent order divergence
Null subject divergence: जा रहा हूँ == * am going (I am going)
Conflational divergence: जीम ने जोहन को छुरे से मारा == Jim stabbed John
Promotional divergence: The play is on == खेल चल रहा है
Benefits & Shortcomings (1/3)
Statistical Machine Translation
"Every time I fire a linguist, my system's performance improves" (Brown et al. 1988)
Pros:
  No linguistic knowledge is required.
  A great deal of natural language is available as machine-readable text.
  Loose dependencies between languages can be modeled better.
Cons:
  Probabilities of rare words can't be trusted.
  Not good for idioms, jokes, compound words, or text with hidden meaning.
  Selection of the correct morphological word form is difficult.
Benefits & Shortcomings (2/3)
Example Based MT
Pros:
  Perfect translation of a sentence if a very similar one is found among the example sentences.
  No need to bother about previously translated sentences.
Cons:
  Fails if no match is found in the corpora.
  Problems at the points of example concatenation in the recombination step.
Benefits & Shortcomings (3/3)
Interlingua based MT
Pros:
  Add a new language and get all-ways translation to all previously added languages.
  Monolingual development teams.
  Economical in situations where translation among multiple languages is needed.
Cons:
  "Meaning" is arbitrarily deep. At what level of detail do we stop?
  Human development time.
Translation is Ubiquitous
Between languages:
Delhi is the capital of India
दिल्ली भारत की राजधानी है
Between dialects: example on the next slide
Between registers:
My "mom" not well.
My "mother" is unwell. (in a leave application)
Between dialects (1/3)
Lage Raho Munnabhai: an excellent example
Scene: Munnabhai (Sanjay Dutt), as Prof. Murli Prasad Sharma, is being interviewed, with some citizens asking questions in the presence of Jahnavi (Vidya Balan).
Question by a citizen:
प्रोफेसर साब, पार्क में एक नौजवान पत्थर उठा के बापू के मूर्ति पर मारा और उसका एक हाथ तोड़ दिया. मेरे समझ में नहीं आया मैं उस नौजवान को कैसे समझाऊं.
Between dialects (2/3)
Bapu, from behind, invisible to others:
उसके हाथ में एक पत्थर देकर कहना चाहिए था बापू का इस पुतला गिरा दो
Munnabhai:
उसका हाथ में एक पत्थर देने का और कहने का कि बापू का इस पुतला गिरा दो
Bapu:
इस देश में मेरे जितना भी पुतला है सब गिरा दो
Munnabhai:
ई full country में मेरा जितना भी पुतला है सब गिरा दो
Bapu:
हर इमारत हर चौराहे हर मार्ग से मेरा नाम मिटा दो
Munnabhai:
हर बिल्डिंग हर नोट वोट रोड से मेरा नाम मिटा दो
Between dialects (3/3)
Bapu:
मेरे हर तसबीर को दीवार से हटा दो
Munnabhai:
मेरे जितना भी तसबीर दीवार पे लटकेला है ना, उसको निकाल के फेंक दो
Bapu:
अगर कहीं रखना है तो अपने दिलों में रखो
Munnabhai:
क्या है कि कहीं रखना छे तो, अपने दिल में रखो ना, समझा क्या, इधर heart में heart में !
Comparison b/w SMT, EBMT, Interlingua

Property                             | Example Based MT | Statistical MT | Interlingua based MT
Parallel corpora                     | Yes              | Yes            | No
Dictionary                           | Yes              | No             | Yes
Transfer rules                       | No               | No             | Yes
Parser                               | Yes              | No             | Yes
Semantic analysis                    | No               | No             | Yes
Data driven, incremental improvement | Yes              | Yes            | No
Translation speed                    | Slow             | Slow           | Fast
Language dependency                  | No               | No             | Yes
Intermediate meaning representation  | No               | No             | Yes (universal representation)
References (1/2)
1. P. Brown, S. Della Pietra, V. Della Pietra, and R. Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2), 263-311. (1993)
2. Makoto Nagao. A framework of a mechanical translation between Japanese and English by analogy principle, in A. Elithorn and R. Banerji: Artificial and Human Intelligence. Elsevier Science Publishers. (1984)
3. Somers H. Review Article: Example-based Machine Translation. Machine Translation, Volume 14, Number 2, pp. 113-157. (June 1999)
4. D. Turcato, F. Popowich. What is Example-Based Machine Translation? In M. Carl and A. Way (eds). Recent Advances of EBMT. Kluwer Academic Publishers, Dordrecht. Note: revised version of a workshop paper. (2003)
References (2/2)
5. Dave S., Parikh J. and Bhattacharyya P. Interlingua Based English-Hindi Machine Translation and Language Divergence. Journal of Machine Translation, Volume 17. (2002)
6. Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1). (March 1996)
7. Jason Baldridge, Tom Morton, and Gann Bierner. The opennlp.maxent package: POS tagger, end of sentence detector, tokenizer, name finder. http://maxent.sourceforge.net/ version 2.4.0 (Oct. 2005)
8. Universal Networking Language (UNL) Specifications. UNL Center of UNDL Foundation. URL: http://www.undl.org/unlsys/unl/unl2005/. 7 June 2005.