Doctor of Philosophy

The last part of the thesis is devoted to making the best use of the proposed comparable corpus for the Manipuri-English MT task. In this chapter, we describe our contributions regarding the development of the transliteration model. Such resources are not available for most low-resource languages, including Manipuri.

In Chapter 6, we provide an empirical evaluation of previous approaches on the Manipuri-English language pair. The proposed hybrid model improves on the traditional encoder-decoder transliteration model [67, 129] by introducing a separate encoder for each source input (phoneme and grapheme sequences, respectively) using multi-source Neural Machine Translation (NMT) techniques. The concept of the multi-source encoder-decoder model has been studied extensively in the NMT paradigm.
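
As a rough illustration of the multi-source idea (and not the exact architecture proposed here), the following PyTorch sketch wires a separate BiGRU encoder for the grapheme sequence and the phoneme sequence into a single decoder; all dimensions, layer choices, and the addition-based merge are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiSourceEncoderDecoder(nn.Module):
    """Toy multi-source seq2seq: one BiGRU encoder per source representation
    (graphemes and phonemes), merged before a GRU decoder. All sizes are
    illustrative placeholders, not the configuration used in the thesis."""

    def __init__(self, g_vocab, p_vocab, tgt_vocab, emb=128, hid=256):
        super().__init__()
        self.g_emb = nn.Embedding(g_vocab, emb)
        self.p_emb = nn.Embedding(p_vocab, emb)
        self.t_emb = nn.Embedding(tgt_vocab, emb)
        self.g_enc = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.p_enc = nn.GRU(emb, hid, batch_first=True, bidirectional=True)
        self.bridge = nn.Linear(2 * hid, hid)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, graphemes, phonemes, tgt_in):
        _, g_h = self.g_enc(self.g_emb(graphemes))   # g_h: (2, B, hid)
        _, p_h = self.p_enc(self.p_emb(phonemes))    # p_h: (2, B, hid)
        # Collapse forward/backward states, then combine the two sources
        # by element-wise addition (one possible combination strategy).
        g_state = self.bridge(torch.cat([g_h[0], g_h[1]], dim=-1))
        p_state = self.bridge(torch.cat([p_h[0], p_h[1]], dim=-1))
        init = (g_state + p_state).unsqueeze(0)      # (1, B, hid)
        dec_out, _ = self.dec(self.t_emb(tgt_in), init)
        return self.out(dec_out)                     # (B, T, tgt_vocab)

# Minimal usage with random grapheme/phoneme/target token ids.
model = MultiSourceEncoderDecoder(g_vocab=60, p_vocab=50, tgt_vocab=40)
logits = model(torch.randint(0, 60, (4, 12)),
               torch.randint(0, 50, (4, 10)),
               torch.randint(0, 40, (4, 9)))         # (4, 9, 40)
```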

We explore the performance of the proposed hybrid model with both RNN and transformer architectures, and we modify the current transformer implementation to realize the proposed hybrid transformer model. In the extremely low-resource scenario, the grapheme model outperforms MSHy-Basic in both word accuracy (WA) and character accuracy (CA).

In the moderately low-resource scenario, the best performance is achieved by MSHy-Addition with a BiGRU encoder, while in the extremely low-resource setting, MSHy-Convolution with a BiGRU encoder gives the best result.
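
To make the difference between the combination strategies concrete, the fragment below contrasts two plausible ways of merging the grapheme and phoneme encoder states: element-wise addition and a 1-D convolution over the stacked states. The actual operations behind MSHy-Addition and MSHy-Convolution may differ, so treat this purely as an illustrative sketch.

```python
import torch
import torch.nn as nn

hid = 256
g_state = torch.randn(8, hid)  # grapheme encoder state, shape (B, hid)
p_state = torch.randn(8, hid)  # phoneme encoder state, shape (B, hid)

# Addition-style merge: element-wise sum of the two source states.
merged_add = g_state + p_state                      # (B, hid)

# Convolution-style merge: treat the two states as channels and let a
# 1-D convolution learn how to mix them dimension-wise.
conv = nn.Conv1d(in_channels=2, out_channels=1, kernel_size=3, padding=1)
stacked = torch.stack([g_state, p_state], dim=1)    # (B, 2, hid)
merged_conv = conv(stacked).squeeze(1)              # (B, hid)
```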

Table 1.1: Studies reported for developing different NLP tools, including MT for Manipuri language.

MUSE

The table also indicates the effectiveness of the proposed segmenter in normalizing the morphological inflection issue. The figure also clearly demonstrates the reliability of the proposed comparable corpus for generating CLWEs. We empirically investigate the performance of the popular unsupervised models (MUSE and VecMap) on the Manipuri-English bilingual dictionary induction task.
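
As a rough sketch of what the bilingual dictionary induction evaluation involves, the snippet below retrieves the nearest target-language neighbours of a source word after both embedding spaces have been mapped into a shared space (learning that mapping is what MUSE and VecMap do). The file names are hypothetical, and plain cosine retrieval is used for brevity even though both toolkits typically report CSLS-based retrieval.

```python
import numpy as np

def load_vec(path, max_words=50000):
    """Read a word2vec/fastText-style text file: header line, then rows."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        next(f)  # skip the "count dim" header
        for i, line in enumerate(f):
            if i >= max_words:
                break
            tok, *vals = line.rstrip().split(" ")
            words.append(tok)
            vecs.append(np.asarray(vals, dtype=np.float32))
    mat = np.vstack(vecs)
    return words, mat / np.linalg.norm(mat, axis=1, keepdims=True)

# Hypothetical paths to embeddings already mapped into a shared space
# (e.g. the output of MUSE or VecMap).
src_words, X = load_vec("mapped.mni.vec")
tgt_words, Z = load_vec("mapped.en.vec")

def translate(word, k=5):
    """Return the k nearest target words by cosine similarity."""
    sims = Z @ X[src_words.index(word)]
    return [tgt_words[j] for j in np.argsort(-sims)[:k]]
```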

This study first empirically evaluates state-of-the-art unsupervised MT approaches on the language pair. The word translation probabilities φw are calculated in the same way as φph, but at the word level. To investigate the performance of the proposed suffix segmenter for MT, we compare the performance of previous models (XLM, MASS, and UNMT based on CLE [14]). https://github.com/glample/fastBPE.
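
The suffix segmenter itself is not reproduced here, but the toy function below shows the longest-match suffix-splitting idea that such a segmenter can build on. The suffix list and the example word are hypothetical placeholders, not the actual Manipuri suffix inventory or segmentation rules used in this work; the "@@" marker simply mimics BPE-style boundary marking.

```python
# Hypothetical suffix list, ordered longest-first so the longest match wins.
SUFFIXES = sorted(["na", "da", "gi", "se"], key=len, reverse=True)

def segment(word, marker="@@", min_stem=2):
    """Split one trailing suffix off `word`, marking the stem boundary."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return f"{word[:-len(suf)]}{marker} {suf}"
    return word

print(segment("exampleda"))  # -> "example@@ da" (illustrative only)
```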

The performance of the models on the pre-processed corpus obtained after applying the suffix segmentation algorithm is denoted by Seg. This further validates the models' inability to capture translation features for the Manipuri-English language pair.

Di,j = (XiWX) · (ZjWZ)   (6.4)

where VX and VZ are the vocabulary sets of the source and target language, respectively.
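
To make Equation (6.4) concrete, the snippet below computes the matrix D of dot products between mapped source and target embeddings for every i in VX and j in VZ. The random embeddings and identity mapping matrices are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
vx, vz, dim = 1000, 1200, 300          # |V_X|, |V_Z|, embedding dimension
X = rng.standard_normal((vx, dim))     # source-language embeddings
Z = rng.standard_normal((vz, dim))     # target-language embeddings
W_X = np.eye(dim)                      # mapping matrices (identity here)
W_Z = np.eye(dim)

# D[i, j] = (X_i W_X) . (Z_j W_Z) for every i in V_X and j in V_Z.
D = (X @ W_X) @ (Z @ W_Z).T            # shape (|V_X|, |V_Z|)

# Example use: the highest-scoring target index for each source word.
best_tgt = D.argmax(axis=1)            # shape (|V_X|,)
```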

A major limitation of the methods discussed above is that the phrase table is directly driven by CLEs. Most previous studies attempted to extract parallel segments (sentences, phrases, or words) from the comparable corpus to aid conventional data-driven MT approaches [79, 254]. The first three rows represent the performance of the baselines: (1) default monosets, (2) RS-phrase monosets (re-estimating both phrase translation probabilities and lexical weights), and (3) monosets initialized with the generated transliteration pairs, measured with the character accuracy (CA) threshold set to 100%.
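
Character accuracy is used here only as a filter on the transliteration pairs. One common edit-distance-based definition is sketched below (the exact formula used in this work may differ); with the threshold at 100%, only pairs that match the reference exactly are kept. The example pairs are toy strings.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_accuracy(hyp, ref):
    """CA = 1 - edit_distance / reference length (floored at zero)."""
    return max(0.0, 1.0 - edit_distance(hyp, ref) / max(len(ref), 1))

# Keep only pairs whose CA meets the threshold (100% means exact match).
pairs = [("imphal", "imphal"), ("manipur", "manipuri")]
kept = [(h, r) for h, r in pairs if char_accuracy(h, r) >= 1.0]
```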

Our proposed model can be seen as an improvement over the Manipuri-English MT model [124] (also presented earlier in Section 6.5.1). It is clear from the results that each of the proposed modifications improves performance in both translation directions. This may be due to the relatively low coverage of the document-aligned corpus compared to the temporally aligned comparable corpus.

In particular, we exploit translation features derived from the temporally aligned and document-aligned portions of the comparable corpus. We further discuss the limitations of the proposed methods and possible future directions to explore.

Table 5.1: Manipuri-English News Domain Comparable Corpus.

BLEU

In Proceedings of the ACL-02 Workshop on Computational Approaches to Semitic Languages (pp. 1-13). Association for Computational Linguistics. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 789–798).

In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics.

In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (p. 159). Association for Computational Linguistics. In Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 58–68).

In Proceedings of the 18th Conference on Computational Linguistics - Volume 1. Association for Computational Linguistics. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 196–202). In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 2: Short Papers) (pp. 296–301).

In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 228–234). In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 86–96).

Figure

Table 1.1: Studies reported for developing different NLP tools, including MT for Manipuri language.
Figure 3.1: Unicode Conversion Framework of Manipuri texts with examples. The number below each character represents the respective code point.
Table 3.1: Manipuri-English News Domain Comparable Corpus. The sentence counts include only sentences with at least three words.
Table 3.2: Manipuri-English Comparable Corpus with a stronger degree of comparability.

