DS-2020-08: Latent Variable Models for Machine Translation and How to Learn Them

DS-2020-08: Schulz, Philip (2020) Latent Variable Models for Machine Translation and How to Learn Them. Doctoral thesis, University of Amsterdam.



This thesis concerns itself with variation in parallel linguistic data and how to model it for the purpose of machine translation. It also reflects the paradigm shift from phrase-based to neural machine translation in that it addresses the variation phenomena in both frameworks.

Machine translation is the task of automatically translating text between different languages. As such, any trainable machine translation system needs to be exposed to a lot of parallel training data, i.e. sets of sentences in two or more languages that we know to be translations of one another. This data is expensive to generate since translations need to be produced by human translators first. Of course, not all human translators perform their task identically. On the contrary, their translation outputs may vary wildly. This has to do with the personal style that every translator injects into their work as well as the translator's proficiency in a particular language or domain. A more experienced translator will likely produce more accurate results than a newly trained one. Similarly, a translator who specialises in sports news may not be qualified to translate legal documents. Besides these differences between translators, a translator's performance can vary from day to day depending on factors such as motivation, fatigue, stress and the like. Finally, there is variation between languages. Many Romance languages allow for the omission of pronouns under certain circumstances, while the use of pronouns is mandatory in English. German and many Slavic languages employ grammatical gender, a concept unknown (and deeply confusing) to the anglophone parts of the world.

For machine translation research this presents the following problem: the data is not homogeneous, and a verbatim translation that may be appropriate in one context may be wrong in another. Moreover, translation systems are usually trained on a variety of documents from different sources, which means that they encounter different linguistic styles. Users of machine translation systems these days expect not only an accurate translation that carries all the information of the original text; they also expect it to be grammatical and natural-sounding. I have therefore taken to modelling at least some of the variation found in translation data. My main contention is that this improves the output translation, as it relaxes the assumption of data homogeneity which we know to be false. Measuring translation quality with the commonly used BLEU metric, I show experimentally that this is indeed the case.
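To make the evaluation metric concrete, the following is a minimal, illustrative sentence-level BLEU computation: the geometric mean of modified n-gram precisions multiplied by a brevity penalty. This is a simplification for exposition only; published results (including those in the thesis) use corpus-level BLEU with the standard tooling of the time.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Illustrative sentence-level BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty.
    Corpus-level BLEU with smoothing is used in practice."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clipped overlap: each candidate n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed: any zero precision zeroes the score
    log_avg = sum(math.log(p) for p in precisions) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))  # brevity penalty
    return bp * math.exp(log_avg)
```

A candidate identical to the reference scores 1.0; candidates sharing no n-grams with the reference score 0.0 under this unsmoothed variant.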

This thesis begins with a short motivating introduction in Chapter 1. It then provides the necessary mathematical background in Chapter 2. Since the probabilistic models presented in this thesis necessitate the use of approximate inference techniques, a particular emphasis is placed on these methods. The chapter also provides the reader with an introduction to phrase-based and neural machine translation.

Chapter 3 introduces a new latent variable model to handle variation in word alignment. Word alignment is the first step in the phrase-based machine translation pipeline. It connects words across two parallel sentences which are likely translations of each other. These word-level translations are expanded to phrases in a later step, which are in turn memorised by the translation system. A common assumption of many word alignment models is that each word in one of the languages needs to have a counterpart on the other side. This is of course false, since languages vary in how they express concepts. As mentioned earlier, some languages omit pronouns while others may omit prepositions. The reason that a pronoun occurs in sentence A and not in sentence B is thus entirely due to the grammatical requirements of language A and has nothing to do with translation. I therefore present a latent variable model that is a mixture of a classical alignment model and a language model component. The language model can account for grammatically induced words and thus prevents the alignment component from producing erroneous alignments. Experiments show that the resulting alignments lead to improved translations.
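The mixture idea can be sketched as follows. Each target word is generated either by an IBM-Model-1-style alignment component (uniform choice over source positions) or by a target-side language model; a mixing weight interpolates the two. The probability tables, the weight `lam`, and the function name `word_prob` below are hypothetical toy values and identifiers for illustration, not the learned parameters or code of the thesis.

```python
def word_prob(tgt_word, prev_word, src_sentence, t_table, lm, lam=0.3):
    """Mixture of a language model and an alignment component for one
    target word. t_table maps (source_word, target_word) to translation
    probabilities; lm maps (previous_word, target_word) to bigram
    probabilities. Toy smoothing via a small floor probability."""
    # Language-model component: can generate grammatically induced words
    # (e.g. a pronoun required by the target grammar) without forcing an
    # alignment to any source word.
    p_lm = lm.get((prev_word, tgt_word), 1e-6)
    # Alignment component: uniform over source positions, IBM-Model-1 style.
    p_align = sum(t_table.get((s, tgt_word), 1e-6)
                  for s in src_sentence) / len(src_sentence)
    return lam * p_lm + (1 - lam) * p_align

# Toy example: translating French "le chat" with hypothetical tables.
t_table = {("chat", "cat"): 0.9, ("le", "the"): 0.8}
lm = {("<s>", "the"): 0.5, ("the", "cat"): 0.2}
p = word_prob("cat", "the", ["le", "chat"], t_table, lm)
```

In the actual model the mixing decision is a latent variable inferred during training, rather than a fixed weight as in this sketch.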

The model presented in Chapter 4 approaches variation phenomena more holistically, as it is embedded into an end-to-end neural machine translation system. The hypothesis underlying that model is that the sources of variation in translation are too numerous to annotate explicitly. The model therefore attributes all variation at a given word position in the translation data to a common noise source. The innovation here is that the noise source evolves together with the translation. Noise is modelled on a word (or sub-word) level and changes according to the hitherto produced translation. The model is an instance of a deep generative model, in particular a variational autoencoder, and uses recent variational inference techniques that allow for gradient flow through stochastic computation graphs. Not only does the model outperform its baselines, it is also shown to produce different but accurate translations when the noise source is varied stochastically.
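The inference technique that enables gradient flow through the stochastic node is the reparameterisation trick for Gaussian latent variables, together with a closed-form KL term. The scalar sketch below (function names and single-dimension setup are illustrative, not the thesis implementation) shows both ingredients; in the actual model `mu` and `log_var` are produced by a neural inference network over vectors.

```python
import math
import random

def reparameterise(mu, log_var, rng=random):
    """Sample z ~ N(mu, exp(log_var)) as a deterministic, differentiable
    function of (mu, log_var) plus parameter-free noise, so gradients
    can flow through the sampling step."""
    eps = rng.gauss(0.0, 1.0)                  # noise, independent of parameters
    return mu + math.exp(0.5 * log_var) * eps  # differentiable in mu, log_var

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, exp(log_var)) || N(0, 1)) for one latent
    dimension; this is the regulariser in the variational objective."""
    return 0.5 * (math.exp(log_var) + mu ** 2 - 1.0 - log_var)
```

Sampling fresh noise `eps` at translation time is what lets the model produce different but accurate translations from the same source sentence.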

The thesis concludes with Chapter 5. That chapter also provides an outlook on future research avenues for which I hope to have provided some of the groundwork.

Item Type: Thesis (Doctoral)
Report Nr: DS-2020-08
Series Name: ILLC Dissertation (DS) Series
Year: 2020
Subjects: Computation
Depositing User: Dr Marco Vervoort
Date Deposited: 14 Jun 2022 15:17
Last Modified: 14 Jun 2022 15:17
URI: https://eprints.illc.uva.nl/id/eprint/2177
