MoL-2011-05: Smoothing a PBSMT Model by Factoring Out Adjuncts

MoL-2011-05: Arnoult, Sophie (2011) Smoothing a PBSMT Model by Factoring Out Adjuncts. [Report]


Abstract

Phrase-Based Statistical Machine Translation (PBSMT) became a leading paradigm in Statistical Machine Translation after its introduction in 2003. From the start, researchers have tried to improve PBSMT by using linguistic knowledge, often by incorporating syntactic information into the model. This thesis proposes a simple approach to improving PBSMT that relies on a general linguistic notion, that of adjuncts, or modifiers: in structurally similar languages like French and English, adjuncts in one language are expected to be translated as adjuncts in the other. After verifying this assumption, the thesis describes how adjunct pairs are deleted from a bilingual corpus to generate new training data for a second model, which is then used to smooth a PBSMT baseline. Experiments on a smoothed French-English model show only a marginal improvement over the baseline. Few of the phrase pairs gained by adjunct-pair deletion appear to be used in testing, so the improvement in performance results mostly from the smoothing itself. Directions for further research are to determine to what extent performance can be improved for this system, and to apply adjunct-pair deletion to other language pairs and to hierarchical SMT models.
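
The two steps named in the abstract, adjunct-pair deletion and smoothing, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the span representation, the example sentence pair, and the use of linear interpolation as the smoothing scheme are all assumptions made for illustration, since the abstract does not specify them.

```python
# Hypothetical sketch; names and the interpolation scheme are assumptions,
# not taken from the thesis.

def delete_adjunct_pairs(src, tgt, adjunct_pairs):
    """Remove aligned adjunct spans from both sides of a sentence pair.

    src, tgt: lists of tokens.
    adjunct_pairs: list of ((s_start, s_end), (t_start, t_end)) half-open
    token spans, each pairing a source adjunct with its target adjunct.
    """
    src_drop, tgt_drop = set(), set()
    for (s_start, s_end), (t_start, t_end) in adjunct_pairs:
        src_drop.update(range(s_start, s_end))
        tgt_drop.update(range(t_start, t_end))
    reduced_src = [tok for i, tok in enumerate(src) if i not in src_drop]
    reduced_tgt = [tok for i, tok in enumerate(tgt) if i not in tgt_drop]
    return reduced_src, reduced_tgt


def interpolate(p_baseline, p_reduced, lam):
    """Linearly interpolate two phrase-translation probabilities.

    Linear interpolation is an assumed smoothing scheme, chosen for brevity.
    """
    return lam * p_baseline + (1.0 - lam) * p_reduced


# Invented French-English example with two adjunct pairs:
# "noir" <-> "black" and "hier" <-> "yesterday".
src = "il a vu le chat noir hier".split()
tgt = "he saw the black cat yesterday".split()
pairs = [((5, 6), (3, 4)), ((6, 7), (5, 6))]

reduced_src, reduced_tgt = delete_adjunct_pairs(src, tgt, pairs)
smoothed = interpolate(0.25, 0.75, 0.5)
```

Here the reduced pair ("il a vu le chat", "he saw the cat") would be added to the training data for the second model, and `interpolate` stands in for whatever combination of baseline and reduced-corpus estimates the thesis actually uses.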

Item Type: Report
Report Nr: MoL-2011-05
Series Name: Master of Logic Thesis (MoL) Series
Year: 2011
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
URI: https://eprints.illc.uva.nl/id/eprint/850
