Smoothing a PBSMT Model by Factoring Out Adjuncts
Sophie Arnoult

Abstract:

Phrase-Based Statistical Machine Translation (PBSMT) became a leading
paradigm in Statistical Machine Translation after its introduction in
2003. From the start, researchers have tried to improve PBSMT with
linguistic knowledge, often by incorporating syntactic information
into the model.
  This thesis proposes a simple approach to improving PBSMT using a
general linguistic notion, that of adjuncts, or modifiers. In
structurally similar languages like French and English, one expects
adjuncts in one language to be translated as adjuncts in the other.
After verifying this assumption, this thesis
describes how adjunct pairs are deleted from a bilingual corpus to
generate new training data for a model, which is then used to smooth a
PBSMT baseline.
  Experiments on a smoothed French-English model show only a marginal
improvement over the baseline. It appears that few of the phrase pairs
gained by adjunct-pair deletion are actually used in testing, so that
the improvement in performance results mostly from successful
smoothing. Further research could determine how far performance can
be improved for this system, and could also apply adjunct-pair
deletion to other language pairs and to hierarchical SMT models.
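
As an illustration of the adjunct-pair deletion step summarized above, the
following is a minimal sketch, not the thesis's implementation. It assumes
that aligned adjunct pairs are already available as half-open token spans,
and that each pair is deleted on its own to produce one reduced sentence
pair. The names delete_span and delete_adjunct_pairs, the span
representation, and the example sentence are all hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end); an assumed representation


def delete_span(tokens: List[str], span: Span) -> List[str]:
    """Return a copy of tokens with the tokens inside span removed."""
    start, end = span
    return tokens[:start] + tokens[end:]


def delete_adjunct_pairs(
    src: List[str],
    tgt: List[str],
    adjunct_pairs: List[Tuple[Span, Span]],
) -> List[Tuple[List[str], List[str]]]:
    """Build one reduced sentence pair per aligned adjunct pair.

    Assumption: pairs are deleted one at a time; the thesis may combine
    deletions differently.
    """
    reduced = []
    for src_span, tgt_span in adjunct_pairs:
        reduced.append((delete_span(src, src_span), delete_span(tgt, tgt_span)))
    return reduced


# Hypothetical example: "noir" (source token 2) is aligned with "black" (target token 1).
src = "le chat noir dort".split()
tgt = "the black cat sleeps".split()
pairs = [((2, 3), (1, 2))]
print(delete_adjunct_pairs(src, tgt, pairs))
```

The reduced sentence pairs would then serve as additional training data for
the model used to smooth the baseline. How that smoothing is carried out is
not specified in this abstract, so it is left out of the sketch.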