MoL-2012-22: Refining translation grammars through paraphrase clustering

MoL-2012-22: Garmash, Ekaterina (2012) Refining translation grammars through paraphrase clustering. [Report]


Finding the right model for the structure of translation equivalence
between languages is one of the major challenges and lines of research
in statistical machine translation. In this thesis we consider a
formalization of translation equivalence as synchronous grammars and
explore a particular way of modifying a translation grammar, namely by
labeling its nonterminal symbols. The labeling we develop is based on
the general notion of semantic equivalence: since it is not known *a
priori* what kind of semantic distinctions are relevant to translation
equivalence, we define an unsupervised procedure to learn a label set
by clustering close paraphrases that characterize the strings
generated from a given nonterminal symbol. We implement the defined
procedure and test a current baseline grammar (the Hiero system) labeled
with a generated label set. By trying out a number of labeling
algorithms and introducing additional modifications to the grammar, as
well as making other changes to the standard translation pipeline, we
find that the labeled grammar performs worse than the unlabeled one. We
discuss possible reasons for this and propose a
number of modifications to the labeling procedure we defined and
implemented here that could improve the performance.

Item Type: Report
Report Nr: MoL-2012-22
Series Name: Master of Logic Thesis (MoL) Series
Year: 2012
Uncontrolled Keywords: Logic; Language
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
