PP-2008-22: Is the End of Supervised Parsing in Sight?

PP-2008-22: Bod, Rens (2008) Is the End of Supervised Parsing in Sight? [Report]


Abstract

How far can we get with unsupervised parsing if we make our training
corpus several orders of magnitude larger than has hitherto been
attempted? We present a new algorithm for unsupervised parsing using
an all-subtrees model, termed U-DOP*, which parses directly with
packed forests of all binary trees. We train both on Penn's WSJ data
and on the (much larger) NANC corpus, showing that U-DOP* outperforms
a treebank-PCFG on the standard WSJ test set. While U-DOP* performs
worse than state-of-the-art supervised parsers on hand-annotated
sentences, we show that the model outperforms supervised parsers when
evaluated as a language model in syntax-based machine translation on
Europarl. We argue that supervised parsers miss the fluidity between
constituents and non-constituents and that in the field of
syntax-based language modeling the end of supervised parsing has come
in sight.
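The phrase "packed forests of all binary trees" can be made concrete with a small sketch. This is not Bod's U-DOP* implementation. It is only an assumed illustration of how the exponentially many binary trees over a sentence can be shared in a chart-style forest. Every span (i, j) becomes one node, and each split point k gives one way of building it from (i, k) and (k, j). The helper names `packed_forest` and `count_trees` are hypothetical.

```python
from functools import lru_cache


def packed_forest(n):
    """Hypothetical helper: packed forest of ALL binary trees over n words.

    Keys are multi-word spans (i, j) with 0 <= i < j <= n; each value lists
    the split points k (i < k < j) whose children are (i, k) and (k, j).
    Single-word spans are leaves and are not stored.
    """
    forest = {}
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Every split is allowed: no grammar restricts which trees exist.
            forest[(i, j)] = list(range(i + 1, j))
    return forest


def count_trees(forest, n):
    """Hypothetical helper: count the distinct binary trees in the forest."""

    @lru_cache(maxsize=None)
    def inside(i, j):
        if j - i == 1:
            return 1  # a single word is a leaf
        # Trees over (i, j) = sum over splits of (left trees * right trees).
        return sum(inside(i, k) * inside(k, j) for k in forest[(i, j)])

    return inside(0, n)


if __name__ == "__main__":
    f = packed_forest(5)
    print(len(f), sum(len(v) for v in f.values()), count_trees(f, 5))
```

For a five-word sentence the forest has 10 multi-word span nodes and 20 split edges, yet it encodes 14 distinct binary trees (the Catalan number for four internal nodes). The forest grows polynomially in sentence length while the number of trees it packs grows exponentially, which is what makes it feasible to work with all binary trees at once.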

Item Type:	Report
Report Nr:	PP-2008-22
Series Name:	Prepublication (PP) Series
Year:	2008
Uncontrolled Keywords:	DOP; U-DOP
Depositing User:	Rens Bod
Date Deposited:	12 Oct 2016 14:36
Last Modified:	12 Oct 2016 14:36
URI:	https://eprints.illc.uva.nl/id/eprint/296
