PP-2008-22: Is the End of Supervised Parsing in Sight?

PP-2008-22: Bod, Rens (2008) Is the End of Supervised Parsing in Sight? [Report]

Abstract

How far can we get with unsupervised parsing if we make our training corpus several orders of magnitude larger than has hitherto been attempted? We present a new algorithm for unsupervised parsing using an all-subtrees model, termed U-DOP*, which parses directly with packed forests of all binary trees. We train both on Penn's WSJ data and on the (much larger) NANC corpus, showing that U-DOP* outperforms a treebank-PCFG on the standard WSJ test set. While U-DOP* performs worse than state-of-the-art supervised parsers on hand-annotated sentences, we show that the model outperforms supervised parsers when evaluated as a language model in syntax-based machine translation on Europarl. We argue that supervised parsers miss the fluidity between constituents and non-constituents and that, in the field of syntax-based language modeling, the end of supervised parsing has come into sight.

Item Type: Report
Report Nr: PP-2008-22
Series Name: Prepublication (PP) Series
Year: 2008
Uncontrolled Keywords: DOP; U-DOP
Depositing User: Rens Bod
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/296
