PP-2008-22: Is the End of Supervised Parsing in Sight?

PP-2008-22: Bod, Rens (2008) Is the End of Supervised Parsing in Sight? [Report]

Abstract

How far can we get with unsupervised parsing if we make our training
corpus several orders of magnitude larger than has hitherto been
attempted? We present a new algorithm for unsupervised parsing using
an all-subtrees model, termed U-DOP*, which parses directly with
packed forests of all binary trees. We train both on Penn's WSJ data
and on the (much larger) NANC corpus, showing that U-DOP* outperforms
a treebank-PCFG on the standard WSJ test set. While U-DOP* performs
worse than state-of-the-art supervised parsers on hand-annotated
sentences, we show that the model outperforms supervised parsers when
evaluated as a language model in syntax-based machine translation on
Europarl. We argue that supervised parsers miss the fluidity between
constituents and non-constituents and that in the field of
syntax-based language modeling the end of supervised parsing has come
in sight.

Item Type: Report
Report Nr: PP-2008-22
Series Name: Prepublication (PP) Series
Year: 2008
Uncontrolled Keywords: DOP; U-DOP
Depositing User: Rens Bod
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/296
