PP-2008-22: Bod, Rens (2008) Is the End of Supervised Parsing in Sight? [Report]
Abstract
How far can we get with unsupervised parsing if we make our training
corpus several orders of magnitude larger than has hitherto been
attempted? We present a new algorithm for unsupervised parsing using
an all-subtrees model, termed U-DOP*, which parses directly with
packed forests of all binary trees. We train both on Penn's WSJ data
and on the (much larger) NANC corpus, showing that U-DOP* outperforms
a treebank-PCFG on the standard WSJ test set. While U-DOP* performs
worse than state-of-the-art supervised parsers on hand-annotated
sentences, we show that the model outperforms supervised parsers when
evaluated as a language model in syntax-based machine translation on
Europarl. We argue that supervised parsers miss the fluidity between
constituents and non-constituents and that in the field of
syntax-based language modeling the end of supervised parsing has come
in sight.
Item Type: | Report |
---|---|
Report Nr: | PP-2008-22 |
Series Name: | Prepublication (PP) Series |
Year: | 2008 |
Uncontrolled Keywords: | DOP; U-DOP |
Depositing User: | Rens Bod |
Date Deposited: | 12 Oct 2016 14:36 |
Last Modified: | 12 Oct 2016 14:36 |
URI: | https://eprints.illc.uva.nl/id/eprint/296 |