PP-2007-40: Borensztajn, Gideon and Zuidema, Willem (2007) Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction. [Report]
Abstract
Recent research on unsupervised grammar induction has focused on
inducing accurate bracketing of sentences. Here we present an
efficient, Bayesian algorithm for the unsupervised induction of
syntactic categories from such bracketed text. Our model gives
state-of-the-art results on this task, using gold-standard bracketing,
outperforming the recent semi-supervised approach of Haghighi &
Klein (2006) and obtaining an F_1 of 76.8% (when appropriately
relabeled). Our algorithm assigns a likelihood to unseen text
comparable to that of the treebank PCFG. Finally, we discuss the
metrics used and the linguistic relevance of the results.
Item Type: | Report |
---|---|
Report Nr: | PP-2007-40 |
Series Name: | Prepublication (PP) Series |
Year: | 2007 |
Uncontrolled Keywords: | grammatical inference; language learning |
Subjects: | Language |
Depositing User: | Jelle Zuidema |
Date Deposited: | 12 Oct 2016 14:36 |
Last Modified: | 12 Oct 2016 14:36 |
URI: | https://eprints.illc.uva.nl/id/eprint/274 |