PP-2007-40: Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction

PP-2007-40: Borensztajn, Gideon and Zuidema, Willem (2007) Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction. [Report]

Abstract

Recent research on unsupervised grammar induction has focused on inducing accurate bracketing of sentences. Here we present an efficient Bayesian algorithm for the unsupervised induction of syntactic categories from such bracketed text. Using gold-standard bracketing, our model gives state-of-the-art results on this task: it obtains an F_1 of 76.8% (when appropriately relabeled), outperforming the recent semi-supervised approach of Haghighi & Klein (2006). Our algorithm assigns a likelihood to unseen text comparable to that of the treebank PCFG. Finally, we discuss the metrics used and the linguistic relevance of the results.

Item Type: Report
Report Nr: PP-2007-40
Series Name: Prepublication (PP) Series
Year: 2007
Uncontrolled Keywords: grammatical inference; language learning
Subjects: Language
Depositing User: Jelle Zuidema
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/274
