PP-2007-40: Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction

PP-2007-40: Borensztajn, Gideon and Zuidema, Willem (2007) Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction. [Report]

Abstract

Recent research on unsupervised grammar induction has focused on
inducing accurate bracketings of sentences. Here we present an
efficient Bayesian algorithm for the unsupervised induction of
syntactic categories from such bracketed text. Using gold-standard
bracketing, our model gives state-of-the-art results on this task,
outperforming the recent semi-supervised approach of Haghighi &
Klein (2006) and obtaining an F_1 of 76.8% (when appropriately
relabeled). Our algorithm assigns a likelihood to unseen text
comparable to that of the treebank PCFG. Finally, we discuss the
metrics used and the linguistic relevance of the results.

Item Type: Report
Report Nr: PP-2007-40
Series Name: Prepublication (PP) Series
Year: 2007
Uncontrolled Keywords: grammatical inference; language learning
Subjects: Language
Depositing User: Jelle Zuidema
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/274