MoL-2002-07: Back-off as Parameter Estimation for DOP models

MoL-2002-07: Buratto, Luciano (2002) Back-off as Parameter Estimation for DOP models. [Report]

Abstract

Data-Oriented Parsing (DOP) is a probabilistic performance approach to
parsing natural language. Several DOP models have been proposed since
it was introduced by Scha (1990), achieving promising results. One
important feature of these models is the probability estimation
procedure. Two major estimators have been put forward: Bod (1993) uses
a relative frequency estimator; Bonnema (1999) adds a rescaling factor
to correct for tree size effects. Both estimators, however, present
biases. Moreover, Bod's estimator has been shown to be inconsistent
(Johnson, 2002), meaning that the probability estimates hypothesized
by the model do not approach the true probabilities that generated the
data as the sample size grows. In this thesis, we implement a new
estimation procedure that tackles the shortcomings of the two previous
methods. The main idea is to treat derivation events not as disjoint,
but as interrelated in a hierarchical cascade of parse tree
derivations. We show that this new estimator, called the Back-Off DOP
(BO-DOP) estimator, outperforms both previous estimators. We tested it
on the OVIS treebank, a corpus from a Dutch-language, speech-based
system, and report error reductions of up to 11.4% and 15% compared
with Bod's and Bonnema's estimators, respectively.
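
For illustration only, the relative frequency estimator mentioned in the
abstract can be sketched as follows: each tree fragment's probability is
its count divided by the total count of all fragments that share its root
label. The function name, the dictionary layout, and the toy counts are
assumptions made for this sketch; they are not taken from the thesis, and
the sketch does not cover Bonnema's rescaling or the BO-DOP estimator.

```python
from collections import Counter


def relative_frequency(fragment_counts):
    """Estimate fragment probabilities by relative frequency.

    fragment_counts maps (root_label, fragment) -> count, where
    `fragment` is any hashable description of a tree fragment.
    (Hypothetical data layout, chosen for this sketch.)
    """
    # Total count of all fragments sharing the same root label.
    root_totals = Counter()
    for (root, _fragment), count in fragment_counts.items():
        root_totals[root] += count

    # P(fragment) = count(fragment) / total count for its root label.
    return {
        key: count / root_totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Toy counts, invented for illustration.
counts = {
    ("S", "S(NP VP)"): 3,
    ("S", "S(NP VP(V NP))"): 1,
    ("NP", "NP(she)"): 2,
    ("NP", "NP(tea)"): 2,
}
probs = relative_frequency(counts)
```

With these toy counts, the probabilities of the fragments rooted in S sum
to one, as do those rooted in NP, so the estimates form a proper
distribution per root label.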

Item Type: Report
Report Nr: MoL-2002-07
Series Name: Master of Logic Thesis (MoL) Series
Year: 2002
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
URI: https://eprints.illc.uva.nl/id/eprint/735
