MoL-2004-06: Rank Consistent Estimation: The DOP Case

MoL-2004-06: Nguyen, Thuy Linh (2004) Rank Consistent Estimation: The DOP Case. [Report]


Abstract

The goal of an estimator is to approximate the unknown distribution of
the language from its partial evidence. In this thesis, a rank
consistent estimator is defined as an estimator that preserves the
frequency ranking of all the full parse trees in the training
treebank. The rank
consistency property adopts Laplace's Principle of Insufficient Reason
for statistical parsing: a rank consistent estimator assigns the same
probability to all trees that occur the same number of times in the
training data.
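
The rank consistency property described above can be illustrated with a small check. This is only a sketch under stated assumptions, not the thesis's formal definition: trees are represented as plain strings, the helper names `relative_frequency` and `is_rank_consistent` are hypothetical, and "preserving the ranking" is read as "equal counts give equal probabilities, and a strictly higher count gives a strictly higher probability".

```python
from collections import Counter
from itertools import combinations


def relative_frequency(treebank):
    """Baseline estimator (an assumption for illustration): count / treebank size."""
    counts = Counter(treebank)
    total = len(treebank)
    return lambda tree: counts[tree] / total


def is_rank_consistent(treebank, prob, tol=1e-12):
    """Check an estimator `prob` against the frequency ranking of the treebank.

    Assumed reading of the property: for every pair of observed trees,
    equal counts imply equal probabilities, and a strictly larger count
    implies a strictly larger probability.
    """
    counts = Counter(treebank)
    for t1, t2 in combinations(counts, 2):
        diff_count = counts[t1] - counts[t2]
        diff_prob = prob(t1) - prob(t2)
        if diff_count == 0:
            ok = abs(diff_prob) <= tol
        elif diff_count > 0:
            ok = diff_prob > tol
        else:
            ok = diff_prob < -tol
        if not ok:
            return False
    return True


# Toy treebank: one tree seen twice, two trees seen once each.
treebank = ["(S a)", "(S a)", "(S b)", "(S c)"]
print(is_rank_consistent(treebank, relative_frequency(treebank)))
print(is_rank_consistent(treebank, lambda tree: 1 / 3))
```

On this toy treebank the relative-frequency estimator passes the check, while a uniform estimator fails it because it gives the tree seen twice the same probability as the trees seen once.
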
This thesis presents the first non-trivial DOP estimator where the
treebank is considered not only as a stochastic generating system but
also as a sample of the stochastic process. In this thesis, the existing
DOP definitions of probability and derivation of full parse trees are
generalized to subtrees. Fragments in the treebank's fragment corpus
are assigned weights so that their probabilities are proportional to
their relative frequencies. The estimator is proved to be rank
consistent.
The theoretical property of the model is substantiated by empirical
results. The new estimator outperforms the DOP1 estimator on the OVIS
corpus.

Item Type:	Report
Report Nr:	MoL-2004-06
Series Name:	Master of Logic Thesis (MoL) Series
Year:	2004
Uncontrolled Keywords:	statistical parsing, parse trees, language distribution
Subjects:	Language
Date Deposited:	12 Oct 2016 14:38
Last Modified:	12 Oct 2016 14:38
URI:	https://eprints.illc.uva.nl/id/eprint/751
