MoL-2004-06: Rank Consistent Estimation: The DOP Case

MoL-2004-06: Nguyen, Thuy Linh (2004) Rank Consistent Estimation: The DOP Case. [Report]


The goal of an estimator is to approximate the unknown distribution of a language from partial evidence. In this thesis, a rank consistent estimator is defined as an estimator that preserves the frequency ranking of all the full parse trees in the training treebank. The rank consistency property adopts Laplace's Principle of Insufficient Reason for statistical parsing: a rank consistent estimator assigns the same probability to all trees that occur the same number of times in the training data. This thesis presents the first non-trivial DOP estimator in which the treebank is considered not only as a stochastic generating system but also as a sample of the stochastic process. The existing DOP definitions of probability and derivation of full parse trees are generalized to subtrees. Fragments in the treebank's fragment corpus are assigned weights so that their probabilities are proportional to their relative frequencies. The estimator is proved to be rank consistent. This theoretical property of the model is substantiated by empirical results. The new estimator outperforms the DOP1 estimator on the OVIS corpus.
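
The rank consistency property described in the abstract can be illustrated with a small check. The sketch below is only one reading of the abstract, not the thesis's formal definition: it assumes that trees with equal counts must receive equal probability and that a tree with a higher count must receive a strictly higher probability. The function name `is_rank_consistent`, the tolerance `tol`, and the tree labels `t1` to `t4` are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def is_rank_consistent(counts, probs, tol=1e-12):
    """Check that `probs` ranks trees the same way `counts` does.

    Assumed reading of the abstract: equal counts imply equal
    probabilities, and a higher count implies a strictly higher
    probability.
    """
    for a, b in combinations(counts, 2):
        ca, cb = counts[a], counts[b]
        pa, pb = probs[a], probs[b]
        if ca == cb:
            ok = math.isclose(pa, pb, rel_tol=0.0, abs_tol=tol)
        elif ca > cb:
            ok = pa - pb > tol
        else:
            ok = pb - pa > tol
        if not ok:
            return False
    return True


# Toy treebank: each label stands for one full parse tree (hypothetical data).
treebank = ["t1", "t1", "t1", "t2", "t2", "t3", "t3", "t4"]
counts = Counter(treebank)
total = sum(counts.values())

# Relative frequency trivially preserves the frequency ranking.
relative_frequency = {tree: n / total for tree, n in counts.items()}

# A made-up assignment that ranks t2 above t1 and separates t2 from t3.
skewed = {"t1": 0.3, "t2": 0.4, "t3": 0.2, "t4": 0.1}

print(is_rank_consistent(counts, relative_frequency))
print(is_rank_consistent(counts, skewed))
```

Relative frequency over full parse trees passes this check by construction. The `skewed` assignment fails it because `t2` occurs less often than `t1` yet receives more probability, and because `t2` and `t3` occur equally often yet receive different probabilities.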

Item Type: Report
Report Nr: MoL-2004-06
Series Name: Master of Logic Thesis (MoL) Series
Year: 2004
Uncontrolled Keywords: statistical parsing, parse trees, language distribution
Subjects: Language
Date Deposited: 12 Oct 2016 14:38
Last Modified: 12 Oct 2016 14:38
