PP-1999-26: A New Probability Model for Data Oriented Parsing (Extended Version)

PP-1999-26: Bonnema, Remko and Buying, Paul and Scha, Remko (1999) A New Probability Model for Data Oriented Parsing (Extended Version). [Report]


Abstract

Data oriented parsing systems employ redundant stochastic tree
substitution grammars (STSGs) to analyse natural language utterances on
the basis of an annotated corpus (a treebank). An important component
of such systems is the way in which the substitution probability of a
parse tree is estimated from its occurrences in the treebank. In the
standard method for doing this, the probability of a parse tree is
directly correlated with its occurrence frequency in the collection of
all fragments of all corpus trees. We show that this results in
undesirable statistical biases. We therefore propose an alternative
method, which estimates the substitution probability of a fragment as
the probability that it has been involved in the derivation of a
corpus tree. We show that this method has more plausible properties.
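
The standard relative-frequency method described above can be illustrated with a minimal sketch. It is not taken from the report: the tuple encoding of trees, the function names, and the choice to normalise counts over fragments sharing the same root label are all assumptions made for illustration. The alternative estimator proposed in the report is not reproduced here.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is (label, child, child, ...);
# a child is either another tree or a word (str).
# A cut-off nonterminal on a fragment's frontier is the 1-tuple (label,).

def fragments_at(node):
    """All fragments rooted at `node`: each nonterminal child is either
    cut off (left as a bare label) or expanded by one of its own fragments."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of `tree`."""
    found = list(fragments_at(tree))
    for child in tree[1:]:
        if not isinstance(child, str):
            found.extend(all_fragments(child))
    return found

def relative_frequency_model(treebank):
    """Fragment probability = fragment count / count of all fragments
    with the same root label (normalising per root is an assumption)."""
    counts = Counter(f for t in treebank for f in all_fragments(t))
    root_totals = Counter()
    for frag, n in counts.items():
        root_totals[frag[0]] += n
    return {frag: n / root_totals[frag[0]] for frag, n in counts.items()}

tree = ("S", ("NP", "John"), ("VP", ("V", "sleeps")))
model = relative_frequency_model([tree])
```

In this sketch the number of fragments grows quickly with the size of a corpus tree, so fragments drawn from large trees account for a large share of the counts. That is one way to see how a purely frequency-based estimate could become skewed.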

Keyword(s): linguistics, parsing, probabilistic parsing, data oriented
parsing, DOP, statistical methods in linguistics, corpus linguistics

Item Type: Report
Report Nr: PP-1999-26
Series Name: Prepublication (PP) Series
Year: 1999
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/26
