DS-2012-01: Decomposing and Regenerating Syntactic Trees

DS-2012-01: Sangati, Federico (2012) Decomposing and Regenerating Syntactic Trees. Doctoral thesis, University of Amsterdam.

Full Text: DS-2012-01.text.pdf (2MB)
Summary in Dutch (Samenvatting): DS-2012-01.samenvatting.txt (2kB)

Abstract

The thesis focuses on learning syntactic tree structures by
generalizing over annotated treebanks. It investigates several
probabilistic models for three different syntactic representations.

Standard phrase-structure and dependency-structure treebanks are used
to train and test the models. A third representation is proposed,
based on a systematic yet concise formulation of the original
dependency theory proposed by Lucien Tesnière. This new representation
incorporates all main advantages of phrase-structure and
dependency-structure, and represents a valid compromise between
adequacy and simplicity in syntactic description.

One of the main contributions of the thesis is to formulate a general
framework for defining probabilistic generative models of syntax. In
every model, syntactic trees are decomposed into elementary constructs
which can be recomposed to generate novel syntactic structures by
means of specific combinatory operations.
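As an illustration of the decompose-and-recompose idea, the sketch below uses the simplest possible elementary constructs, depth-one productions, with relative-frequency probabilities. This is a plain PCFG chosen for brevity and is not one of the models defined in the thesis; the tree encoding and the names `productions`, `estimate` and `tree_probability` are assumptions made for this example.

```python
from collections import Counter
from math import prod

# Assumed encoding: a tree is (label, child, child, ...); a word is a plain string.

def productions(tree):
    """Yield (parent_label, child_labels) for every internal node of the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    yield (label, rhs)
    for c in children:
        if not isinstance(c, str):
            yield from productions(c)

def estimate(treebank):
    """Relative-frequency estimate: count(production) / count(parent label)."""
    counts = Counter(p for t in treebank for p in productions(t))
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {p: n / lhs_totals[p[0]] for p, n in counts.items()}

def tree_probability(tree, probs):
    """Probability of a tree as the product of its production probabilities."""
    return prod(probs.get(p, 0.0) for p in productions(tree))

treebank = [
    ("S", ("NP", "dogs"), ("VP", "bark")),
    ("S", ("NP", "cats"), ("VP", "sleep")),
]
probs = estimate(treebank)

# A structure not in the treebank, recomposed from observed constructs.
novel = ("S", ("NP", "dogs"), ("VP", "sleep"))
print(tree_probability(novel, probs))
```

The novel tree receives non-zero probability because each of its productions was observed somewhere in the training trees, even though the tree as a whole was not.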

For learning phrase-structures, a novel Data-Oriented Parsing approach
is proposed. Following the original DOP framework, constructs of
variable size are used as building blocks of the model. To restrict
the grammar to a small yet representative set of constructions, only
those recurring multiple times in the training treebank are retained.
A novel, efficient tree-kernel algorithm is used to find the
recurring fragments.
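To make the notion of a recurring fragment concrete, the following is a naive sketch that compares every pair of nodes across every pair of trees and records the largest fragment the two nodes share. It is quadratic and unoptimized, it also records fragments contained in larger shared ones, and it ignores repetitions within a single tree, so it should not be read as the thesis's tree-kernel algorithm. The tree encoding and all function names are assumptions made for this example.

```python
from itertools import combinations

# Assumed encoding: a tree is (label, child, ...); a word is a plain string.
# In a fragment, a 1-tuple such as ("V",) marks an unexpanded frontier node.

def label(node):
    return node if isinstance(node, str) else node[0]

def nodes(tree):
    """Yield every internal node of the tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def same_production(a, b):
    """True if two internal nodes have the same label and the same children labels."""
    if a[0] != b[0] or len(a) != len(b):
        return False
    return all(
        isinstance(x, str) == isinstance(y, str) and label(x) == label(y)
        for x, y in zip(a[1:], b[1:])
    )

def common_fragment(a, b):
    """Largest fragment rooted at both a and b; assumes same_production(a, b)."""
    children = []
    for x, y in zip(a[1:], b[1:]):
        if isinstance(x, str):
            children.append(x)                      # shared word
        elif same_production(x, y):
            children.append(common_fragment(x, y))  # keep expanding
        else:
            children.append((x[0],))                # stop: frontier node
    return (a[0], *children)

def recurring_fragments(treebank):
    """Fragments shared by at least two different trees (naive pairwise search)."""
    found = set()
    for t1, t2 in combinations(treebank, 2):
        for a in nodes(t1):
            for b in nodes(t2):
                if same_production(a, b):
                    found.add(common_fragment(a, b))
    return found

treebank = [
    ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats"))),
    ("S", ("NP", "dogs"), ("VP", ("V", "like"), ("NP", "cats"))),
]
fragments = recurring_fragments(treebank)
```

On this toy treebank the largest shared fragment spans the whole sentence except for the verb, which is left as an open `V` slot.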

Regarding the other two representations, several generative models are
formulated and evaluated by means of a re-ranking framework.
Re-ranking is an effective methodology: it can function as a
parser-simulator and can guide the process of (re)defining
probabilistic generative models for learning syntactic structures.

Item Type: Thesis (Doctoral)
Report Nr: DS-2012-01
Series Name: ILLC Dissertation (DS) Series
Year: 2012
Depositing User: Dr Marco Vervoort
Date Deposited: 14 Jun 2022 15:16
Last Modified: 14 Jun 2022 15:16
URI: https://eprints.illc.uva.nl/id/eprint/2107
