PP-2008-24: The Data-Oriented Parsing Approach: Theory and Application

PP-2008-24: Bod, Rens (2008) The Data-Oriented Parsing Approach: Theory and Application. [Report]

Data-Oriented Parsing (DOP) is a corpus-based parsing approach that has been quite successful in various fields of AI. DOP was originally developed as an NLP technique but has since been generalized to music analysis, problem solving, and unsupervised structure learning. The distinctive feature of the DOP approach, when it was first presented, was that it models sentence structures on the basis of previously observed frequencies of sentence-structure fragments, without imposing any constraints on the size of these fragments. Fragments include, for instance, subtrees of depth 1 (corresponding to context-free rules) as well as entire trees. It has turned out that probabilistic corpus-based parsing outperforms deterministic rule-based processing not only for language but also for melodic analysis and problem solving. Our goal for this chapter is therefore to present the DOP approach from a multi-modal perspective. To do so, it is convenient to first explain DOP for language processing, after which we discuss an integrated DOP model that unifies the different modalities. We will go into the various computational issues and show how the model can be tested against hand-annotated corpora. Finally, we will discuss shortcomings of this supervised approach and present some results of recent work that extends DOP towards unsupervised learning.
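
The fragment idea in the abstract can be illustrated with a minimal sketch. This is not taken from the report: the tuple-based tree encoding, the function names, and the toy sentence are all assumptions made for illustration. The relative-frequency weighting (a fragment's count divided by the count of all fragments sharing its root label) is likewise an assumed estimator and may differ from the one the report uses. The sketch enumerates every fragment of a parse tree, from depth-1 subtrees (context-free rules) up to the entire tree.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is a tuple (label, child, ...); a word is a str.
# A one-element tuple such as ("NP",) marks a nonterminal left unexpanded.

def fragments_at(node):
    """Return all fragments whose root is `node`."""
    label, children = node[0], node[1:]
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            # Either cut at this child or continue with one of its fragments.
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_fragments(tree):
    """Return the fragments rooted at every nonterminal node of `tree`."""
    if isinstance(tree, str):
        return []
    found = fragments_at(tree)
    for child in tree[1:]:
        found.extend(all_fragments(child))
    return found

def is_depth_one(fragment):
    """True if the fragment is a single context-free rule."""
    return all(isinstance(c, str) or len(c) == 1 for c in fragment[1:])

def relative_frequencies(corpus):
    """Assumed estimator: count(fragment) / count(fragments with same root)."""
    counts = Counter(f for tree in corpus for f in all_fragments(tree))
    root_totals = Counter()
    for fragment, n in counts.items():
        root_totals[fragment[0]] += n
    return {f: n / root_totals[f[0]] for f, n in counts.items()}

# Hypothetical one-sentence treebank.
tree = ("S", ("NP", "she"), ("VP", ("V", "sleeps")))
frags = all_fragments(tree)
rules = [f for f in frags if is_depth_one(f)]
probs = relative_frequencies([tree])
```

For this toy tree, `rules` holds the four context-free rules and `frags` also contains the larger fragments, including the whole tree. Even a short sentence yields more fragments than rules, and the number grows quickly with tree size, which hints at the computational issues the abstract mentions.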

Item Type: Report
Report Nr: PP-2008-24
Series Name: Prepublication (PP) Series
Year: 2008
Uncontrolled Keywords: unifying model DOP
Depositing User: Rens Bod
Date Deposited: 12 Oct 2016 14:36
Last Modified: 12 Oct 2016 14:36
URI: https://eprints.illc.uva.nl/id/eprint/298
